
Jack Norris | Strata-Hadoop World 2012


 

>>Okay, we're back here, live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Strata Conference + Hadoop World, the big event of Big Data Week. We just wrote a blog post on SiliconANGLE.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, packed, an amazing event, and we're excited. This is SiliconANGLE.com, I'm the founder, John Furrier. I'm joined by co-host Dave >>Vellante of Wikibon.org, where people go for free research and peers collaborate to solve problems. And we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to theCUBE. Thank you, Dave. I'm going to hand it to you. You know, we met quite a while ago now, it was well over a year ago, and we were pushing at you guys and saying, well, you know, open source... and you said, look, we're solving problems for customers, we've got the right model, we think this is our strategy, we're sticking to it, watch what happens. And like I said, I have to hand it to you. You guys really have some great traction in the market and you're doing what you said. So congratulations on that. I know you've got a lot more work to do, but >>Yeah, and actually the topic of openness is pretty interesting. If you look at the different options out there, all of them are combining open source with something proprietary. Now, in the case of some distributions, it's very small, like a proprietary ODBC driver. But I think it shows that any solution combines the two, and making it more open is important. So what we've done is make innovations, but where we've made those innovations we've opened them up and provided APIs, like NFS for standard access, like REST, like ODBC drivers, et cetera. >>So it's a spectrum. I mean, actually, we were at Oracle OpenWorld a few weeks ago, and you listen to Larry Ellison talk about the Oracle public cloud, and he makes actually a very strong case that it's open: you can move data, it's all Java, so it's all about standards. Yeah. He comes at it from the opposite end, but it was really all about the business value. That's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I were both very impressed with what he described as your philosophy: we don't announce a product when we have a product, we announce a product when we have customers. And, you know, that's impressive. >>He also gave some good feedback to startup entrepreneurs out there, and obviously there's a lot of action going on in the startup community. He basically said the same thing: get customers. That's it. Use your tech, but don't be so locked into the tech; get the customers, understand their needs, and then deliver on that. So you guys have done great. And I want to talk about the show here, because you guys have a big booth and a big presence at the show. What are you guys learning? How's the positioning, how's the new news hitting? Give us a quick update. So, >>A lot of news. It started on Tuesday, when we announced the M7 Edition. And I brought a demo of it here for you all.
Because the big thing about M7 is what we don't have. We're not demoing region servers, we're not demoing compactions, we're not demoing a lot of manual administrative tasks. What that really means is that we took this stack... If you look at HBase, today about half of Hadoop users are adopting HBase, so there's a lot of momentum in the market, and it's used for everything from real-time analytics to lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM, which stores its data in the Hadoop Distributed File System, which sits on a JVM, which stores its data in a Linux file system that writes to disk. >>And so a lot of the complexity is in that stack. As an administrator, you have to worry about how data gets persisted, how it basically gets written across that stack. You've got region servers to keep up, and when you're doing writes you have things called compactions, which increase response time. So it's a complex environment, and we've spent quite a bit of time collapsing that infrastructure. With the M7 Edition, you've got files and tables together in the same layer, writing directly to disk. There are no region servers, there are no compactions to deal with, there's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler. >>Let's talk about some of your customers, in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, hey, we can solve some of those problems. Is that right? Can you talk about that a little bit? >>That's a fair characterization. I think part of it is that when you're in the evaluation process and you first hear about Hadoop, it's kind of like the Gartner hype curve, right? This stuff does everything. And of course you've got data protection, because you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes and so forth. Well, once you start using it, you realize that yes, I've got data replicated across the cluster, but if I accidentally delete something, or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to, say, five minutes before; performance, where you can get the most out of your hardware; ease of administration, where I can cut this up into logical volumes and have policies at that level instead of at an individual file. >>So there's a bunch of features that really resonate with users after they've had some experience, and those tend to be our key customers. There's another phase, too. When you're testing Hadoop, you're looking at what's possible with the platform, what type of analytics you can do. When you go into production, all of a sudden you're looking at how this fits in with your SLAs, how it fits in with your data protection policies, how you integrate with your different data sources, and whether you can leverage existing code. We had one customer, a large systems integrator for the federal government.
They have a million lines of code that they were told to rewrite to run with other distributions, and that they could use just out of the box with MapR. >>So let's talk about some of those customers. Can you name some names? >>Sure. Actually, we had a keynote today, and we had this beautiful customer video that had to be cut because of time; it's running in our booth and it's streaming on our website, and I think we actually inserted some of it into the bumper here. But I want to shout out to those customers because they ended up on the cutting room floor. One was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number one ad reach, as measured by comScore, and there's been a lot of press on that. I particularly liked the headline that mentioned those three companies, because it was measured by comScore, and comScore is a MapR customer, Rubicon is a MapR customer, and Google's a key partner. >>And yesterday we announced a world record for the Hadoop TeraSort, running on Google. So M7 for Rubicon allows them to address and replace different point solutions that were running alongside Hadoop. It potentially simplifies their architecture, because now they have more things done with a single platform, it increases performance, and it simplifies administration. Another customer is Ancestry.com; maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing for family history and genealogy, to figure out family backgrounds. One of the things they do is DNA testing. For an internet service to do that, the advanced technology is pretty impressive. You send them, I believe it's $99, and they'll send you a DNA kit; you spit in the tube, send it back, and they process that, match it, and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. In Melinda Graham's words, it's simpler because those components are just not there. >>Jack, I want to ask you about enterprise-grade Hadoop, and then about Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company and in its management team. We've had your CEO on, we've interviewed M.C. Srivas at Google I/O, and we were on a panel together, so we know your team is a solid team. We'll talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? Obviously you guys were very successful at first; again, we were skeptics at first, but now your traction and your performance have proven there is a market for that kind of platform. What does it mean now, at this event today, as this evolves and the Hadoop ecosystem is not just Hadoop anymore, it's other things? Yeah, >>There are three dimensions to enterprise grade. The first is ease of use, and ease of use from an administrator standpoint: how easily does it integrate into an existing environment? How easily does it fit into my IT policies?
Do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension, and a key to that is complete NFS support, so it functions like standard storage. A second dimension is dependability and reliability. It's not just, do you have a checkbox HA feature; it's, do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures with automated recovery? In a lights-out data center, can you actually go there just once a week and replace drives? A great example of that is one of our customers that had a test cluster with MapR. It was a POC that wrapped up while they went on to other things. They had a power failure, they came back a week later, and the cluster was up and running, and they hadn't done any manual tasks. They were just blown away; the recovery process for the other distributions is a long laundry list. >>So I've got to ask you this, the third >>One? What's the third one? The third one is performance, and performance is kind of raw speed, but it's also how you leverage the infrastructure. Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? It's really about sharing a cluster for different use cases and different users, and there are a lot of features there. It's not just raw >>The existing IT infrastructure, the policies, the whole question of what happens when something goes wrong. Can you automate that? And then, >>And it's ease of use, dependability, and speed; the same thing with HBase, making HBase easy, dependable, and fast as well. >>So the talk of the show right now, from the keynote this morning, is that MapR marketing has dropped the big data term and is going with Datacosm. Is that true? Joe Hellerstein just had a tweet; Joe is a famous Cal Berkeley computer science professor who is now CEO of a startup, Trifacta, and he had a couple of epic tweets this week. So shout out to Joe Hellerstein, but his tweet said MapR marketing has decided to drop the term big data and go with Datacosm, with a shout out to George Gilder. It's kind of middle-intellectual humor. So what's your response to that? Is it true? What's happening? You're the VP of marketing. >>Well, if you look at the big data term, I think there's a lot of big-data washing going on, where architectures that have been out there for 30 years are suddenly all about big data. So I think there's a need for a more descriptive term. The purpose of Datacosm was not to try to coin something or to change the big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And, with a shout out to George Gilder, acknowledging that he recognized what the impact of widely available compute meant, and he recognized with Telecosm what bandwidth would mean. If you look at the combination, we've got all this compute efficiency and bandwidth, and now the datacosm is basically taking those resources, unleashing them, and changing the way we do things.
>>And I think one of the ways to look at that is the new things that will be possible. There's been a lot of focus on SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly and having automated operational analytics that can respond in a very fast time to change how you do business: how you're communicating with customers, how you're responding to different risk factors in the environment for fraud, et cetera, or just improving your response time to cost events. We met someone earlier who called it >>Actionable insight. Then he said, assigning intent, so you're able to respond. It's interesting that you mention George Gilder, because we like to riff and get into abstract concepts, but he also was very big in supply-side economics. And if you look at the business value conversation, one of the things we pointed out yesterday and this morning in our opening review was that the top conversations are insight and analytics as the killer app right now; the app market has not developed. That's why we like companies like Continuuity, and what you guys are doing under the hood is being worked on at many levels, performance being one of those things, but analytics is a no-brainer, insight, and the other one is business value. So when you look at Datacosm that way, I can see where you're going with it. >>And that's kind of what people want, because it's not so much I'm a Republican because he's a Republican; George Gilder bought The American Spectator, everyone knows that, so obviously he's a Republican. But politics aside, the business side of what big data is implementing is massive. I guess that's a Republican concept, but not really; business belongs to all parties. So relative to Datacosm: no one talks about e-business anymore. We were talking to IBM at the IBM conference and they were saying, hey, that was a great marketing campaign, but no one asks, hey, are you an e-business today? We think big data is going to have the same effect: no one will ask, do you have big data? It's just assumed. So that's what you're basically trying to establish, that it's not just about big. >>Yeah. Let me give you one small example from a business value standpoint. Ted Dunning, you mentioned Ted earlier, our chief application architect and one of the co-authors of the book Mahout in Action, which deals with machine learning, worked with one of our large financial services companies. One of the techniques on Hadoop is clustering, you know, k-nearest neighbors, different algorithms. They looked at a particular process and they sped it up by 30,000 times. There's a blog post on our website where you can find additional information on that. And I, >>There's one >>Point on this, one point, but to your point about business value and what Datacosm really means: that's an incredible speedup in terms of performance, and it changes how companies can react in real time, how they can do pattern recognition. And Google did a really interesting paper called "The Unreasonable Effectiveness of Data."
And in there they say simple algorithms on massive amounts of data beat a complex model every time. So I think what we'll see is a movement away from data sampling, from trying to do an 80/20, to looking at all your data and identifying the exceptions we want to amplify because they're revenue exceptions, or address because they're a cost or a fraud. >>Well, on that I'd give a shout out to the guys at Digital Reasoning; Tim Estes plugged Ted and practically idolized him in terms of his work. Obviously his work is awesome, but he also brought up this concept of the understanding gap, and he showed an interesting chart in his keynote, which was the data explosion: it's going straight up, a massive amount of data, 64% unstructured by his calculation. Then he showed a flat line called attention. So as data has been exploding over time, attention, meaning user attention, is flat, with maybe some uptick. Users and humans can't expand their minds fast enough, so machine learning technologies have to bridge that gap. That's analytics, that's insight. >>Yeah. There's a big conversation going on now about more data versus better models, people trying to squint through some of the comments that Google made and say, all right, does that mean we just throw out >>The models, and data trumps algorithms. Data >>Trumps algorithms, but the question I have is, do you think, and are your customers talking about, okay, now that they have more data, can they actually develop better algorithms that are simpler? And is it a virtuous cycle? >>Yeah, I think there's a lot of debate here, a lot of information, but one of the interesting things is that, given the compute efficiency that we have and given the bandwidth, you can take a model and iterate very quickly on it and arrive at insight. In the past, it was just that amount of data and that amount of time to process; what could take you 40 days you can now do in hours. Right. >>Right. The great example is fraud detection, right? We used to sample, and six months later: hey, your credit card might have been hacked. Now you get a phone call right away, or you can't use your credit card, or whatever it is. But there are still a lot of use cases, weather being an example, where better modeling would be very helpful. Excellent. So, Datacosm, are you planning other marketing initiatives around that, or is this sort of tongue-in-cheek fun, throwing it out there, a little red meat into the chum in the waters? >>You know, what really motivated us was, theCUBE's here talking for the whole day; what could we possibly do to help give them a topic of conversation? >>Okay, Datacosm. Of course, we found that with our proprietary HBase tools. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great; we've been following you and will continue to follow you. You've been a great supporter of theCUBE, and I want to thank you personally while we're here. MapR has been a generous underwriter and supporter of our great independent editorial. We want to recognize you guys, thanks for your support, and we continue to look forward to watching you grow and kick ass. So thanks for all your support.
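[Editor's note: the "simple algorithms on massive data" point Jack makes above can be illustrated with a small, hedged sketch. The snippet below is not MapR's or Ted Dunning's actual implementation, which ran distributed on Hadoop; it simply contrasts the same simple k-nearest-neighbors model fit on a small sample versus the full dataset, using a synthetic scikit-learn dataset.]

```python
# Hedged illustration only: a simple k-NN model trained on a 2% sample versus the
# same model trained on all of the (synthetic) data. Dataset and parameters are
# made up; the point is the "same simple algorithm, more data" comparison.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=100_000, n_features=20,
                           n_informative=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

knn = KNeighborsClassifier(n_neighbors=15)

# Old-style sampling: fit on a small slice of the training data.
n_sample = len(X_train) // 50
knn.fit(X_train[:n_sample], y_train[:n_sample])
print("accuracy on 2% sample:", knn.score(X_test, y_test))

# Same simple algorithm, all of the data.
knn.fit(X_train, y_train)
print("accuracy on all data :", knn.score(X_test, y_test))
```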
And we'll be right back with our next guest after this short break. >>Thank you. >>Ten years ago, the video news business believed the internet was a fad. The science is settled: we all know the internet is here to stay; bubbles and busts come and go. But the industry deserves a news team that goes the distance. Coming up on SiliconANGLE are some interesting new metrics for measuring the worth of a customer on the web. Every morning, we're on the air to bring you the most up-to-date information on the tech industry, with scrutiny on the releases of the day and news of industry-wide trends. We're here daily with breaking analysis from the best minds in the business. Join me, Kristin Filetti, daily at the news desk on SiliconANGLE TV, your reference point for tech innovation.

Published Date : Oct 25 2012



Basil Faruqui, BMC Software | BigData NYC 2017


 

>> Live from Midtown Manhattan, it's theCUBE. Covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (calm electronic music) >> Basil Faruqui, who's the Solutions Marketing Manager at BMC, welcome to theCUBE. >> Thank you, good to be back on theCUBE. >> So first of all, heard you guys had a tough time in Houston, so hope everything's gettin' better, and best wishes to everyone down in-- >> We're definitely in recovery mode now. >> Yeah and so hopefully that can get straightened out quick. What's going on with BMC? Give us a quick update in context to BigData NYC. What's happening, what is BMC doing in the big data space now, the AI space now, the IoT space now, the cloud space? >> So like you said, the data lake space, the IoT space, the AI space: there are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline, it's ingestion, storage, processing, and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data, and the applications that surround it. And the rate of change has picked up immensely over the last few years, with Hadoop coming into the picture and public cloud providers pushing it. It's obviously creating a number of challenges, but one of the biggest challenges that we are seeing in the market, and we're helping customers address, is the challenge of automating this, and obviously the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex, how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time, whether it's Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers-- >> Alright let me jump into it. So, first of all, you mention some things that never change: ingestion, storage, and what's the third one? >> Ingestion, storage, processing and eventually analytics. >> And analytics. >> Okay so that's cool, totally buy that. Now if you move and say, hey okay, if you believe that standard, but now in the modern era that we live in, which is complex, you want breadth of data, but you also want the specialization when you get down to machine limits that are highly bounded, that's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments. >> Correct >> How do you architect that? If I'm a CXO, or I'm a CDO, what's in it for me? How do I architect this? 'Cause that's really the number one thing: I know what the building blocks are, but they've changed in their dynamics to the marketplace. >> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot, and you spend three months on it, and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is Walmart has been testing the pick-up tower, I don't know if you've seen it. This is basically a giant ATM for you to go pick up an order that you placed online. They're testing this at about a hundred stores today.
Now if that's a success, and Walmart wants to roll this out nationwide, how much time do you think their IT department's going to have? Is this a five year project, a ten year project? No, the management's going to want this done in six months, ten months. So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> But you're describing a very complex automation scenario. How can you automate in a hurry without sacrificing the details of what needs to be done? In other words, that would seem to call for repurposing or reusing prior automation scripts and rules, and so forth. How can the Walmarts of the world do that fast, but also do it well? >> Yeah so we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise. All the way from the Mainframes, Oracles, SAPs, Hadoops, Tableaus of the world, they're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that automation was something that was considered close to the project becoming operational. Okay, and that's where a lot of rework happened, because developers had been writing their own scripts using point solutions, so we said alright, it's time to shift automation left, and allow companies to build automation artifacts very early in the development life cycle. About a month ago, we released what we call Control-M Workbench, it's essentially a community edition of Control-M targeted towards developers, so that instead of writing their own scripts, they can use Control-M in a completely offline manner, without having to connect to an enterprise system. As they build, and test, and iterate, they're using Control-M to do that. So as the application progresses through the development life cycle, all of that work can then translate easily into an enterprise edition of Control-M. >> Just want to quickly define what shift left means for the folks that might not know software methodologies, they don't think >> Yeah, so. of left political, left or right. >> So, we're not shifting Control-M-- >> Alt-left, alt-right, I mean, this is software development, so quickly take a minute and explain what shift left means, and the importance of it. >> Correct, so if you think of software development as a straight line continuum, you start with building some code, you do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. Developers would come in and deliver the application to Ops, and Ops would say, well hang on a second, all this Crontab and these other point solutions we've been using for automation, that's not what we use in production, and we need you to now go right in-- >> So test early and often. >> Test early and often. So the challenge was that the tools developers used were not the tools that were being used on the production side. And there was good reason for it, because developers don't need something really heavy and with all the bells and whistles early in the development lifecycle.
Now Control-M Workbench is a very light version, which is targeted at developers and focuses on the needs they have when they're building and developing. So as the application progresses-- >> How much are you seeing waterfall-- >> But how much can they, go ahead. >> How much are you seeing waterfall, and then people shifting left becoming more prominent now? What percentage of your customers have moved to Agile, and shifting left, percentage wise? >> So we survey our customers on a regular basis, and the last survey showed that eighty percent of the customers have either implemented a more continuous integration and delivery type of framework, or are in the process of doing it. And that's the other-- >> And getting as close to 100 as possible, pretty much. >> Yeah, exactly. The tipping point is reached. >> And what is driving that? >> What is driving it all is the need from the business. The days of the five year implementation timelines are gone. This is something that you need to deliver every week, two weeks, in iterations. >> Iteration, yeah, yeah. And we have also innovated in that space, with the approach we call jobs-as-code, where you can build entire complex data pipelines in code format, so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and I'll let you take the floor and get a word in soon, but I have one final question on this BMC methodology thing. You guys have a history, obviously BMC goes way back. Remember Max Watson, the CEO, and Bob Beauchamp; back in '97 we used to chat with them, they dominated that landscape. But we're kind of going back to a systems mindset. The question for you is, how do you view the issue of this holy grail, the promised land of AI and machine learning, where end-to-end visibility is really the goal, right? At the same time, you want bounded experiences at the root level so automation can kick in to enable more activity. So there's a trade-off between going for the end-to-end visibility out of the gate, but also having bounded visibility and data to automate. How do you guys look at that market? Because customers want the end-to-end promise, but they don't want to try to get there too fast. There are diseconomies of scale, potentially. How do you talk about that? >> Correct.
I mean it seems like the more development they do offline, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate that risk in using Control-M Workbench. >> Sure, so we spend a lot of time observing how developers work, right? And very early in the development stage, all they're doing is working off of their Mac or their laptop, and they're not really connected to anything. And that is where they end up writing a lot of scripts, because whatever code or business logic they've written, the way they're going to make it run is by writing scripts. And that, essentially, becomes the problem, because then you have scripts managing more scripts, and as the application progresses, you have this complex web of scripts and Crontabs and maybe some open source solutions trying to simply make all of this run. By doing this in an offline manner, they're not losing any of the other Control-M capabilities. Simply, as the application progresses, whatever automation they've built in Control-M can seamlessly flow into the next stage. So when you are ready to take an application into production, there's essentially no rework required from an automation perspective. All of that work can now be translated into the enterprise-grade Control-M, and that's where operations can then go in and add the other artifacts, such as SLA management and forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives, 'cause, so you're like an analyst here, so Jim, I want you guys to comment. My question to both of you would be, lookin' at this time in history, obviously on the BMC side we mentioned some of the history, you guys are transforming on a new journey in extending that capability of this world. Jim, you're covering state-of-the-art AI and machine learning. What's your take on this space now? Strata Data, which was Strata Hadoop and Hadoop World, Cloudera went public, Hortonworks is now public, kind of the big Hadoop guys kind of grew up, but the world has changed around them, it's not just about Hadoop anymore. So I'd like to get your thoughts on this kind of perspective, that we're seeing a much broader picture in big data in NYC, versus the Strata Hadoop show, which seems to be losing steam, but I mean in terms of the focus. The bigger focus is much broader, horizontally scalable. And your thoughts on the ecosystem right now? >> Let Basil answer first, unless Basil wants me to go first. >> I think that the reason the focus is changing is because of where the projects are in their lifecycle. Now what we're seeing is most companies are grappling with, how do I take this to the next level? How do I scale? How do I go from just proving out one or two use cases to making the entire organization data driven, and really inject data driven decision making in all facets of decision making? So that is, I believe, what's driving the change that we're seeing, that now you've gone from Strata Hadoop to being Strata Data, and focus on that element. And, like I said earlier, the difference between success and failure is your ability to scale and operationalize. Take machine learning for an example. >> Good, that's where there's no, it's not a hype market, it's show me the meat on the bone, show me scale, I've got operational concerns of security and what not. >> And machine learning, that's one of the hottest topics.
A recent survey I read, which polled a number of data scientists, revealed that they spend less than 3% of their time training the data models, and about 80% of their time in data manipulation, data transformation and enrichment. That is obviously not the best use of a data scientist's time, and that is exactly one of the problems we're solving for our customers around the world. >> That needs to be automated to the hilt. To help them >> Correct. to be more productive, to deliver faster results. >> Ecosystem perspective, Jim, what's your thoughts? >> Yeah, everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. It's driving machine learning driven predictions, classifications, abstractions and so forth, into the data pipeline, into the application pipeline, to drive results in a way that is contextually and environmentally aware of what's goin' on. The history, historical data, what's goin' on in terms of current streaming data, to drive optimal outcomes, using predictive models and so forth, inline to applications. So really, fundamentally then, what's goin' on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of-- >> Do customers even know what to automate? I mean, that's the question, what do I-- >> You're automating human judgment. You're automating effort, like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more that can be automated, 'cause those are pattern structured activities that have been mastered by smart people over many years. >> I mean we just had a customer on from GlaxoSmithKline, GSK, with that scale, and his attitude is, we see the results from the users, then we double down and pay for it and automate it. So the automation question, it's an option question, it's a rhetorical question, but it just begs the question, which is who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at? How do you determine? Are you going to need machine learning for machine learning? Are you going to need AI for AI? Who writes the algorithms >> It's actually, that's. for the algorithm? >> Automated machine learning is a hot, hot, not only research focus, but we're seeing more and more solution providers, like Microsoft and Google and others, goin' deep, doubling down on investments in exactly that area. That's a productivity play for data scientists. >> I think the data market's going to change radically, in my opinion. You're starting to see some things with blockchain and some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC, what should people know about BMC right now? Because people might have a historical view of BMC. What's the latest, what should they know? What's the new Instagram picture of BMC? What should they know about you guys? >> So I think what I would say people should know about BMC is that all the work that we've done over the last 25 years, in virtually every platform that came before Hadoop, we have now innovated to take into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution, an example of which is Navistar.
Their CIO's actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years, and they've automated virtually every business function through Control-M. And when they started their predictive maintenance project, where they're ingesting data from about 300,000 vehicles today to figure out when a vehicle might break and to predict maintenance on it, they said that they always knew they were going to use Control-M for it, because that was the enterprise standard, and they knew they could simply extend that capability into this area. And when they started about three, four years ago, they were ingesting data from about 100,000 vehicles. That has now scaled to over 325,000 vehicles, and they have not had to re-architect their strategy as they grow and scale. So I would say that is one of the key messages that we are taking to market, that we are bringing innovation that spans over 25 years, and evolving it-- >> Modernizing it, basically. >> Modernizing it, and bringing it to newer platforms. >> Well congratulations, I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing kind of the core things. >> Absolutely. >> Thanks for coming and sharing the BMC perspective inside theCUBE here, on BigData NYC. This is theCUBE, I'm John Furrier, with Jim Kobielus here in New York City. More live coverage, for three days we'll be here, today, tomorrow and Thursday, at BigData NYC, more coverage after this short break. (calm electronic music) (vibrant electronic music)
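[Editor's note: for readers unfamiliar with the "jobs-as-code" approach Basil describes, the sketch below shows roughly what a data pipeline defined as a version-controlled code artifact can look like. It loosely echoes the JSON style of Control-M's Automation API, but the job types, field names, and paths here are illustrative assumptions, not the vendor's documented schema.]

```python
# Hedged sketch of a "jobs-as-code" pipeline definition: ingestion -> processing
# -> analytics, expressed as a JSON artifact that can live in version control and
# flow through a CI/CD pipeline. Job types and field names are illustrative only.
import json

pipeline = {
    "VehicleTelemetryPipeline": {
        "Type": "Folder",
        "IngestRawEvents": {
            "Type": "Job:FileTransfer",          # hypothetical job type
            "Source": "sftp://plant-gateway/telemetry/",
            "Target": "s3://datalake-raw/telemetry/",
        },
        "TransformAndEnrich": {
            "Type": "Job:Spark",                 # hypothetical job type
            "Script": "jobs/enrich_telemetry.py",
            "DependsOn": ["IngestRawEvents"],
        },
        "ScoreFailureRisk": {
            "Type": "Job:MachineLearning",       # hypothetical job type
            "Model": "models/part_failure.pkl",
            "DependsOn": ["TransformAndEnrich"],
        },
    }
}

# The same artifact a developer iterates on locally is what operations later
# promotes, which is the "shift left" point made in the interview.
print(json.dumps(pipeline, indent=2))
```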

Published Date : Feb 11 2019



Basil Faruqui, BMC | theCUBE NYC 2018


 

(upbeat music) >> Live from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone to theCUBE NYC. This is theCUBE's live coverage covering CubeNYC and the Strata Data Conference, formerly Strata Hadoop. All things data happen here in New York this week. I'm John Furrier with Peter Burris. Our next guest is Basil Faruqui, lead solutions marketing manager for digital business automation within BMC. He returns; he was here last year with us and also at Big Data SV, which has been renamed CubeNYC and Cube SV because it's not just big data anymore. We're hearing words like multi-cloud, Istio, all those Kubernetes things. Data now is so important, it's up and down the stack, impacting everyone. We talked about this last year with Control-M, how you guys are automating in a hurry, the four pillars of pipelining data. The setup days are over; welcome to theCUBE. >> Well thank you, and it's great to be back on theCUBE. And yeah, what you said is exactly right: big data has really, I think, now been distilled down to data. Everybody understands data is big, and it's important, and it is really, you know, it's quite a cliche, but to a large degree, data is the new oil, as some people say. And I think what you said earlier is important in that we've been very fortunate to be able to not only follow the journey of our customers but be a part of it. So about six years ago, some of the early adopters of Hadoop came to us and said, look, we use your products for traditional data warehousing and on the ERP side for orchestration workloads. We're about to take some of these projects on Hadoop into production and really feel that the Hadoop ecosystem is lacking enterprise-grade workflow orchestration tools. So we partnered with them, and some of the earliest goals they wanted to achieve were to build a data lake and provide richer and wider data sets to the end users to be able to do some dashboarding, customer 360, and things of that nature. Very quickly, in about five years' time, we have seen a lot of these projects mature from how do I build a data lake to now applying cutting-edge ML and AI, and cloud is a major enabler of that. You know, it's really, as we were talking about earlier, taking away excuses for not being able to scale quickly from an infrastructure perspective. Now you're talking about, is it Hadoop or is it S3 or is it Azure Blob Storage, is it Snowflake? And from a Control-M perspective, we're very platform and technology agnostic, so some of our customers who had started with Hadoop as a platform are now looking at other technologies like Snowflake. One of our customers describes it as kind of the spine or a power strip of orchestration: regardless of what technology you have, you can just plug and play and not worry about how do I rewire the orchestration workflows, because Control-M is taking care of it. >> Well you probably always will have to worry about that to some degree. But I think where you're going, and this is where I'm going to test with you, is that as data is increasingly recognized as a strategic asset, as analytics is increasingly recognized as the way that you create value out of those data assets, and as a business becomes increasingly dependent upon the output of analytics to make decisions and ultimately, through AI, to act differently in markets, you are embedding these capabilities or these technologies deeper into business. They have to become capabilities.
They have to become dependable. They have to become reliable, predictable, cost, performance, all these other things. That suggests that ultimately, the historical approach of focusing on the technology and trying to apply it to a periodic series of data science problems has to become a little bit more mature, so it actually becomes a strategic capability. So the business can say we're operating on this, but the technologies to take that underlying data science technology and turn it into business operations, that's where a lot of the work has to happen. Is that what you guys are focused on? >> Yeah, absolutely, and I think one of the big differences that we're seeing in general in the industry is that this time around, the pull of how do you enable technology to drive the business is really coming from the line of business, versus starting on the technology side of the house and then coming to the business and saying hey, we've got some cool technologies that can probably help you. It's really the line of business now saying no, I need better analytics so I can drive new business models for my company, right? So the need for speed is greater than ever, because the pull is from the line of business side. And this is another area where we are unique, in that Control-M has been designed in a way where it's not just a set of solutions or tools for the technical guys. Now, the line of business is getting closer and closer, it's blending into the technical side as well. They have a very, very keen interest in understanding, are the dashboards going to be refreshed on time? Are we going to be able to get all the right promotional offers at the right time? I mean, we're here at NYC Strata, there's a lot of real-time promotion happening here. The line of business has a direct interest in the delivery and the timing of all of this, so we have always had multiple interfaces to Control-M. A business user who has an interest in understanding whether the promotional offers are going to happen at the right time and whether that's on schedule, they have a mobile app for that. A developer who's building a complex, multi-application platform, they have an API and a programmatic interface to do that. Operations that has to monitor all of this has rich dashboards to be able to do that. That's one of the areas that has been key to our success over the last couple of decades, and we're seeing that translate very well into the big data space. >> So I just want to go under the hood for a minute, because I love that answer. And I'd like to pivot off what Peter said, tying it back to the business, okay, that's awesome. And I want to learn a little bit more about this, because we talked about this last year and I'm kind of seeing it now. Kubernetes and all this orchestration is about workloads. You guys nailed the workflow issue, complex workflows. Because if you look at it, if you're adding line of business into the equation, that's just complexity in and of itself. As more workflows exist within each line of business, whether it's recommendations and offers and workflow issues, more lines of business in there is complex for even IT to deal with, so you guys have nailed that. How does that work? Do you plug it in and the lines of business have their own developers, so the people who work with the workflows engage how? >> So that's a good question. With orchestration and automation now becoming very, very generic, it's kind of important to classify where we play.
So there's a lot of tools that do release and build automation. There's a lot of tools that'll do infrastructure automation and orchestration. All of this infrastructure and release management process is done ultimately to run applications on top of it, and the workflows of the application need orchestration, and that's the layer that we play in. And if you think about it, how the end user, the business, the consumer interacts with all of this technology is through applications, okay? So the orchestration of the workflows inside the applications, whether you start all the way from an ERP or a CRM and then you land into a data lake and then do an ML model, and then out come the recommendations and analytics, that's the layer we are automating today. Obviously, all of this-- >> By the way, the technical complexity for the user's in the app. >> Correct, so the line of business obviously has a lot more control. You're seeing roles like chief digital officers emerge, you're seeing CTOs that have mandates like, okay, you're going to be responsible for all applications that are customer facing, while the CIO is going to take care of everything that's inward facing. It's not a settled structure or science yet. >> It's evolving fast. >> It's evolving fast. But what's clear is that the line of business has a lot more interest and influence in driving these technology projects, and it's important that technologies evolve in a way where the line of business can not only understand but take advantage of that. >> So I think it's a great question, John, and I want to build on that and then ask you something. So the way we look at the world is we say the first fifty years of computing were known process, unknown technology. The next fifty years are going to be unknown process, known technology. It's all going to look like a cloud. But think about what that means. Known process, unknown technology: Control-M and related types of technologies tended to focus on how you put in place predictable workflows in the technology layer. And now, unknown process, known technology, driven by the line of business: now we're talking about controlling process flows that are being created bespoke, strategic, differentiating ways of doing business. >> Well, dynamic, too, I mean, dynamic. >> Highly dynamic, and those workflows in many respects, those technologies, piecing applications and services together, become the process that differentiates the business. Again, you're still focused on the infrastructure a bit, but you've moved it up. Is that right? >> Yeah, that's exactly right. We see our goal as abstracting the complexity of the underlying application, data, and infrastructure. So, I mean, it's quite amazing-- >> So it could be easily reconfigured to a business's needs. >> Exactly, so whether you're on Hadoop and now you're thinking about moving to Snowflake, or tomorrow something else comes up, the orchestration or the workflow, as a business, as a product, our goal is to continue to evolve quickly and in a manner that we continue to abstract the complexity, so from-- >> So I've got to ask you, we've been having a lot of conversations around Hadoop versus Kubernetes on multi-cloud, so as cloud has certainly come in and changed the game, there's no debate on that. How it changes is debatable, but we know that multiple clouds is going to be the modus operandi for customers. >> Correct.
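[Editor's note: the platform-abstraction idea discussed in the exchange above, writing the workflow once so that swapping Hadoop for S3 or Snowflake does not mean rewiring the orchestration, can be sketched generically. This is a conceptual illustration, not Control-M code; the backend here is an in-memory stand-in.]

```python
# Conceptual sketch (not Control-M): the pipeline is written once against a tiny
# storage interface, so the backend becomes a configuration choice, not a rewrite.
from typing import Dict, Protocol


class Storage(Protocol):
    def read(self, path: str) -> bytes: ...
    def write(self, path: str, data: bytes) -> None: ...


class InMemoryStorage:
    """Stand-in for HDFS, S3, Azure Blob, Snowflake stages, and so on."""
    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def read(self, path: str) -> bytes:
        return self._objects[path]

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data


def run_pipeline(storage: Storage) -> None:
    """Orchestration logic stays identical regardless of the chosen backend."""
    raw = storage.read("raw/events.json")
    curated = raw.upper()  # the actual transformation is elided
    storage.write("curated/events.json", curated)


backend = InMemoryStorage()  # swap in an HDFS- or S3-backed class here
backend.write("raw/events.json", b'{"vehicle_id": 1}')
run_pipeline(backend)
print(backend.read("curated/events.json"))
```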
>> So I got a lot of data and now I've got pipelining complexities and workflows are going to get even more complex, potentially. How do you see the impact of the cloud, how are you guys looking at that, and what are some customer use cases that you see for you guys? >> So, what I mentioned earlier, being platform and technology agnostic is actually one of the unique differentiating factors for us, so whether you are on AWS or Azure or Google or on-prem or still on a mainframe, and, we're in New York, a lot of the banks and insurance companies here still do some of the most critical processing on the mainframe. The ability to abstract all of that, whether it's cloud or legacy solutions, is one of our key enablers for our customers, and I'll give you an example. So Malwarebytes is one of our customers and they've been using Control M for several years. Primarily the entire structure is built on AWS, but they are now utilizing Google cloud for some of their recommendation analysis and sentiment analysis because their goal is to pick the best of breed technology for the problem they're looking to solve. >> The best of breed service is in the cloud. >> The best of breed service is in the cloud to solve the business problem. So from Control M's perspective, transcending from AWS to Google cloud is completely abstracted for them, so if it runs on Google today and tomorrow it's Azure, or they decide to build a private cloud, they will be able to extend the same workflow orchestration. >> But you can build these workflows across whatever set of services are available. >> Correct, and you bring up an important point. It's not only being able to build the workflows across platforms but being able to define dependencies and track the dependencies across all of this, because none of this is happening in silos. If you want to use Google's API to do the recommendations, well, you've got to feed it the data, and the data's pipeline, like we talked about last time, data ingestion, data storage, data processing, and analytics have very, very intricate dependencies, and these solutions should be able to manage not only the building of the workflow but the dependencies as well. >> But you're defining those elements as fundamental building blocks through a control model. >> Correct. >> That allows you to treat the higher level services as reliable, consistent capabilities. >> Correct, and the other thing I would like to add here is not only just build complex multiplatform, multiapplication workflows, but never lose focus of the business service or the business process there, so you can tie all of this to a business service, and then, these things are complex, there are problems, let's say there's an ETL job that fails somewhere upstream, Control M will immediately be able to predict the impact and be able to tell you this means the recommendation engine will not be able to make the recommendations. Now, the staff that's going to work on the remediation understands the business impact versus looking at a screen where there's 500 jobs and one of them has failed. What does that really mean? >> Set priorities and focal points and everything else. >> Right. >> So I just want to wrap up by asking you how your talk went at Strata Hadoop Data Conference. What were you talking about, what was the core message? Was it Control M, was it customer presentations? What was the focus?
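
The point above about predicting business impact when one of 500 jobs fails can be illustrated with a rough sketch: walk the downstream dependency graph from the failed job and report which business services are affected. The graph, job names, and service mapping below are invented; this is a generic illustration of the idea, not a vendor API.

```python
# Illustrative only: given a failed job, walk the downstream dependency graph to
# report which jobs and business services are affected. Names are invented.
from collections import deque

# edges point downstream: a job -> the jobs that consume its output
downstream = {
    "erp_extract":  ["lake_ingest"],
    "lake_ingest":  ["etl_cleanse"],
    "etl_cleanse":  ["train_model", "refresh_dashboards"],
    "train_model":  ["score_offers"],
    "score_offers": [],
    "refresh_dashboards": [],
}

# which leaf jobs ultimately feed which business services
business_service = {
    "score_offers": "real-time promotional offers",
    "refresh_dashboards": "executive dashboards",
}

def impacted(failed_job: str):
    """Return all downstream jobs and the business services they support."""
    seen, queue = set(), deque([failed_job])
    while queue:
        for nxt in downstream.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    services = {business_service[j] for j in seen if j in business_service}
    return seen, services

jobs, services = impacted("etl_cleanse")
print("affected jobs:", sorted(jobs))
print("affected business services:", sorted(services))
```

The value of framing it this way is that an operator sees "promotional offers will be late" rather than "job 347 of 500 failed."
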
>> So the focus of yesterday's talk was actually, you know, one of the things is that academic talk is great, but it's important to, you know, show how things work in real life. The session was focused on a real use case from a customer. Navistar, they have IoT data-driven pipelines where they are predicting failures of parts inside trucks and buses that they manufacture, you know, reducing vehicle downtime. So we wanted to simulate a demo like that, so that's exactly what we did. It was very well received. In real-time, we spun up an EMR environment in AWS, automatically provisioned the infrastructure there, we applied Spark and machine learning algorithms to the data, and out came the recommendation at the end, which was that, you know, here are the vehicles that are-- >> Fix their brakes. (laughing) >> Exactly, so it was very, very well received. >> I mean, there's a real-world example, there's real money to be saved, maintenance, scheduling, potential liability, accidents. >> Liability is a huge issue for a lot of manufacturers. >> And Navistar has been at the leading edge of how to apply technologies in that business. >> They really have been a poster child for digital transformation. >> They sure have. >> Here's a company that's been around for 100 plus years and when we talk to them they tell us that they have every technology under the sun that has come since the mainframe, and for them to be transforming and leading in this way, we're very fortunate to be part of their journey. >> Well, we'd love to talk more about some of these customer use cases. That's what people love about theCUBE, we want to do more of them, share those examples, people love to see proof in real-world examples, not just talk, so appreciate you sharing. >> Absolutely. >> Thanks for sharing, thanks for the insights. We're here with theCUBE live in New York City, part of CubeNYC, we're getting all the data, sharing that with you. I'm John Furrier with Peter Burris. Stay with us for more day two coverage after this short break. (upbeat music)
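
The Navistar example above follows a common pattern: provision a Spark environment, train a failure-prediction model on IoT telemetry, and surface vehicle-level recommendations. The PySpark sketch below is a minimal, hypothetical version of that pattern; the S3 path, column names, and the 0.7 threshold are invented, and it is not the demo code described in the session.

```python
# Minimal PySpark sketch of the general pattern described above: read IoT sensor
# data, train a failure-prediction model, and flag vehicles for maintenance.
# Paths, column names, and the 0.7 threshold are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.functions import vector_to_array  # Spark 3.0+

spark = SparkSession.builder.appName("predictive-maintenance-sketch").getOrCreate()

# Hypothetical schema: vehicle_id, brake_temp, vibration, mileage, part_failed (0/1)
telemetry = spark.read.parquet("s3://example-bucket/telemetry/")

assembler = VectorAssembler(
    inputCols=["brake_temp", "vibration", "mileage"], outputCol="features"
)
train_df = assembler.transform(telemetry)

model = RandomForestClassifier(
    labelCol="part_failed", featuresCol="features"
).fit(train_df)

# Score the fleet and recommend vehicles whose predicted failure probability is high.
scored = model.transform(train_df).withColumn(
    "p_fail", vector_to_array("probability")[1]
)
scored.filter(F.col("p_fail") > 0.7).select("vehicle_id", "p_fail").show()

spark.stop()
```
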

Published Date : Sep 13 2018


Stephanie McReynolds, Alation | theCUBE NYC 2018


 

>> Live from New York, It's theCUBE! Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> Hello and welcome back to theCUBE live in New York City, here for CUBE NYC. In conjunction with Strata Conference, Strata Data, Strata Hadoop, this is our ninth year covering the big data ecosystem, which has evolved into machine learning, A.I., data science, cloud, a lot of great things happening, all things data, impacting all businesses. I'm John Furrier, your host with Dave Vellante and Peter Burris, Peter is filling in for Dave Vellante. Next guest, Stephanie McReynolds who is the CMO, VP of Marketing for Alation, thanks for joining us. >> Thanks for having me. >> Good to see you. So you guys have a pretty spectacular exhibit here in New York. I want to get to that right away, top story is Attack of the Bots. And you're showing a great demo. Explain what you guys are doing at the show. >> Yeah, well it's robot fighting time in our booth, so we brought a little fun to the show floor, my kids are.. >> You mean big data is not fun enough? >> Well big data is pretty fun but occasionally you got to get your geek battle on there so we're having fun with robots, but I think the real story in the Alation booth is about the product and how machine learning data catalogs are helping a whole variety of users in the organization, everything from improving analyst productivity and even some business user productivity of data to then really supporting data scientists in their work by helping them to distribute their data products through a data catalog. >> You guys are one of the new guard companies that are doing things that make it really easy for people who want to use data, practitioners that the average data citizen has been called, or people who want productivity. Not necessarily the hardcore, setting up clusters, really kind of like the big data user. What's that market look like right now, has it met your expectations, how's business, what's the update? >> Yeah, I think we have a strong perspective that for us to close the final mile and get to real value out of the data, it's a human challenge, there's a trust gap with managers. Today on stage over at STRATA it was interesting because Google had a speaker and it wasn't their chief data officer, it was their chief decision scientist, and I think that reflects what that final mile is, is that making decisions, and it's the trust gap that managers have with data because they don't know how the insights are coming to them, what are all the details underneath. In order to be able to trust decisions you have to understand who processed the data, what decision making criteria did they use, was this data governed well, are we introducing some bias into our algorithms, and can that be controlled? And so Alation becomes a platform for supporting getting answers to those issues. And then there's plenty of other companies that are optimizing the performance of those queries and the storage of that data, but we're trying to really close that trust gap. >> It's very interesting because from a management standpoint we're trying to do more evidence based management. So there's a major trend in board rooms, and executive offices to try to find ways to acculturate the executive team to using data, evidence based management healthcare now being applied to a lot of other domains.
We've also historically had a situation where the people who focused on or worked with the data were a relatively small coterie of individuals that crave these crazy systems to try to bring those two together. It sounds like what you're doing, and I really like the idea of the data scientists, being able to create data products that then can be distributed. It sounds like you're trying to look at data as an asset to be created, to be distributed so it can be more easily used by more people in your organization, have we got that right? >> Absolutely. So we're now seeing we're in just over a hundred production implementations of Alation, at large enterprises, and we're now seeing those production implementations get into the thousands of users. So this is going beyond those data specialists. Beyond the unicorn data scientists that understand the systems and math and technology. >> And business. >> And business, right. In business. So what we're seeing now is that a data catalog can be a point of collaboration across those different audiences in an enterprise. So whereas three years ago some of our initial customers kept the data catalog implementations small, right. They were getting access to the specialists to this catalog and asked them to certify data assets for others, what we're starting to see is a proliferation of creation of self-service data assets, a certification process that now is enterprise-wide, and thousands of users in these organizations. So eBay has over a thousand weekly logins, Munich Reinsurance was on stage yesterday, their head of data engineering said they have 2,000 users on Alation at this point on their data lake, Fiserv is going to speak on Thursday and they're getting up to those numbers as well, so we see some really solid organizations that are solving medical, pharmaceutical issues, right, the largest reinsurer in the world, leading tech companies, starting to adopt a data catalog as a foundation for how they're going to make those data-driven decisions in the organization. >> Talk about how the product works because essentially you're bringing kind of the decision scientists, for lack of a better word, and productivity worker, almost like a business office suite concept, as a SaaS, so you got a SaaS model that says "Hey you want to play with data, use it but you have to do some front end work." Take us through how you guys roll out the platform, how are your customers consuming the service, take us through the engagement with customers. >> I think for customers, the most interesting part of this product is that it displays itself as an application that anyone can use, right? So there's a super familiar search interface that, rather than bringing back webpages, allows you to search for data assets in your organization. If you want more information on that data asset you click on those search results and you can see all of the information of how that data has been used in the organization, as well as the technical details and the technical metadata. And I think what's even more powerful is we actually have a recommendation engine that recommends data assets to the user. And that can be plugged into Tableau and Salesforce Einstein Analytics, and a whole variety of other data science tools like Dataiku that you might be using in your organization. So this looks like a very easy to use application that folks are familiar with that you just need a web browser to access, but on the backend, the hard work that's happening is the automation that we do with the platform.
So by going out and crawling these source systems and looking at not just the technical descriptions of data, the metadata that exists, but then being able to understand, by parsing the SQL query logs, how that data is actually being used in the organization. We call it Behavior I/O: by looking at the behaviors of how that data's being used, from those logs, we can actually give you a really good sense of how that data should be used in the future, or where you might have gaps in governing that data, or how you might want to reorient your storage or compute infrastructure to support the type of analytics that are actually being executed by real humans in your organization. And that's eye opening to a lot of IT shops. >> So you're providing insights into the data usage so that the business can get optimized, whether it's the IT footprint component or the kinds of use cases, is that kind of how it's working? >> So what's interesting is the optimization actually happens in a pretty automated way, because we can make recommendations to those consumers of data of how they want to navigate the system. Kind of like Google makes recommendations as you browse the web, right? >> If you misspell something, "Oh did you mean this", kind of thing? >> "Did you mean this, might you also be interested in this", right? It's kind of a cross between Google and Amazon. Others like you may have used these other data assets in the past to determine revenue for that particular region, have you thought about using this filter, have you thought about using this join, did you know that you're trying to do analysis that maybe the sales ops guy has already done, and here's the certified report, why don't you just start with that? We're seeing a lot of reuse in organizations, where in the past, I think as an industry, when Tableau and Qlik and all these BI tools that were very self-service oriented started to take off, it was all about democratizing visualization by letting every user do their own thing, and now we're realizing to get speed and accuracy and efficiency and effectiveness maybe there's more reuse of the work we've already done in existing data assets, and by recommending those and expanding the data literacy around the interpretation of those, you might actually close this trust gap with the data. >> But there's one really important point that you raised, and I want to come back to it, and that is this notion of bias. So you know, Alation knows something about the data, knows a lot about the metadata, so therefore, I don't want to say understands, but it's capable of categorizing data in that way. And you're also able to look at the usage of that data by parsing some of the SQL statements and then making a determination of whether the data, as it's identified, is appropriately being used based on how people are actually applying it, so you can identify potential bias or potential misuse or whatever else it might be. That is an incredibly important thing. As you know John, we had an event last night and one of the things that popped up is how do you deal with emergence in data science, in A.I., etc. And what methods do you put in place to actually ensure that the governance model can be extended to understand how those things are potentially, in a very soft way, corrupting the use of the data. So could you spend a little bit more time talking about that because it's something a lot of people are interested in, quite frankly we don't know about a lot of tools that are doing that kind of work right now. It's an important point.
>> I think the traditional viewpoint was if we can just manage the data we will be able to have a governed system. So if we control the inputs then we'll have a safe environment, and that was kind of like the classic single source of truth, data warehouse type model. >> Stewards of the data. >> What we're seeing is with the proliferation of sources of data and how quickly, with IoT and new modern sources, data is getting created, you're not able to manage data at that entry point. And it's not just about systems, it's about individuals that go on the web and find a dataset and then load it into a corporate database, right? Or you merge an Excel file with something that's in a database. And so I think what we see happening, not only when you look at bias but if you look at some of the new regulations like [Inaudible] >> Sure. Ownership, [Inaudible] >> The logic that you're using to process that data, the algorithm itself can be biased, if you have a biased training data set that you feed into a machine learning algorithm, the algorithm itself is going to be biased. And so the control point in this world where data is proliferating and we're not sure we can control that entirely, becomes the logic embedded in the algorithm. Even if that's a simple SQL statement that's feeding a report. And so Alation is able to introspect that SQL and highlight that maybe there is bias at work in how this algorithm is composed. So with GDPR the consumer owns their own data, if they want to pull it out from a training data set, you got to rerun that algorithm without that consumer data and that's your control point then going forward for the organization on different governance issues that pop up. >> Talk about the psychology of the user base because one of the things that shifted in the data world is a few stewards of data managed everything, now you've got a model where literally thousands of people in an organization could be users, productivity users, so you get a social component in here that people know who's doing data work, which in a way, creates a new persona or class of worker. A non-techy worker. >> Yeah. It's interesting if you think about moving access to the data and moving the individuals that are creating algorithms out to a broader user group, what's important, you have to make sure that you're educating and training and sharing knowledge with that democratized audience, right? And to be able to do that you kind of want to work with human psychology, right? You want to be able to give people guidance in the course of their work rather than have them memorize a set of rules and try to remember to apply those. If you had a specialist group you can kind of control and force them to memorize and then apply, the more modern approach is to say "look, with some of these machine learning techniques that we have, why don't we make a recommendation." What you're going to do is introduce bias into that calculation. >> And we're capturing that information as you use the data. >> Well, we're also making a recommendation to say "Hey do you know you're doing this? Maybe you don't want to do that." Most people using the data are not bad actors. They just can't remember all the rule sets to apply. So what we're trying to do is catch someone behaviorally in the act before they make that mistake and say hey, just a bit of a reminder, a bit of a coaching moment, do you know what you're doing? Maybe you can think of another approach to this.
And we've found that in many organizations that changes the discussion around data governance. It's no longer this top-down constraint to finding insight, which frustrates an audience that is trying to use that data. It's more like a coach helping you improve, and then the social aspect of wanting to contribute to the system comes into play and people start communicating, collaborating on the platform, and curating information a little bit. >> I remember when Microsoft Excel came out, the spreadsheet, or Lotus 1-2-3, oh my God, people are going to use these amazing things with spreadsheets, they did. You're taking a similar approach with analytics, much bigger surface area of work to kind of attack from a data perspective, but in a way kind of the same kind of concept, put it in the hands of the users, have the data in their hands so to speak. >> Yeah, enable everyone to make data driven decisions. But make sure that they're interpreting that data in the right way, right? Give them enough guidance, don't let them just kind of attack the wild west and ferret it out. >> Well looking back at the Microsoft Excel spreadsheet example, I remember when a finance department would send a formatted spreadsheet with all the rules for how to use it out to 50 different groups around the world, and everyone figured out that you can go in and manipulate the macros and deliver any results they want. And so it's that same notion, you have to know something about that, but, in many respects, Stephanie, you're describing a data governance model that really is more truly governance, that if we think about a data asset it's how do we mediate a lot of different claims against that set of data so that it's used appropriately, so it's not corrupted, so that it doesn't affect other people, but very importantly so that the outcomes are easier to agree upon because there's some trust and there's some valid behaviors and there's some verification in the flow of the data utilization. >> And where we give voice to a number of different constituencies. Because business opinions from different departments can run slightly counter to one another. There can be friction in how to use particular data assets in the business depending on the lens that you have in that business, and so what we're trying to do is surface those different perspectives, give them voice, allow those constituencies to work that out in a platform that captures that debate, captures that knowledge, makes that debate a foundation of knowledge to build upon, so in many ways it's kind of like the scientific method, right? As a scientist I publish a paper. >> Get peer reviewed. >> Get peer reviewed, let other people weigh in. >> And it becomes part of the canon of knowledge. >> And it becomes part of the canon. And in the scientific community over the last several years you see that folks are publishing their data sets out publicly, why can't an enterprise do the same thing internally for different business groups. Take the same approach. Allow others to weigh in. It gets them better insights and it gets them more trust in that foundation. >> You get collective intelligence from the user base to help come in and make the data smarter and sharper. >> Yeah and have reusable assets that you can then build upon to find the higher level insights. Don't run the same report that a hundred people in the organization have already run. >> So the final question for you.
As you guys are emerging, starting to do really well, you have a unique approach, honestly we think it fits in kind of the new guard of analytics, a productivity worker with data, which we think is going to be a huge persona, where are you guys winning, and why are you winning with your customer base? What are some things that are resonating as you go in and engage with prospects and customers and existing customers? What are they attracted to, what are they like, and why are you beating the competition in your sales and opportunities? >> I think this concept of a more agile, grassroots approach to data governance is a breath of fresh air for anyone who has spent their career in the data space. We're at a turning point in the industry where you're now seeing chief decision scientists, chief data officers, chief analytic officers take a leadership role in organizations. Munich Reinsurance is using their data team to actually invest and hold new arms of their business. That's how they're pushing the envelope on leadership in the insurance space and we're seeing that across our install base. Alation becomes this knowledge repository for all of those minds in the organization, and encourages a community to be built around data and insightful questions of data. And in that way the whole organization rises to the next level, and I think it's that vision of what can be created internally, how we can move away from just claiming that we're a big data organization and really starting to see the impact of how new business models can be created from these data assets, that's exciting to our customer base. >> Well congratulations. A hot start up. Alation here on theCUBE in New York City for CubeNYC. Changing the game on analytics, bringing a breath of fresh air to the hands of the users. A new persona developing. Congratulations, great to have you. Stephanie McReynolds. It's theCUBE. Stay with us for more live coverage, day one of two days live in New York City. We'll be right back.
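
The catalog behavior described in this segment, crawling sources, parsing SQL query logs, and recommending data assets based on how people actually use them, can be sketched in a few lines. This is not Alation's implementation; the log lines, regular expression, and co-usage scoring below are invented for illustration of the "Behavior I/O" idea.

```python
# Rough sketch of the "Behavior I/O" idea: mine SQL query logs for which tables
# are actually used together, then recommend tables that frequently co-occur
# with one the analyst is already querying. Not Alation's implementation;
# log lines, regex, and scoring are invented.
import re
from collections import Counter
from itertools import combinations

query_log = [
    "SELECT region, SUM(amount) FROM sales JOIN stores ON ... GROUP BY region",
    "SELECT * FROM sales JOIN promotions ON ...",
    "SELECT customer_id FROM customers WHERE ...",
    "SELECT p.offer FROM promotions p JOIN sales s ON ...",
]

TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([a-zA-Z_]\w*)", re.IGNORECASE)

usage = Counter()      # how often each table is queried
co_usage = Counter()   # how often two tables appear in the same query

for sql in query_log:
    tables = {t.lower() for t in TABLE_REF.findall(sql)}
    usage.update(tables)
    co_usage.update(frozenset(pair) for pair in combinations(sorted(tables), 2))

def recommend(table: str, k: int = 3):
    """Suggest tables most often queried together with `table`."""
    scores = Counter()
    for pair, count in co_usage.items():
        if table in pair:
            (other,) = pair - {table}
            scores[other] += count
    return scores.most_common(k)

print("most queried:", usage.most_common(3))
print("often used with 'sales':", recommend("sales"))
```

In practice the usage counts become the "behavioral metadata" surfaced next to the technical metadata, which is what turns a plain inventory into a recommendation engine.
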

Published Date : Sep 12 2018


Kickoff | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolved over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present for the creation at the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a storage networking world show and you called and said, "Hey, we're doing "Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abi Mehda, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage. >> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that big money and big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be "Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. 
I think one thing that we predicted was that Hadoop certainly would change the game, and that did. We also predicted that there wouldn't be a Red Hat for Hadoop, that was a prediction. The other prediction was that we said Hadoop won't kill data warehouses, it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term. I always liked data ocean because I think it was much more about the fluidity of the data, so I think we got that one right and data lakes still doesn't look like it's going to be panning out well. I mean, most people that deploy data lakes, it's really either not a core thing or as part of something else and it's turning into a data swamp, so I think the data lake piece is not panning out the way people thought it would be. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners, who were hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components of their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today, all they're talking about is cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand where you guys are coming from." We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track and they had to because they had everything bet on Impala and this data model that they had being the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry, that we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of Yahoo! That, I think, changed the game, and that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo! Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about, Dave, they had that competitive battle for years.
The Hortonworks-Cloudera battle, and I think it changed the industry. I think it could've been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and made it so much more, and I think they would've gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera. They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no, the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference. I think O'Reilly really has not done a good job with the conference of big data. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon clouds or some sort of high-compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, and Microsoft Cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, last decade, last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years.
>> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with it's two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap? The way they close the gap, in my view, is they buy technologies that have AI infused in it, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade. >> Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI, and I would argue that AI truly doesn't even exist yet. It's machine learning really driving the value, but to put a validation on the fact that cloud is going to be driving AI business is some of the terms in popular conversations we're hearing here in New York around this event and topic, CUBENYC and Strata Conference, is you're hearing Kubernetes and blockchain, and you know, these automation, AI operation kind of conversations. That's an IT conversation, (chuckles) so you know, that's interesting. You've got IT, really, with storage. 
You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. I mean, that's why you're hearing the messaging of hybrid cloud and from the big distro vendors, and the other thing is you're hearing from a lot of the no-SQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> Their worlds, it's getting leveled in the playing field out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)

Published Date : Sep 12 2018


Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017


 

>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla Co-Founder and Chief Product Officer of Paxata, hot start up in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 three years ago when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here in addition to the partnership that you guys have. So, let's first talk about before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell about your partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's there ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities it became very clear that the market had strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it. And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. 
But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. But this one, just take me through a usecase. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet or it's even like JSON format. A business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be >> Annotations, yeah. >> actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata spitting out there nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making a Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled >> a whole new class of people who can use that. >> Well, and people just >> Get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right. >> So, you're talking about unlabeled-- >> If you looked on your laptop, and if you didn't have JPEG or DOC or PPT. Okay, I don't know that this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> They got a symbolic >> Stephanie: Keep going. system Stanford guy, who's like super-smart. >> Nenshad: Yeah. 
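As a rough illustration of the profiling step described above, opening a compressed Parquet file and showing a business user what is inside, here is a minimal sketch using pyarrow. The file path and the statistics chosen are hypothetical; this is not Paxata's implementation.

```python
# Minimal sketch of profiling a compressed Parquet file so a non-technical user
# can see what's inside: row count, schema, null counts, and sample values.
# The file path is hypothetical; this is not Paxata's implementation.
import pyarrow.parquet as pq

table = pq.read_table("lake_export/customer_events.parquet")
df = table.to_pandas()

print(f"rows: {len(df)}, columns: {len(df.columns)}")
print("schema:")
for field in table.schema:
    print(f"  {field.name}: {field.type}")

# Per-column profile: nulls, distinct values, and a few example values.
for col in df.columns:
    print(col, {
        "nulls": int(df[col].isna().sum()),
        "distinct": int(df[col].nunique()),
        "sample": df[col].dropna().head(3).tolist(),
    })
```
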
They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user. So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used or the different patterns that people are using to consume data. So, what we find really exciting is that this is something that is very complex under the covers. Which Paxata is as well, being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie, back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing at Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? >> I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard? >> Well, because in the world of data warehousing business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPI calculation, and told you as a business person, you as a CEO, this is how you're going to monitor your business. >> John: Yeah. >> What business person wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right.
>> Not even 10, 15, 20 years ago. >> So, now when we have more self-service access to data, and we can have more exploratory analysis. What data science really introduced and Hadoop introduced was this ability on-demand to be able to create these structures, you have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting because like a concept of a catalog is an inventory has been around forever in this space. But the concept of a catalog that learns from other's behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data as an input and that input then informs a recommendation as an output is very powerful. And that's where all the machine learning and A.I. comes to work. It's hidden underneath that concept of Behavior I/O but that's there real innovation that drives this rich catalog is how can we make active recommendations to a business person who doesn't have to understand the technology but they know how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two fly wheels in analysis whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this weekend. We've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that give them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this fly wheel of some data, get some collective data. What's your thoughts and reactions. Are people getting it? Is this by people doing it by accident on purpose kind of thing? Did people just fell on their head? Or you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example. These are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data as a monetizable asset to be turned into information either for analysis, to be turned into information for generating new products that can then be resold on the market. The leading edge companies have figured that out, and our adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way in terms of the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far on the big trends? 
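As a rough illustration of the Behavior I/O idea described above, the sketch below, in Python, treats a parsed query log as the input and produces usage-based recommendations as the output. This is not Alation's actual algorithm; the log format and table names are made up, and only the shape of the technique is what matters here.

```python
# A toy version of Behavior I/O: query-log behavior in, usage-based
# recommendations out. Log format and table names are invented.
from collections import Counter
from itertools import combinations

query_log = [                         # hypothetical parsed query log
    {"user": "caleb", "tables": {"sales.orders", "sales.customers"}},
    {"user": "ana",   "tables": {"sales.orders", "finance.revenue"}},
    {"user": "raj",   "tables": {"sales.orders", "sales.customers"}},
]

usage = Counter()                     # how often each table is touched
co_usage = Counter()                  # how often pairs are touched together
for q in query_log:
    usage.update(q["tables"])
    co_usage.update(combinations(sorted(q["tables"]), 2))

def recommend(table, k=3):
    """Tables most often queried alongside `table`, ranked by co-occurrence."""
    scores = Counter()
    for (a, b), n in co_usage.items():
        if table == a:
            scores[b] += n
        elif table == b:
            scores[a] += n
    return scores.most_common(k)

print(recommend("sales.orders"))      # [('sales.customers', 2), ('finance.revenue', 1)]
```

A real catalog would of course layer ranking models and far richer signals on top, but behavior as input and recommendations as output is the core loop being described.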
What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out from a trends standpoint buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge and didn't emerge what was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IOT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions. There's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing? What do you see that people missed that is super-important, that wasn't talked much about? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said which we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. I think that being said though, this is my sort of way of being a little more provocative. I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyper parameter tuning, and other words that I could throw at you to show that I'm smart. (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective because you see a future. I just think we're not there yet because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To you're point about >> surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the use interface value, which I love, but you guys do the automation. So, I think we're getting there. I see where you're coming from, but still those data sciences have to set the tone for the generation, right? So, it's kind of like you got to get those guys productive. >> And it's not a .. Please go ahead. 
>> I mean, it's somewhat interesting if you look at can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what are the insights in this data, and people are bringing business perspective together with machine learning perspective, or the knowledge of the higher algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. Years it's been about the engine Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And they got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they got to go to that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're a sports car or whatever, this is where the value piece I think hits home is that, people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either, outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how their valuing data. We're starting to see some economic models, and some ways of putting actual numbers on what impact is this data having today. We do a lot of usage analysis with our customers, and looking at they have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're being able to calculate what actually is the impact. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing they said, "This was a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd, you want to network, Alation is the place you want to be. >> So, there is like profiles? And like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is part of our automation, when we go and we index the data sources we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, that contract one another in system. >> John: Ooh. >> And at eBay leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was like well known if you were around eBay for awhile. 
>> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on teradata as well as their Hadoop implementation. And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore ... We're like, "Wow!" Caleb drove a lot of teradata revenue. (Laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. So, I'm looking forward to self-driving information. Nenshad, thanks so much. Stephanie from Alation. Thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective. >> Thank you very much. >> And your color commentary on our wrap up segment here for Big Data NYC. This is theCUBE live from New York, wrapping up great three days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techo music)

Published Date : Oct 3 2017


Rob Thomas, IBM | Big Data NYC 2017


 

>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by, SiliconANGLE Media and as ecosystems sponsors. >> Okay, welcome back everyone, live in New York City this is theCUBE's coverage of, eighth year doing Hadoop World now, evolved into Strata Hadoop, now called Strata Data, it's had many incarnations but O'Reilly Media running their event in conjunction with Cloudera, mainly an O'Reilly media show. We do our own show called Big Data NYC here with our community with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE Alumni, been on multiple times successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know IBM is really been interesting through its own transformation and a lot of people will throw IBM in that category but you guys have been transforming okay and the scoreboard yet has to yet to show in my mind what's truly happening because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set but the analytics game just seems to be getting started with the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity. We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data manager platform for managing all types of data, public, private cloud. They need unified governance so governance of all types of data and they need a data science platform machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as an IBM or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announce an integrated analytic system, got to see my notes here, I want to get into that in a second but interesting you bring up the word platform because you know, platforms have always been kind of reserved for the big supplier but you're talking about customers having a platform, not a supplier delivering a platform per se 'cause this is where the integration thing becomes interesting. 
We were joking yesterday on theCUBE here, kind of just kind of ad hoc conceptually like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right. And complexity also turns into when you bought a hammer it turned into a lawn mower. Right so, a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling >> Rob: Yes. but tool will be a function of the work as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about the, in enterprise today, what the data architecture looks like is, I've got this box that has this software on it, use your terms, has these types of tools on it, and it's isolated and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud is if you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform. >> So operationalizing it is a lot easier than just going down the linear path of provisioning. All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of their policies that they need to have in place? So in other words, what are the baby steps that someone can take, the customers take through with what you guys are dealing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk of risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten some, I'd say a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. 
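As an aside on the card-catalog analogy here: a toy sketch, assuming Python, of what a minimal catalog entry might record. Every field and example value is invented for illustration and is not taken from any IBM product.

```python
# A toy "card catalog for data" entry: where the data lives, what it is, and
# how it has been used. All fields and values are invented for illustration.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class CatalogEntry:
    name: str                 # e.g. "sales.orders"
    location: str             # e.g. "hdfs://prod/warehouse/sales/orders"
    fmt: str                  # e.g. "parquet"
    owner: str
    classification: str       # e.g. "public", "pii"
    annotations: List[str] = field(default_factory=list)

catalog: Dict[str, CatalogEntry] = {}

def register(entry: CatalogEntry) -> None:
    """Itemize a dataset so people (and tools) can find it later."""
    catalog[entry.name] = entry

register(CatalogEntry(
    name="sales.orders",
    location="hdfs://prod/warehouse/sales/orders",
    fmt="parquet",
    owner="data-eng",
    classification="pii",
    annotations=["joined with sales.customers for the churn model, Q3 2017"],
))
print(catalog["sales.orders"].location)
```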
If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And the reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start. >> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications, the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's get back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden and you were commenting about AI or AI washing. You said quote, "You can't have AI without IA." A play on letters there, sequence of letters which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence basically saying if you don't have some sort of architecture AI really can't work. Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was I think a fundamental thing that we're seeing at the show this week, this in New York is a model for the models. Who trains the machine learning? Machines got to learn somewhere too so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set.
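A hedged sketch of the "baseline absorbs policy change" point above: if applications resolve handling rules through catalog classifications at read time, then a new regulation means updating the policy table, not rewriting application code. The classifications, actions, and column names here are all assumptions made up for illustration.

```python
# Illustrative only: applications look up handling rules from the catalog's
# classifications at read time, so a new rule means editing POLICY, not
# rewriting every application. Names and values are invented.

CLASSIFICATION = {                    # maintained alongside the catalog
    "customers.email": "pii",
    "customers.segment": "public",
}

POLICY = {                            # updated when a new regulation lands
    "pii": "mask",
    "public": "allow",
}

def read_value(column: str, value: str) -> str:
    action = POLICY.get(CLASSIFICATION.get(column, "pii"), "mask")
    return "***" if action == "mask" else value

print(read_value("customers.email", "caleb@example.com"))   # ***
print(read_value("customers.segment", "enterprise"))        # enterprise
```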
Problem is that only applies to that data set, it doesn't, you can't pick it up and move it somewhere else so this idea of data architecture just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds becomes fundamental because ultimately you want to be able to provide machine learning across all your data because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that comprehends for the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about you guys doing a lot of work on that. We've observed and Jim Kobielus and I were talking yesterday, there's too much work still in the data science guys plate. There's still doing a lot of what I call, sys admin like work, not the right word, but like administrative building and wrangling. They're not doing enough data science and there's enough proof points now to show that data science actually impacts business in whether it's military having data intelligence to execute something, to selling something at the right time, or even for work or play or consume, or we use, all proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what does it going to take for the data science to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down the weeds. Is that just the role they have or how does it get easier for them that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need because land the data in one place, machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get the cloud. I think we've nailed those problems 'cause now with a click of a button, you can scale this to part of the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team but how do I get it, how do I get a hold of this? 
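To illustrate the "keep the training process going" point, here is a small sketch, assuming Python with scikit-learn, of a model that is updated incrementally as new batches arrive rather than trained once against a single extract. The data is synthetic and the setup is illustrative only, not how any particular platform does it.

```python
# Illustrative only: an incrementally trained model, updated as new data
# lands, instead of a model frozen against one extract. Data is synthetic.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()
classes = np.array([0, 1])

def next_batch(seed, n=200):
    """Stand-in for a new day's worth of labeled data arriving in the platform."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
    return X, y

for day in range(7):                  # e.g. fold in a little new data each day
    X, y = next_batch(seed=day)
    model.partial_fit(X, y, classes=classes)

X_eval, y_eval = next_batch(seed=99)
print("holdout accuracy:", model.score(X_eval, y_eval))
```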
What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Apache Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM cloud, okay, I got it. >> So there's variety of ways that you can go use this and I think what you'll find... >> But you have a premium model that people can get started out so they'll download it to your data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey. >> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things so what we talked about yesterday with the announcement of Hortonworks Dataplane which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean this is head room I got but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming it but that's not a bad ask. I mean, I'm a user, I want to move from if I don't like IBM's cloud or I got a better service, I can move around here. If Amazon is too expensive I want to move to IBM, you got product differentiation, I might want to to be in your cloud. So again, this is the customers mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right? >> I agree, yeah I don't think any enterprise will go all in on one cloud. 
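The Common SQL idea, one SQL surface over data wherever it sits, can be illustrated with a generic PySpark sketch like the one below. This is not IBM's engine; the Parquet path, the JDBC URL, credentials, and column names are placeholders, and a real run would also need the matching JDBC driver on Spark's classpath.

```python
# A generic federation sketch: one SQL query over a Parquet file in the lake
# and a table behind JDBC. All connection details are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("federation-sketch").getOrCreate()

spark.read.parquet("hdfs://lake/warehouse/clickstream") \
     .createOrReplaceTempView("clickstream")

spark.read.format("jdbc") \
     .option("url", "jdbc:db2://warehouse:50000/SALES") \
     .option("dbtable", "ORDERS") \
     .option("user", "analyst") \
     .option("password", "secret") \
     .load() \
     .createOrReplaceTempView("orders")

result = spark.sql("""
    SELECT o.region, COUNT(*) AS sessions
    FROM clickstream c
    JOIN orders o ON c.order_id = o.order_id
    GROUP BY o.region
""")
result.show()
```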
I think it's delusional for people to think that so you're going to have this world. So the reason when we built IBM Cloud Private we did it on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments. >> John: And it's got some traction too so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developers stuff going on, you've done some great work, you've got a free product out there but you still got to make money, you got to provide value to IBM, who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers who have real budgets for data and data science and trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level, >> Chief data officer. >> Chief data officer. You know chief data officers, honestly, it's a mixed bag. Some organizations they're incredibly empowered and they're driving the strategy. Others, they're figure heads and so you got to know how the organizations do it. >> A puppet for the CFO or something. >> Yeah, exactly. >> Our ops. >> A puppet? (chuckles) So, you got to you know. >> Well, they're not really driving it, they're not changing it. It's not like we're mandated to go do something they're maybe governance police or something. >> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so... >> Any events you got going up? Things happening in the marketplace that people might want to participate in? I know you guys do a lot of stuff out in the open, events they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products so that's one coming up pretty soon. The biggest thing we've changed this year is there's such a craving for clients for education as we've started doing what we're calling Analytics University where we actually go to clients and we'll spend a day or two days, go really deep and open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager IBM Analytics inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution and the new one is Every Company is a Tech Company. Love that title which is true, check it out on Amazon. Rob Thomas, Bid Data Revolution, first book and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (theCUBE jingle) (calm soothing music)

Published Date : Oct 2 2017


Greg Sands, Costanoa | Big Data NYC 2017


 

(electronic music) >> Host: Live from Midtown Manhattan it's The Cube! Covering Big Data New York City 2017, brought to you by Silicon Angle Media, and its Ecosystem sponsors. >> Okay, welcome back everyone. We are here live, The Cube in New York City for Big Data NYC, this is our fifth year, doing our own event, not with O'Reilly or Cloudera at Strata Data, which was Hadoop World, Strata Conference, Strata Hadoop, now called Strata Data, probably called Strata AI next year, we're The Cube every year, bringing you all the great data, and what's going on. Entrepreneurs, VCs, thought leaders, we interview them and bring that to you. I'm John Furrier with our next guest, Greg Sands, who's the managing director and founder of Costanoa Ventures in Palo Alto, started out as an entrepreneur himself, then single shingle out there, now he's got a big VC firm on its third fund. >> On the third fund. >> Third fund. How much in that fund? >> 175 million dollar fund. >> So now you're a big firm now, congratulations, and really great to see your success. >> Thanks very much. I mean, we're still very much an early stage boutique focused on companies that change the way the world does business, but it is the case that we have a bigger team and a bigger fund, to go do the same thing. >> Well you've been great to work with, I've been following you, we've known each other for a while, watched you leave Sutter Hill and start Costanoa, but what's interesting is that, I can kind of joke and kid you, the VC inside joke about being a big firm, because I know you want to be small, and like to be small, help entrepreneurs, that's your thing. But it's really not a big firm, it's a few partners, but a lot of people helping companies, that's your ethos, that's what you're all about at your firm. Take a minute to just share with the folks the kinds of things you do and how you get involved in companies, you're hands on, you roll up your sleeves. You get out of the way at the right time, you help when you can, share your ethos. >> Yeah, absolutely so the way we think of it is, combining the craft of old school venture capital, with a modern operating team, and so since most founders these days are product-oriented, our job is to think like product people, not think like investors. So we think like product people, we do product level analysis, we do customer discovery, we do, we go ride along on sales calls when we're making investment decisions. And then we do the things that great venture capitalists have done for years, and so for example, at Alation, who I know has been on the show today, we were able to incubate them in our office for a year, I had many conversations with Satyen after he'd sold the first two or three customers. Okay, who's the next person we hire? Who isn't a founder? Who's going to go out and sell? What does that person look like? Do you go straight to a VP? Or do you hire an individual contributor? Do you hire someone for domain, or do you hire someone for talent? And that's the thing that we love doing. Now we've actually built out an operating team so marketing partner, Martina Lauchengco, and Jim Wilson as a sales partner, to really help turn that into a program, so that they can, we can take these founders who find product market fit, and say, how do we help you build the right sales process and marketing process, sales team and marketing team, for your company, your customer, your product?
>> Well it's interesting since you mention old school venture capital, I'll get into some of the dynamics that are going on in Silicon valley, but it's important to bring that forward, because now with cloud you can get to critical mass on the fly wheel, on economics, you can see the visibility faster now. >> Greg: Absolutely. >> So the game of the old school venture capitalist is all the same, how do you get to cruising altitude, whatever metaphor you want to use, the key was getting there, and sometimes it took a couple of rounds, but now you can get these companies with five million, maybe $10 million funding, they can have unit economics visibility, scales insight, then the scale game comes in, so that seems to be the secret trick right now in venture is, don't overspend, keep the valuation in range and allows you to look for multiple exits potentially, or growth. Talk about that dynamic, because this is like, I call it the hour glass. You get through the hour glass, everyone's down here, but if you can sneak through and get the visibility on the economics, then you grow quickly. >> Absolutely. I mean, it's exactly right an I haven't heard the hour glass metaphor before but I like it. You want to basically get through the narrows of product market fit and the beginnings of scalable sales and marketing. You don't need to know all the answers, but you can do that in a capital-efficient way, building really solid foundations for future explosive growth, look, everybody loves fast growth and big markets, and being grown into. But the number of people who basically don't build those foundations and then say, go big or go home! And they take a ton of money, and they go spend all the money, doing things that just fundamentally don't work, and they blow themselves up. >> Well this is the hourglass problem. You have, once you get through that unique economics, then you have true scale, and value will increase. Everybody wins there so it's about getting through that, and you can get through it fast with good mentoring, but here's the challenge that entrepreneurs fall into the trap. I call it the, I think I made it trap. And what happens is they think they're on the other side of the hourglass, but they still haven't even gone through the straight and narrow yet, and they don't know it. And what they do is they over fund and implode. That seems to be a major trap I see a lot of entrepreneurs fall into, while I got a 50 million pre on my B round, or some monster valuation, and they get way too much cash, and they're behaving as if they're scaling, and they haven't even nailed it yet. >> Well, I think that's right. So there's certainly, there are stages of product market fit, and so I think people hit that first stage, and they say, oh I've got it. And they try to explode out of the gates. And we, in fact I know one good example of somebody saying, hey, by the way, we're doing great in field sales, and our investors want us to go really fast, so we are going to go inside and we, my job was to hire 50 inside people, without ever having tried it. And so we always preach crawl, walk, run, right? Hire a couple, see how it works. Right, in a new channel. Or a new category, or an adjacent space, and I think that it's helpful to have an investor who has seen the whole picture to say, yeah, I know it looks like light at the end of the tunnel, but see how it's a relatively small dot? 
You still got to go a little farther, and then the other thing I say is, look, don't build your company to feed your venture capitalist ego. Right? People do these big rounds of big valuations, and the big dog investors say, go, go, go! But, you're the CEO. Your job is analyze the data. >> John: You can find during the day (laughs). >> And say, you know, given what we know, how fast should we go? Which investments should we make? And you've got to own that. And I think sometimes our job is just to be the pulling guard and clear space for the CEO to make good decisions. >> So you know I'm a big fan, so my bias is pretty much out there, love what you guys are doing. Tim Carr is a Pivot North doing the same thing. Really adding value, getting down and dirty, but the question that entrepreneurs always ask me and talk privately, not about you, but in general, I don't want the VC to get in the way. I want them, I don't want them to preach to me, I don't want too many know-it-alls on my board, I want added value, but again, I don't want the preaching, I don't want them to get in the way, 'cause that's the fear. I'm not saying the same about VCs in general, but that's kind of the mentality of an entrepreneur. I want someone who's going to help me, be in the boat with me, but not be in my way. How do you address that concern to the founders who think, not think like that, but might have a fear. >> Well, by the way, I think it's a legitimate fear, and I think it actually is uncorrelated with added value, right? I think the idea that the board has certain responsibilities, and management has certain responsibilities, is incredibly important. And I think, I can speak for myself in saying, I'm quite conscious of not crossing that line, I think you talk. >> John: You got to build a return, that's the thing. >> But ultimately I would say to an entrepreneur, I'd just say, hey look, call references. And by the way, here are 30 names and phone numbers, and call any one of them, because I think that people who are, so a venture capital know-it-all, in the board room, telling CEOs what to do, destroys value. It's sand in the gears, and it's bad for the company. >> Absolutely, I agree 100% >> And some of my, when I talk about being a pulling guard for the CEO, that's what I'm talking about, which is blocking people who are destructive. >> And rolling the block for a touchdown, kind of use the metaphor. Adding value, that's the key, and that's why I wanted to get that out there because most guys don't get that nuance, and entrepreneurs, especially the younger ones. So it's good and important. Okay, let's talk about culture, obviously in Silicon Valley, I get, reading this morning in the Wymo guy, and they're writing it, that's the Silicon Valley, that's not crazy, there's a lot of great people in Silicon Valley, you're one of them. The culture's certainly an innovative culture, there's been some things in the press, inclusion and diversity, obviously is super important. This whole brogrammer thing that's been kind of kicked around. How are you dealing with all that? Because, you know, this is a cultural shift, but I think it's being made out more than it really is, but there's still our core issues, your thoughts on the whole inclusion and diversity, and this whole brogrammer blowback thing. >> Yeah, well so I think, so first of all, really important issues, glad we're talking about them, and we all need to get better. And to me the question for us has been, what role do we play? 
And because I would say it is a relatively small subset of the tech industry, and the venture capital industry. At the same time the behavior of that has become public is appalling. It's appalling and totally unacceptable, and so the question is, okay, how can we be a part of the stand-up part of the ecosystem, and some of which is calling things out when we see them. Though frankly we work with and hang out with people and we don't see them that often, and then part of which is, how do we find a couple of ways to contribute meaningfully? So for example this summer we ran what we called the Costanova Access Fellowship, intentionally, trying to provide first opportunity and venture capital for people who traditionally haven't had as much access. We created an event in the spring called, Seat at the Table, really, particularly around women in the tech industry, and it went so well that we're running it in New York on October 19th, so if you're a woman in tech in New York, we'd love to see you then. And we're just trying to figure-- >> You're doing it in an authentic way though, you're not really doing it from a promotional standpoint. It's legit. >> Yeah, we're just trying to do, you know, pick off a couple of things that we can do, so that we can be on the side of the good guys. >> So I guess what you're saying is just have high integrity, and be part of the solution not part of the problem. >> That's right, and by the way, both of these initiatives were ones that were kicked off in late 2016, so it's not a reaction to things like binary capital, and the problems at uper, both of which are appalling. >> Self-awareness is critical. Let's get back to the nuts and bolts of the real reason why I wanted you to come on, one was to find out how much money you have to spend for the entrepreneurs that are watching. Give us the update on the last fund, so you got a new fund that you just closed, the new fund, fund three. You have your other funds that are still out there, and some funds reserved, which, what's the number amount, how much are you writing checks for? Give the whole thesis. >> Absoluteley. So we're an early stage investor, so we lead series A and seed financing companies that change the way the world does business, so up and down the stack, a business-facing software, data-driven applications. Machine-learning and AI driven applications. >> John: But the filter is changing the way the world works? >> The way, yes, but in particularly the way the world does business. You can think of it as a business-facing software stack. We're not social media investors, it's not what we know, it's not what we're good at. And it includes security and management, and the data stack and-- >> Joe: Enterprise and emerging tech. >> That's right. And the-- >> And every crazy idea in between. >> That's right. (laughs) Absolutely, and so we're participate in or leave seed financings as most typically are half a million to maybe one and a quarter, and we'll lead series A financing, small ones might be two or two and a half million dollars at the outer edge is probably a six million dollar check. We were just opening up in the next couple of days, a thousand square feet of incubation space at world headquarters at Palo Alto. >> John: Nice. >> So Alation, Acme Ticketing and Zen IQ are companies that we invested in. >> Joe: What location is this going to be at? 
>> That's, near the Fills in downtown Palo Alto, 164 staff, and those three companies are ones where we effectively invested at formation and incubated it for a year, we love doing that. >> At the hangout at Philsmore and get the data. And so you got some funds, what else do you have going on? 175 million? >> So one was a $100 million fund, and then fund two was $135 million fund, and the last investment of fund two which we announced about three weeks ago was called Roadster, so it's ecommerce enablement for the modern dealerships. So Omnichannel and Mobile First infrastructure for auto-dealers. We have already closed, and had the first board meeting for the first new investment of fund three, which isn't yet announced, but in the land of computer vision and deep learning, so a couple of the subjects that we care deeply about, and spend a lot of time thinking about. >> And the average check size for the A round again, seed and A, what do you know about the? The lowest and highest? >> The average for the seed is half a million to one and a quarter, and probably average for a series A is four or five. >> And you'll lead As. >> And we will lead As. >> Okay great. What's the coolest thing you're working on right now that gets you excited? It doesn't have to be a portfolio company, but the research you're doing, thing, tires you're kicking, in subjects, or domains? >> You know, so honestly, one of the great benefits of the venture capital business is that I get up and my neurons are firing right away every day. And I do think that for example, one of the things that we love is is all of the adulant infrastructure and so we've got our friends at Victor Ops that are in the middle of that space, and the thinking about how the modern programmer works, how everybody-- >> Joe: Is security on your radar? >> Security is very much on our radar, in fact, someone who you should have on your show is Asheesh Guptar, and Casey Ella, so she's just joined Bug Crowd as the CEO and Casey moves over to CTO, and the word Bug Bounty was just entered into the Oxford Dictionary for the first time last week, so that to me is the ultimate in category creation. So security and dev ops tools are among the things that we really like. >> And bounties will become the norm as more and more decentralized apps hit the scene. Are you doing anything on decentralized applications? I'm not saying Blockchain in particular, but Blockchain like apps, distributing computing you're well versed on. >> That's right, well we-- >> Blockchain will have an impact in your area. >> Blockchain will have an impact, we just spent an hour talking about it in the context our off site in Decosona Lodge in Pascadero, it felt like it was important that we go there. And digging into it. I think actually the edge computing is actually more actionable for us right now, given the things that we're, given the things that we're interested in, and we're doing and they, it is just fascinating how compute centralizes and then decentralizes, centralizes and then decentralizes again, and I do think that there are a set of things that are fascinating about what your process at the edge, and what you send back to the core. >> As Pet Gelson here said in the QU, if you're not out in front of that next wave, you're driftwood, a lot of big waves coming in, you've seen a lot of waves, you were part of one that changed the world, Netscape browser, or the business plan for that first project manager, congratulations. Now you're at a whole nother generation. You ready? 
(laughs) >> Absolutely, I'm totally ready, I'm ready to go. >> Greg Sands here in The Cube in New York City, part of Big Data NYC, more live coverage with The Cube after this short break, thanks for watching. (electronic jingle) (inspiring electronic music)

Published Date : Sep 29 2017


Matt Maccaux, Dell EMC | Big Data NYC 2017


 

>> Announcer: Live from Midtown Manhattan. It's the CUBE. Covering Big Data New York City 2017. Brought to you by Silicon Angle Media and its ecosystem sponsor. (electronic music) >> Hey, welcome back everyone, live here in New York. This is the CUBE here in Manhattan for Big Data NYC's three days of coverage. We're on day three, things are starting to settle in, starting to see the patterns out there. I'll say it's Big Data week here, in conjunction with Hadoop World, formerly known as Strata Conference, Strata-Hadoop, Strata-Data, soon to be Strata-AI, soon to be Strata-IOT. Big Data. Matt Maccaux, who's the Global Big Data Practice Lead at Dell EMC. We've been in this world now for multiple years and, well, what a riot it's been. >> Yeah, it has. It's been really interesting as the organizations have gone from their legacy systems, they have been modernizing. And we've sort of seen Big Data 1.0 a couple years ago. Big Data 2.0 and now we're moving on sort of the what's next? >> Yeah. >> And it's interesting because the Big Data space has really lagged the application space. You talk about microservices-based applications, and deploying in the cloud and stateless things. The data technologies and the data space has not quite caught up. The technology's there, but the thinking around it, and the deployment of those, it seems to be a slower, more methodical process. And so what we're seeing in a lot of enterprises is that the ones that got in early, have built out capabilities, are now looking for that, how do we get to the next level? How do we provide self-service? How do we enable our data scientists to be more productive within the enterprise, right? If you're a startup, it's easy, right? You're somewhere in the public cloud, you're using cloud based API, it's all fine. But if you're an enterprise, with the inertia of those legacy systems and governance and controls, it's a different problem to solve for. >> Let's just face it. We'll just call a spade a spade. Total cost of ownership was out of control. Hadoop was great, but it was built for something that tried to be something else as it evolved. And that's good also, because we need to decentralize and democratize the incumbent big data warehouse stuff. But let's face it, Hadoop is not the game anymore, it's everything else. >> Right, yep. >> Around it so, we've seen that, that's a couple years old. It's about business value right now. That seems to be the big thing. The separation between the players that can deliver value for the customers. >> Matt: Yep. >> And show a little bit of headroom for future AI things, they've seen that. And have the cloud on premise play. >> Yep. >> Right now, to me, that's the call here. What do you, do you agree? >> I absolutely see it. It's funny, you talk to organizations and they say, "We're going cloud, we're doing cloud." Well what does that mean? Can you even put your data in the cloud? Are you allowed to? How are you going to manage that? How are you going to govern that? How are you going to secure that? So many organizations, once they've asked those questions, they've realized, maybe we should start with the model of cloud on premise. And figure out what works and what doesn't. How do users actually want to self serve? What do we templatize for them? And what do we give them the freedom to do themselves? >> Yeah. >> And they sort of get their sea legs with that, and then we look at sort of a hybrid cloud model.
How are we able to span on premise, off premise, whatever your public cloud is, in a seamless way? Because we don't want to end up with the same thing that we had with mainframes decades ago, where it was, IBM had the best, it was the fastest, it was the most efficient, it was the new paradigm. And then 10 years later, organizations realized they were locked in, there was different technology. The same thing's true if you go cloud native. You're sort of locked in. So how do you be cloud agnostic? >> How do you get locked in a cloud native? You mean with Amazon? >> Or any of them, right? >> Okay. >> So they all have their own APIs that are really good for doing certain things. So Google's TensorFlow happens to be very good. >> Yeah. Amazon EMR. >> But you build applications that are using those native APIs, you're sort of locked. And maybe you want to switch to something else. How do you do that? So the idea is to-- >> That's why Kubernetes is so important, right now. That's a very key workload and orchestration container-based system. >> That's right, so we believe that containerization of workloads that you can define in one place, and deploy anywhere is the path forward, right? Deploy 'em on prem, deploy 'em in a private cloud, public cloud, it doesn't matter the infrastructure. Infrastructure's irrelevant. Just like Hadoop is sort of not that important anymore. >> So let me get your reaction on this. >> Yeah. So Dell EMC, so you guys have actually been a supplier. They've been the leading supplier, and now with Dell EMC across the portfolio of everything. From Dell computers, servers and what not, to storage, EMC's run the table on that for many generations. Yeah, there's people nippin' at your heels like Pure, okay that's fine. >> Sure. It's still storage is storage. You got to store the data somewhere, so storage will always be around. Here's what I heard from a CXO. This is the pattern I hear, but I'll just summarize it in one conversation. And then you can give a reaction to it. John, my life is hell. I have an application development investment plan, it's just booting up all these new developers. New dev ops guys. We're going to do open source, I got to build that out. I got that, trying to get dev ops going on. >> Yep. >> That's a huge initiative. I got the security team. I'm unbundling from my IT department, into a new, different reporting line to the board. And then I got all this data governance crap underneath here, and then I got IOT over the top, and I still don't know where my security holes are. >> Yep. And you want to sell me what? (Matt laughs) So that's the fear. >> That's right. >> Their plates are full. How do you guys help that scenario? You walk in, actually security's pretty much, important obviously you can see that. But how do you walk into that conversation? >> Yeah, it's sort of stop the madness, right? >> (laughs) That's right. >> And all of that matters-- >> No, but this is all critical. Every room in the house is on fire. >> It is. >> And I got to get my house in order, so your comment to me better not be hype. TensorFlow, don't give me this TensorFlow stuff. >> That's right. >> I want real deal. >> Right, I need, my guys are-- >> I love TensorFlow but, doesn't put the fire out. >> They just want Spark, right? I need to speed up my-- >> John: All right, so how do you help me? >> So, what we'd do is, we want to complement and augment their existing capabilities with better ways of scaling their architecture.
So let's help them containerize their big data workloads so that they can deploy them anywhere. Let's help them define centralized security policies that can be defined once and enforced everywhere, so that now we have a way to automate the deployment of environments. And users can bring their own tools. They can bring their data from outside, but because we have intelligent centralized policies, we can enforce that. And so with our elastic data platform, we are doing that with partners in the industry, BlueTalon and BlueData, they provide that capability on top of whatever the customer's infrastructure is. >> How important is partnering to you guys at Dell EMC? I know Michael Dell talks about it all the time, so I know it's important. But I want to hear your reaction. Down in the trenches, you're in the front lines, providing the value, pulling things together. Partnerships seem to be really important. Explain how you look at that, how you guys do your partners. You mentioned BlueTalon and BlueData. >> That's right, well I'm in the consulting organization. So we are on the front lines. We are dealing with customers day in and day out. And they want us to help them solve their problems, not put more of our kit in their data centers, on their desktops. And so partnering is really key, and our job is to find where the problems are with our customers, and find the best tool for the best job. The right thing for the right workload. And you know what? If the customer says, "We're moving to Amazon," then Dell EMC might not sell any more compute infrastructure to that customer. They might, we might not, right? But it's our job to help them get there, and by partnering with organizations, we can help make that seamless. And that strengthens the relationship, and they're going to purchase-- >> So you're saying that you will put the customer over Dell EMC? >> Well, the customer is number one to Dell EMC. Net promoter score is one of the most important metrics that we have-- >> Just want to make sure we get that on the record, and that's important, 'cause Amazon, and you know, we saw it in NetApp. I've got to say, give NetApp credit. They heard from customers early on that Amazon was important. They started building into Amazon support. So people saying, "Are you crazy?" VMware, everyone's saying, "Hey you capitulated "by going to Amazon." Turns out that that was a damn good move. >> That's right. >> For Gelsinger. >> Yep. >> Look at VMworld. They're going to own the cloud service provider market as an arms dealer. >> Yep. >> I mean, you would have thought that a year ago, no way. And then when they did the deal, they said, >> We have really smart leadership in the organization. Obviously Michael is a brilliant man. And it sort of trickles on down. It's customer first, solve the customer's problems, build the relationship with them, and there will be other things that come, right? There will be other needs, other workloads. We do happen to have a private cloud solution with Virtustream. Some of these customers need that intermediary step, before they go full public, with a hosted private solution using a Virtustream. >> All right, so what's the, final question, so what's the number one thing you're working on right now with customers? What's the pattern? You got the stack rank, your requests, your deliverables, where you spend your time. What's the top things you're working on?
So getting organizations past, they've already got their first 20 use cases. They've already got lakes, they got pedabytes in there. How do we enable self service so that we can actually bring that business value back, as you mentioned. Bring that business value back by making those data scientists productive. That's number one. Number two is aligning that to overall strategy. So organizations want to monetize their data, but they don't really know what that means. And so, within a consulting practice, we help our customers define, and put a road map in place, to align that strategy to their goals, the policies, the security, the GDP, or the regulations. You have to marry the business and the technology together. You can't do either one in isolation. Or ultimately, you're not going to be efficient. >> All right, and just your take on Big Data NYC this year. What's going on in Manhattan this year? What's the big trend from your standpoint? That you could take away from this show besides it becoming a sprawl of you know, everyone just promoting their wares. I mean it's a big, hyped show that O'Reilly does, >> It is. >> But in general, what's the takeaway from the signal? >> It was good hearing from customers this year. Customer segments, I hope to see more of that in the future. Not all just vendors showing their wares. Hearing customers actually talk about the pain and the success that they've had. So the Barclay session where they went up and they talked about their entire journey. It was a packed room, standing room only. They described their journey. And I saw other banks walk up to them and say, "We're feeling the same thing." And this is a highly competitive financial services space. >> Yeah, we had Packsotta's customer on Standard Bank. They came off about their journey, and how they're wrangling automating. Automating's the big thing. Machine learning, automation, no doubt. If people aren't looking at that, they're dead in my mind. I mean, that's what I'm seeing. >> That's right. And you have to get your house in order before you can start doing the fancy gardening. >> John: Yeah. >> And organizations aspire to do the gardening, right? >> I couldn't agree more. You got to be able to drive the car, you got to know how to drive the car if you want to actually play in this game. But it's a good example, the house. Got to get the house in order. Rooms are on fire (laughs) right? Put the fires out, retrench. That's why private cloud's kicking ass right now. I'm telling you right now. Wikibon nailed it in their true private cloud survey. No other firm nailed this. They nailed it, and it went viral. And that is, private cloud is working and growing faster than some areas because the fact of the matter is, there's some bursting through the clouds, and great use cases in the cloud. But, >> Yep. >> People have to get the ops right on premise. >> Matt: That's right, yep. >> I'm not saying on premise is going to be the future. >> Not forever. >> I'm just saying that the stack and rack operational model is going cloud model. >> Yes. >> John: That's absolutely happening, that's growing. You agree? >> Absolutely, we completely, we see that pattern over and over and over again. And it's the Goldilocks problem. There's the organizations that say, "We're never going to go cloud." There's the organizations that say, "We're going to go full cloud." For big data workloads, I think there's an intermediary for the next couple years, while we figure out operating pulse. 
>> This evolution, what's fun about the market right now, and it's clear to me that, people who try to get a spot too early, there's too many diseconomies of scale. >> Yep. >> Let the evolution, Kubernetes looking good off the tee right now. Docker containers and containerization in general's happened. >> Yep. >> Happening, dev ops is going mainstream. >> Yep. >> So that's going to develop. While that's developing, you get your house in order, and certainly go to the cloud for bursting, and other green field opportunities. >> Sure. >> No doubt. >> But wait until everything's teed up. >> That's right, the right workload in the right place. >> I mean Amazon's got thousands of enterprises using the cloud. >> Yeah, absolutely. >> It's not like people aren't using the cloud. >> No, they're, yeah. >> It's not 100% yet. (laughs) >> And what's the workload, right? What data can you put there? Do you know what data you're putting there? How do you secure that? And how do you do that in a repeatable way? Yeah, and you think cloud's driving the big data market right now. That's what I was saying earlier. I was saying, I think that the cloud is the subtext of this show. >> It's enabling. I don't know if it's driving, but it's the enabling factor. It allows for that scale and speed. >> It accelerates. >> Yeah. >> It accelerates... >> That's a better word, accelerates. >> Accelerates that horizontally scalable. Matt, thanks for coming on the CUBE. Really appreciate it. More live action, we're going to have some partners on with you guys. Next, stay with us. Live in Manhattan, this is the CUBE. (electronic music)
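A minimal sketch of the "define once, enforce everywhere" idea Maccaux describes above for the elastic data platform: a single policy object applied to a workload no matter where it runs. The class, datasets, roles, and environments below are hypothetical illustrations, not Dell EMC, BlueData, or BlueTalon APIs.

```python
# Minimal illustration of "define a policy once, enforce it everywhere":
# the same rule object is applied whether the containerized workload runs
# on premise, in a private cloud, or in a public cloud.
# All names here are hypothetical, for illustration only.
from dataclasses import dataclass, field

@dataclass
class Policy:
    name: str
    masked_columns: set = field(default_factory=set)   # columns to redact
    allowed_roles: set = field(default_factory=set)    # who may read at all

    def enforce(self, role: str, record: dict) -> dict:
        """Return the record as this role is allowed to see it."""
        if role not in self.allowed_roles:
            raise PermissionError(f"{role} may not read data under {self.name}")
        return {k: ("***" if k in self.masked_columns else v)
                for k, v in record.items()}

# Defined once, centrally...
pii_policy = Policy(name="pii-default",
                    masked_columns={"ssn", "card_number"},
                    allowed_roles={"analyst", "data_scientist"})

# ...enforced identically no matter where the containerized job runs.
for environment in ("on-prem-hadoop", "private-cloud", "public-cloud"):
    row = {"customer": "Jane Doe", "ssn": "123-45-6789", "spend": 420.0}
    print(environment, pii_policy.enforce("analyst", row))
```

The only point of the sketch is that the rule lives in one place; the cluster or cloud the workload lands on is irrelevant to how the policy is written.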

Published Date : Sep 29 2017


Murthy Mathiprakasam, Informatica | Big Data NYC 2017


 

>> Narrator: Live from midtown Manhattan, it's theCUBE. Covering BigData, New York City, 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we're here live in New York City for theCUBE's coverage of BigData NYC, our event we've been running for five years, been covering the BigData space for eight years, since 2010 when it was Hadoop World, Strata Conference, Strata Hadoop, Strata Data, soon to be called Strata AI, just a few. We've been theCUBE for all eight years. Here, live in New York, I'm John Furrier. Our next guest is Murthy Mathiprakasam, who is the Director of Product Marketing at Informatica. A Cube alumni, he's been on many times, we cover Informatica World every year. Great to see you, thanks for coming by and coming in. >> Great to see you. >> You guys do data, so there's not a lot of recycling going on in the data because we've been talking about it all week, total transformation, but the undercurrent has been a lot of AI, AI this, and you guys have the CLAIRE product, doing a lot of things there. But outside of the AI, the undertone is cloud, cloud, cloud. Governance, governance, governance. There's two kinds of drivers I'm seeing as the force of this week: a lot of people trying to get their act together on those two fronts, and you can kind of see the scabs on the industry, people, some people haven't been paying attention. And they're weak in the area. Cloud is absolutely going to be driving the BigData world, 'cause data is horizontal. Cloud's the power source, and you guys have been on that. What's your thoughts, what other drivers encourage you? (mumbles) what I'm saying and what else did I miss? Security is obviously in there, but-- >> Absolutely, no, so I think you're exactly right on. So obviously governance, security is a big deal. Largely being driven by the GDPR regulation, it's happening in Europe. But, I mean every company today is global, so. Everybody's essentially affected by it. So, I think data until now has always been a kind of opportunistic thing, that there's a couple guys and their organizations were looking at it as oh, let's do some experimentation. Let's do something interesting here. Now, it's becoming government managed so I think there's a lot of organizations who are, like, to your point, getting their act together, and that's driving a lot of demand for data management projects. So now, people say, well, if I got to get my act together, I don't have to hire armies of people to do it, let me look for automated machine learning based ways of doing it. So that they can actually deliver on their audit reports that they need to deliver on, and ensure the compliance that they need to ensure, but do it in a very scalable way. >> I've been kind of joking all week, and I kind of had this meme in my head, so I've been pounding on it all week, calling it the tool shed problem. The tool shed problem is, everyone's got these tools. They throw them into the tool shed. They bought a hammer and the company that sells them the hammer is trying to turn it into a lawnmower, right? You can't mow your lawn with a hammer, it's not going to work, and so this, these tools are great but it defines work. What you do, but, the platforming issue is a huge one. And you start to see people who took that view. You guys were one of them, because with a platform-centric view, with tools that are enabled, people can be highly productive.
You don't have to worry about new things like a government's policy, the GDPR that might pop up, or the next Equifax that's around the corner. There's probably two or three of them going on right now. So, that's an impact, the data, who uses it, how it's used, and who's at fault or whatever. So, how does a company deal with that? And machine learning has proven to be a great horse that a lot of people are riding right now. You guys are doing it, how does a customer deal with that tsunami of potential threats? Architecture challenges, what is your solution, how do you talk about that? >> Well, I think machine learning, you know, up until now has been seen as the kind of, nice to have, and I think that very quickly, it's going to become a must have. Because, exactly like you're saying, it really is a tsunami. I mean, you could see people who are nervous about the fact that I mean, there's different estimates. It's like 40% growth in data assets from most organizations every year. So, you can try to get around this somehow with one of these (mumbles) tools or something. But at some point, something is going to break, either you just don't, run out of manpower, you can't train the manpower, people start leaving. whatever the operational challenges are, it just isn't going to scale. Machine learning is the only approach. It is absolutely the only approach that actually ensures that you can maintain data for these kind of defensive reasons like you're saying. The structure and compliance, but also the kind of offensive opportunistic reasons, and do it scalably, 'cause there's just no other way mathematically speaking, that when the data is growing 40% a year, just throwing a bunch of tools at it just doesn't work. >> Yeah, I would just amplify and look right in the camera, say, if you're not on machine learning, you're out of business. That's a straight up obvious trend, 'cause that's a precursor to AI, real AI. Alright, let's get down to data management, so when people throw around data management, it's like, oh yeah we've got some data management. There are challenges with that. You guys have been there from day one. But now if you take it out in the future, how do you guys provide the data management in a totally cloud world where now the customer certainly has public and private, or on premise but theirs might have multi cloud? So now, comes a land grab for the data layer, how do you guys play in that? >> Well, I think it's a great opportunity for these kind of middle work platforms that actually do span multiple clouds, that can span the internal environments. So, I'll give you an example. Yesterday we actually had a customer speaking at Astrada here, and he was talking about from him, the cloud is really just a natural extension of what they're already doing, because they already have a sophisticated data practice. This is a large financial services organization, and he's saying well now the data isn't all inside, some of it's outside, you've got partners, who've got data outside. How do we get to that data? Clearly, the cloud is the path for doing that. So, the fact that the cloud is a national extension a lot of organizations were already doing internally means they don't want to have a completely different approach to the data management. They want to have a consistent, simple, systematic repeatable approach to the data management that spans, as you said, on premise in the cloud. 
That's why I think the opportunity of a very mature and sophisticated platform because you're not rewriting and re-platforming for every new, is it AWS, is it Azure? Is it something on premise? You just want something that works, that shields you from the underlying infrastructure. >> So I put my skeptic hat on for a second and challenge you on this, because this I think is fundamental. Whether it's real or not, it's perceived, maybe in the back of the mind of the CXO or the CDO, whoever is enabled to make these big calls. If they have the keys to the kingdom in Informatica, I'm going to get locked in. So, this is a deep fear. People wake up with nightmares in the enterprise, they've seen locked in before. How do you explain that to a customer that you're going to be an enabling opportunity for them, not a lock in and foreclosing future benefits. Especially if I have an unknown scenario called multi-cloud. I mean, no one's really doing multi-cloud let's face it. I mean, I have multiple clouds with stuff on it, >> At least not intentionally. Sometimes you got a line of businesses and doing things, but absolutely I get it. >> No one's really moving workloads dynamically between clouds in real time. Maybe a few people doing some hacks, but for the most part of course, not a standard practice. >> Right. >> But they want it to be. >> Absolutely. >> So that's the future. From today, how do you preserve that position with the customer where you say hey we're going to add value, but we're not going to lock you in? >> So the whole premise again of, I mean, this goes back to classic three tier models of how you think about technology stacks, right? There's an infrastructure layer, there's a platform layer, there's an analytics layer and the whole premise of the middle of the layer, the platform layer, is that it enables flexibility in the other two layers. It's precisely when you don't have something that's kind of intermediating the data and the use of the data, that's when you run into challenges with flexibility and with data being locked in the data store. But you're absolutely right. We had dinner with a bunch of our customers last night. They were talking about they'd essentially evaluated every version of sort of BigData platform and data infrastructure platform right? And why? It was because they were a large organization and your different teams start stuff and they had to compute them out and stuff. And I was like that must have been pretty hard for you guys. Now what we were using Informatica, so it didn't really matter where the data was, we were still doing everything as far as the data management goes from a consistent layer and we integrate with all those different platforms. >> John: So you didn't get in the way? >> We didn't get in the way. >> You've actually facilitated. >> We are facilitating increased flexibility. Because without a layer like that, a fabric, or whatever you want to call it a data platform that's facilitating this the complexity's going to get very, very crazy very soon. If it hasn't already. The number of infrastructure platforms that are available like you said, on premise and on the cloud now, keeps growing. The number of analytical tools that are available is also growing. And all this is amazing innovation by the way. This is all great stuff, but to your point about it if your the chief officer of an organization going, I got to get this thing figured out somehow. 
I need some sanity, that's really the purpose of-- >> They just don't want to know the tool for tool's sake, they need to have it be purposeful. >> And that's why this machine learning aspect is very, very critical because I was thinking about an analogy just like you were and I was thinking, in a way you can think of data managing as sort of cleaning stuff up and there are people that have brooms and mops and all these different tools. Well, we are bringing a Roomba to market, right? Because you don't want to just create tools that transfer the laborer around, which is a little bit of what's going on. You want to actually get the laborer out of the equation, so that the people are focused on the context, business strategy and the data management is sort of cleaning itself. It's doing the work for you. That's really what Informatica's vision is. It's about being a kind of enterprise cloud data management vendor that is leveraging AI under the hood so that you can sort of set it and forget it. A lot of this ingestion and the cleansing, telling annals what data they should be looking for. All the stuff is just happening in an automated way and you're not in this total chaos. >> And that can be some tools will be sitting in the back for a long time. In my tool shed, when I had one back in a big enough property back east. No one has tool sheds by the way. No one does any gardening. The issue is in the day, I need to have a reliable partner. So I want you to take a minute and explain to the folks who aren't yet Informatica customers why they should be and the Informatica customers why they should stay with Informatica. >> Absolutely, so certainly the ones we have, a very loyal customer base. In fact the guy who was presenting with us yesterday, he said he's been with Informatica since 1999, going through various versions of our products and adopting new innovations. So we have a very loyal customer base, so I think that loyalty itself speaks for itself as well. As far as net new customers, I think that in a world of this increasing data complexity, it's exactly what you were saying, you need to find an approach that is going to scale. I keep hearing this word from the chief data officer, I kind of got something some going on today, I don't know how I scale it. How is this going to work in 2018 and 2019, in 2025? And it's just daunting for some of these guys. Especially going back to your point about compliance, right? So it's one thing if you have data sitting around, data so to speak, that you're not using it. But god forbid now, you got legal and regulatory concerns around it as well. So you have to get your arms around the data and that's precisely where Informatica can help because we've actually thought through these problems and we've talked about them. >> Most of them were a problem you solved because at the end of the day, we were talking about problems that have massive importance, big time consequences people can actually quantify. >> That's right. >> So what specific problem highest level do you solve is the most important, has the most consequences? >> Everything from ingestion of raw data sets from wherever like you said, in the cloud on premise, all the way through all the processes you need to make it fully usable. And we view that as one problem. There's other vendors who think that one aspect of that is a problem and it is worth solving. We really think, look at the end of the day, you got raw stuff and you have to turn it into useful stuff. 
Everything in there has to happen, so we might as well just give you everything and be very, very good at doing all those things. And so that's what we call enterprise cloud data management. It's everything from raw material to finished goods of insights. We want to be able to provide that in a consistent integrated and machine learning integrate it. >> Well you guys have a loyal customer base but to be fair and you kind of have to acknowledge that there is a point in time and not throw Informatica's away the big customers, big engagements. But there was a time in Informatica's history where you went private. There was some new management came in. There was a moment where the boat was taking on water, right? And you could almost look at it and say, hmm, you know, we're in this space. You guys retooled around that. Success to the team. Took it to another dimension. So that's the key thing. You know a lot of the companies become big and it's hard to get rid of. So the question is that's a statement. I think you guys done a great job. Yet, the boat might have taken on water, that's my opinion, but you can probably debate that. But I think as you get mature and you're in public, you just went private. But here's the thing, you guys have had a good product chop in Informatica, so I got to ask you the question. What cool things are you doing? Because remember, cool shiny new toys help put a little flash and glam on the nuts and bolts that scales. What are you guys doing? I know you just announced claire, some AI stuff. What's the hot stuff you're doing that's adding value? >> Yeah, absolutely, first of all, this kind of addresses your water comment as well. So we are probably one of the few vendors that spends almost about $200 million in R and D. And that hasn't changed through the acquisition. If anything, I think it actually increased a little bit because now our investors are even more committed to innovation. >> Well you're more nimble in private. A lot more nimble. >> Absolutely, a lot more ideas that are coming to the forefront. So there's never been any water just to be clear. But to answer your follow on question about some examples of this innovation. So I think Ahmed yesterday talked about some of our recent release as well but we really just keep pushing on this idea of, I know I keep saying this but it's this whole machine learning approach here of how can we learn more about the data? So one of the features, I'll give you an example, is if we can actually go look at a file and if we spot like a name and an address and some order information, that probably is a customer, right? And we know that right, because we've seen past data sets. So, there's examples of this pattern matching where you don't even have to have data that's filled out. And this is increasingly the way the data looks we are not dealing with relational tables anymore it's JSON files, it's web blogs, XML files, all of that data that you had to have that data scientists go through and parse and sift through, we just automatically recognize it now. If we can look for the data and understand it, we can match it. >> Put that in context in the order of benefits that, from the old way versus the current way, what's the pain levels? One versus the other, can you put context around that? In terms of, it's pretty significant. >> It's huge because again, back to this sort of volume and variety of data that people are trying to get into systems and do it very rapidly. I'll give you a really tangible customer case. 
So, this is a customer that presented at Informatica World a couple months ago. It's Jewelry TV, I can actually tell you the name. So there are one of these online kind of shopping sites and they've got a TV program that goes with the online site. So what they do is obviously when you promote something on TV, your orders go up online, right? They wanted to flip it around and they said, look, let's look at the web logs of the traffic that's on the website and then go promote that on the TV program. Because then you get a closed loop and start to have this explosion of sales. So they used Informatica, didn't have to do any of this hand coding. They just build this very quickly and with the graphical user interface that we provide, it leverages sparks streaming under the hood. So they are using all these technologies under the hood, they just didn't have to do any of the manual coding. Got this thing out in a couple days and it works. And they have been able to measure it and they're actually driving increased sales by taking the data and just getting it out to the people that need to see the data very, very quickly. So that's an example of a use case where this isn't just to your point about is this a small, incremental type of thing. No, there is a lot of money behind data if you can actually put it to good use. >> The consequences are grave and I think you've seen more and more, I mean the hacks just amplify it over and over again. It's not a cost center when you think about it. It has to be somehow configured differently as a profit center, even though it might not drive top line revenue directly like an app or anything else. It's not a cost center. If anything it will be treated as a profit center because you get hacked or someone's data is misused, you can be out of business. There is no profit. Look at the results of these hacks. >> The defensive argument is going to become very, very strong as these regulations come out. But, let's be clear, we work with a lot of the most advanced customers. There are people making money off of this. It can be a top line driver-- >> No it should be, it should be. That's exactly the mindset. So the final question for you before we break. I know we're out of time here. There are some chief data officers that are enabled, some aren't and that's just my observation. I don't want to pidgeonhole anyone, but some are enable to really drive change, some are just figureheads that are just managing the compliance risk and work for the CFO and say no to everything. I'm over-generalizing. But that's essentially how I see it. What's the problem with that? Because the cost center issue has, we've seen this moving before in the security business. Security should not be part of IT. That's it's own deal. >> Exactly. >> So we're kind of, this is kind of smoke, but we're coming out of the jungle here. Your thoughts on that. >> Yeah, you're absolutely right. We see a variety of models. We can see the evolution of those models and it's also very contextual to different industries. There are industries that are inherently more regulated, so that's why you're seeing the data people maybe more in those cost center areas that are focused on regulations and things like that. There's other industries that are a lot more consumer oriented. So for them, it makes more sense to have the data people be in a department that seems more revenue basing. So it's not entirely random. 
There are some reasons, that's not to say that's not the right model moving forward, but someday, you never know. There is a reason why this role became a CXO in the first place. Maybe it is somebody who reports to the CEO and they really view the data department as a strategic function. And it might take a while to get there, but I don't think it's going to take a long time. Again, we're talking about 40% growth in the data and these guys are realizing that now and I think we're going to see very quickly people moving out of the whole tool shed model, and moving to very systematic, repeatable practices. Sophisticated middleware platforms and-- >> As we say don't be a tool, be a platform. Murphy thanks so much for coming on to theCUBE, we really appreciate it. What's going on in Informatica real quick. Things good? >> Things are great. >> Good, awesome. Live from New York, this is theCUBE here at BigData NYC more live coverage continuing day three after this short break. (digital music)

Published Date : Sep 29 2017


Santhosh Mahendiran, Standard Chartered Bank | BigData NYC 2017


 

>> Announcer: Live, from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Okay welcome back, we're live here in New York City. It's theCUBE's presentation of Big Data NYC, our fifth year doing this event in conjunction with Strata Data, formerly Strata Hadoop, formerly Strata Conference, formerly Hadoop World, we've been there from the beginning. Eight years covering Hadoop's ecosystem now Big Data. This is theCUBE, I'm John Furrier. Our next guest is Santhosh Mahendiran, who is the global head of technology analytics at Standard Chartered Bank. A practitioner in the field, here getting the data, checking out the scene, giving a presentation on your journey with Data at a bank, which is big financial obviously an adopter. Welcome to theCUBE. >> Thank you very much. >> So we always want to know what the practitioners are doing because at the end of the day there's a lot of vendors selling stuff here, so you got, everyone's got their story. End of the day you got to implement. >> That's right. >> And one of the themes is the data democratization which sounds warm and fuzzy, collaborating with data, this is all good stuff and you feel good and you move into the future, but at the end of the day it's got to have business value. >> That's right. >> And as you look at that, how do you look at the business value? Cause you want to be in the bleeding edge, you want to provide value and get that edge operationally. >> That's right. >> Where's the value in data democratization? How did you guys roll this out? Share your story. >> Okay, so let me start with the journey first before I come to the value part of it, right? So, data democratization is an outcome, but the journey has been something we started three years back. So what did we do, right? So we had some guiding principles to start our journey. The first was to say that we believed in the three S's, which is speed, scale, and it should be really, really flexible and super fast. So one of the challenges that we had was our historical data warehouses was entirely becoming redundant. And why was it? Because it was RDBMS centric, and it was extremely disparate. So we weren't able to scale up to meet the demands of managing huge chunks of data. So, the first step that we did was to re-pivot it to say that okay, let's embrace Hadoop. And what you mean by embracing is just not putting in the data lake, but we said that all our data will land into the data lake. And this journey started in 2015, so we have close to 80% of the Bank's data in the lake and it is end of day data right now and this data flows in on daily basis, and we have consumers who feed off that data. Now coming to your question about-- >> So the data lake's working? >> The data lake is working, up and running. >> People like it, you just got a good spot, batch 'em all you throw everything in the lake. >> So it is not real time, it is end of day. There is some data that is real-time, but the data lake is not entirely real-time, that I have to tell you. But one part is that the data lake is working. Second part to your question is how do I actually monetize it? Are you getting some value out of it? But I think that's where tools like Paxata has actually enabled us to accelerate this journey. So we call it data democratization. So the best part it's not about having the data. We want the business users to actually use the data. 
Typically, data has always been either delayed or denied in most of the cases to end-users and we have end-users waiting for the data but they don't get access to the data. It was done because primarily the size of the data was too huge and it wasn't flexible enough to be shared with. So how did tools like Paxata and the data lake help us? So what we did with data democratization is basically to say that "hey we'll get end-users to access the data first in a fast manner, in a self-service manner, and something that gives operational assurance to the data, so you don't hold the data and then say that you're going to get a subset of data to play with. We'll give you the entire set of data and we'll give you the right tools which you can play with. Most importantly, from an IT perspective, we'll be able to govern it. So that's the key about democratization. It's not about just giving them a tool, giving them all data and then say "go figure it out." It's about ensuring that "okay, you've got the tools, you've got the data, but we'll also govern it," so that you obviously have control over what they're doing. >> So now you govern it, they don't have to get involved in the governance, they just have access? >> No they don't need to. Yeah, they have access. So governance works both ways. We establish the boundaries. Look at it as a referee, and then say that "okay, there are guidelines that you don't," and within the datasets that key people have access to, you can further set rules. Now, coming back to specific use cases, I can talk about two specific cases which actually helped us to move the needle. The first is on stress testing, so being a financial institution, we typically have to report various numbers to our regulators, etc. The turnaround time was extremely huge. These kind of stress testing typically involve taking huge amount-- >> What were some of the turnaround times? >> Normally it was two to three weeks, some cases a month-- >> Wow. >> So we were able to narrow it down to days, but what we essentially did was as with any stress testing or reporting, it involved taking huge amounts of data, crunching them and then running some models and then showing the output, basically a number of transformations involved. Earlier, you first couldn't access the entire dataset, so that we solved-- >> So check, that was a good step one-- >> That was step one. >> But was there automation involved in that, the Paxata piece? >> Yeah, I wouldn't say it was fully automated end-to-end, but there was definitely automation given the fact that now you got Paxata to work off the data rather than someone extracting the data and then going off and figuring what needs to be done. The ability to work off the entire dataset was a big plus. So stress testing, bringing down the cycle time. The second one use case I can talk about is again anti-money laundering, and in our financial crime compliance space. We had processes that took time to report, given the clunkiness in the various handoffs that we needed to do. But again, empowering the users, giving the tool to them and then saying "hey, this"-- >> How about know your user, because we have to anti-money launder, you need to have to know your user base, that's all set their too? >> Yeah. So the good part is know the user, know your customer, KYCs all that part is set, but the key part is making sure the end-users are able to access the data much more earlier in the life cycle and are able to play with it. 
In the case of anti-money laundering, again a process of three weeks to four weeks was shortened down to a question of days by giving tools like Paxata again in a structured manner and with which we're able to govern. >> You control this, so you knew what you were doing, but you let their tools do the job? >> Correct, so look at it this way. Typically, the data journey has always been IT-led. It has never been business-led. If you look at the generations of what happens, you source the data which is IT-led, then you model the data which is IT-led, then you prepare and massage the data which is again IT-led and then you have tools on top of it which is again IT-led so the end-users get it only after the fourth stage. Now look at the generations within. All these life cycles apart from the fact that you source the data which is typically an IT issue, the rest need to be done by the actual business users and that's what we did. That's the progression of the generations, in which now we're in the third generation as I call it where our role is just to source the data and then say, "yeah we'll govern it in the manner and then preparation-- >> It's really an operating system and we were talking with Aaron, Alation's co-founder, we used the analogy of a car, how this show was like a car show, an engine show, what's in the engine and the technology and then it evolved every year, now it's like we're talking about the cars, now we're talking about driver experience-- >> That's right. >> At the end of the day, you just want to drive. You don't really care what's under the hood, you do but you don't, but there's those people who do care what's under the hood, so you can have best of both worlds. You've got the engines, you set up the infrastructure, but ultimately, you in the business side, you just want to drive, that's what you're getting at? >> That's right. The time-to-market and speed to empower the users to play around with the data rather than IT trying to churn the data and confine access to data, that's a thing of the past. So we want more users to have faster access to data but at the same time govern it in a seamless manner. The word governance is still important because it's not about just give the data. >> And seamless is key. >> Seamless is key. >> Cause if you have democratization of data, you're implying that it is community-oriented, means that it's available, with access privileges all transparently or abstracted away from the users. >> Absolutely. >> So here's the question I want to ask you. There's been talk, I've been saying it for years going back to 2012 that an abstraction layer, a data layer will evolve and that'll be the real key. And then here in this show, I heard things like intelligent information fabric that is business, consumer-friendly. Okay, it's a mouthful, but intelligent information fabric in essence talks about an abstraction layer-- >> That's right. >> That doesn't really compromise anything but gives some enablement, creates some enabling value-- >> That's right. >> For software, how do you see that? >> As the word suggests, the earlier model was trying to build something for the end-users, but not in a way which was end-user friendly, meaning to say, let me just give you a simple example. You had a data model that existed. Historically the way that we have approached using data is to say "hey, I've got a model and then let's fit that data into this model," without actually saying that "does this model actually serve the purpose?"
You abstracted the model to a higher level. The whole point about intelligent data is about saying that, I'll give you a very simple analogy. Take zip code. A zipcode in the US is very different from a zipcode in India, and very different from a zipcode in Singapore. So if I had the ability for my data to come in, to say that "I know it's a zipcode, but this zipcode belongs to the US, this zipcode belongs to Singapore, and this zipcode belongs to India," and more importantly, if I can further rev it up a notch, if I say that "this belongs to India, and this zipcode is valid." Look at where I'm going with this intelligence. So that's what's up. If you look at the earlier model, you have to say that "yeah, this is a placeholder for zipcode." Now that makes sense, but what are you doing with it? >> Being a relational database model, it's just a field in a schema; you're taking it and abstracting it and creating value out of it. >> Precisely. So what I'm actually doing is accelerating the adoption, I'm making it simpler for users to understand what the data is. So I don't need to, as a user, figure out "I've got a zipcode, now is it a Singapore zipcode, an India zipcode, or what." >> So all this automation, Paxata's got a good system, we'll come back to the Paxata question in a second, I do want to drill down on that. But the big thing that I've been seeing at the show, and again Dave Vellante, my partner, co-CEO of SiliconANGLE, we always talk about this all the time. He's less bullish on Hadoop than I am. Although I love Hadoop, I think it's great, but it's not the end-all, be-all. It's a great use case. We were critical early on, and the thing we were critical on was that too much time was being spent on the engine and how things are built, not on the business value. So there's like a lull period in the business where it was just too costly-- >> That's right. >> Total cost of ownership was a huge, huge problem. >> That's right. >> So now today, how did you deal with that, and are you measuring the TCO or total cost of ownership? 'Cause at the end of the day, time to value, which is can you be up and running in 90 days with value and can you continue to do that, and then what's the overall cost to get there. Thoughts? >> So look, I think TCO always underpins any technology investment. If someone said I'm doing a technology investment without thinking about TCO, I don't think he's a good technology leader, so TCO is obviously a driving factor. But TCO has multiple components. One is the TCO of the solution. The other aspect is the TCO versus what value I'm going to get out of this system. So talking from an implementation perspective, what I look at as TCO is my whole ecosystem, which is my hardware, software, so you spoke about Hadoop, you spoke about RDBMS, is Hadoop cheaper, etc.? I don't want to get into that debate of cheaper or not, but what I know is the ecosystem is becoming much, much cheaper than before. And when I talk about ecosystem, I'm talking about RDBMS tools, I'm talking about Hadoop, I'm talking about BI tools, I'm talking about governance, I'm talking about this whole framework becoming cheaper. And it is also underpinned by the fact that hardware is also becoming cheaper. So the reality is all components in the whole ecosystem are becoming cheaper, and given the fact that software is also becoming more open-sourced and people are open to using open-source software, I think the whole question of TCO becomes a much more pertinent question. Now coming to your point, do you measure it regularly?
I think the honest answer is I don't think we are doing a good job of measuring it that well, but we do have that as one of the criteria for us to actually measure the success of our projects. The way that we do it is through our implementation cost: at the time of writing out our PEDs, we call them PEDs, which is the Project Execution Document, we talk about cost. We ask, "what's the implementation cost?" What are the business cases that are going to be an outcome of this? I'll give you an example with our anti-money laundering. I told you we reduced our cycle time from a few weeks to a few days, and that in turn means the number of people involved in this whole process, you're reducing the overheads and the operational folks involved in it. That itself tells you how much we're able to save. So definitely, TCO is there, and to say that-- >> And you are mindful of it, it's what you look at, it's key. TCO is on your radar, 100%, you evaluate that in your deals? >> Yes, we do. >> So Paxata, what's so great about Paxata? Obviously you've had success with them. You're a customer, what's the deal? Was it the tech, was it the automation, the team? What was the key thing that got you engaged with them, or specifically why Paxata? >> Look, I think with a partnership there cannot be one ingredient that makes the partnership successful, I think there are multiple ingredients that make a partnership successful. We were one of the earliest adopters of Paxata. Given that we're a bank and we have multiple different systems and a lot of manual processing involved, we saw Paxata as a good fit to govern these processes and ensure at the same time users don't lose their experience. The good thing about Paxata that we liked was obviously the simplicity and the look and feel of the tool. That's number one. Simplicity was a big point. The second one is about scale. The scale, the fact that it can take in millions of rows; it's not about just working off a sample of data. It can work on the entire dataset. That's very key for us. The third is to leverage our ecosystem, so it's not about saying "okay, you give me this data, let me go figure out what to do and then," so Paxata works off the data lake. The fact that it can leverage the lake that we built, the fact that it's a simple self-service preparation tool which doesn't require a lot of time to bootstrap, so end-users, people like you-- >> So it makes it usable. >> It's extremely user-friendly and usable in a very short period of time. >> And that helped with the journey? >> That really helped with the journey. >> Santosh, thanks so much for sharing. Santosh Mahendiran, who is the Global Tech Lead for Analytics at Standard Chartered Bank. Again, financial services, always a great early adopter, and you get success under your belt, congratulations. Data democratization is huge, and again, it's an ecosystem, you got all that anti-money laundering to figure out, you got to get those reports out, a lot of heavy lifting? >> That's right. >> So thanks so much for sharing your story. >> Thank you very much. >> We'll give you more coverage after this short break, I'm John Furrier, stay tuned. More live coverage in New York City, it's theCUBE.
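(The "intelligent" zipcode example from the conversation above lends itself to a short sketch: rather than storing the field as an opaque string, detect which country's postal-code format a value matches and flag values that match none. The regex patterns below are deliberately simplified, and this is an illustration of the idea, not any product's internals.)

```python
# Sketch of "intelligent" typing for a zipcode field: detect which country's
# postal-code format a value matches and flag invalid values, rather than
# storing it as just another string column. Patterns are deliberately simplified.
import re

ZIP_PATTERNS = {
    "US":        re.compile(r"^\d{5}(-\d{4})?$"),   # 94043 or 94043-1351
    "India":     re.compile(r"^[1-9]\d{5}$"),       # 6 digits, not starting with 0
    "Singapore": re.compile(r"^\d{6}$"),            # 6 digits
}

def classify_zipcode(value: str):
    """Return the list of countries whose format the value matches."""
    value = value.strip()
    return [country for country, pattern in ZIP_PATTERNS.items()
            if pattern.match(value)]

for raw in ["94043", "560001", "049315", "ABC12"]:
    matches = classify_zipcode(raw)
    if matches:
        print(f"{raw!r}: looks like a zipcode for {', '.join(matches)}")
    else:
        print(f"{raw!r}: does not match any known zipcode format")
```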

Published Date : Sep 29 2017



Aaron Kalb, Alation | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, we are here live in New York City, in Manhattan for BigData NYC, our event we've been doing for five years in conjunction with Strata Data, which is formerly Strata Hadoop, which was formerly Strata Conference, formerly Hadoop World. We've been covering the big data space going on ten years now. This is theCUBE. I'm here with Aaron Kalb, who's Head of Product and co-founder at Alation. Welcome to theCUBE. >> Aaron Kalb: Thank you so much for having me. >> Great to have you on, so co-founder, head of product, love these conversations because you're also co-founder, so it's your company, you got a lot of equity interest in that, but also as head of product you get to have the 20-mile stare on what the future looks like, while inventing it today, bringing it to market. So you guys have an interesting take on the collaboration of data. Talk about what that means, what's the motivation behind that positioning, what's the core thesis around Alation? >> Totally, so the thing we've observed is a lot of people working in the data space are concerned about the data itself. How can we make it cheaper to store, faster to process? And we're really concerned with the human side of it. Data's only valuable if it's used by people: how do we help people find the data, understand the data, trust in the data? And that involves a mix of algorithmic approaches and also human collaboration, both human to human and human to computer, to get that all organized. >> John Furrier: It's interesting, you have a symbolic systems background from Stanford, worked at Apple, involved in Siri, all this kind of futuristic stuff. You can't go a day without hearing about Alexa and voice activation, you've got Siri. AI is taking a really big part of this. Obviously all of the hype right now, but what it means is the software is going to play a key role as an interface. And this symbolic systems background almost brings on this neural network kind of vibe, where objects, data, play a critical role. >> Oh, absolutely, yeah, and in the early days when we were co-founding the company, we talked about what is Siri for the enterprise? Right, I was you know very excited to work on Siri, and it's really a kind of fun gimmick, and it's really useful when you're in the car, your hands are covered in cookie dough, but if you could answer questions like what was revenue last quarter in the UK and get the right answer fast, and have that dialogue, oh do you mean fiscal quarter or calendar quarter, do you mean UK including Ireland, or whatever it is. That would really enable better decisions and a better outcome.
How do you guys fit into that world? >> Yeah, absolutely, so the idea of the data lake is kind of interesting. Psychologically it's sort of a hoarder mentality, oh everything I've ever had I want to keep in the attic, because I might need it one day. Great opportunity to evolve these new streams of data, with IoT and what not, but just cause you can get to it physically doesn't mean it's easy to find the thing you want, the needle in all that big haystack and to distinguish from among all the different assets that are available, which is the one that is actually trustworthy for your need. So we find that all these trends make the need for a catalog to kind of organize that information and get what you want all the more valuable. >> This has come up a lot, I want to get into the integration piece and how you're dealing with your partnerships, but the data lake integration has been huge, and having the catalog has come up with, has been the buzz. Foundationally if you will saying catalog is important. Why is it important to do the catalog work up front, with a lot of the data strategies? >> It's a great question, so, we see data cataloging as step zero. Before you can prep the data in a tool like Trifacta, PACSAT, or Kylo. Before you can visualize it in a tool like Tableau, or MicroStrategy. Before you can do some sort of cool prediction of what's going to happen in the future, with a data science engine, before any of that. These are all garbage in garbage out processes. The step zero is find the relevant data. Understand it so you can get it in the right format. Trust that it's good and then you can do whatever comes next >> And governance has become a key thing here, we've heard of the regulations, GDPR outside of the United States, but also that's going to have an arms length reach over into the United States impact. So these little decisions, and there's going to be an Equifax someday out there. Another one's probably going to come around the corner. How does the policy injection change the catalog equation? A lot of people are building machine learning algorithms on top of catalogs, and they're worried they might have to rewrite everything. How do you balance the trade off between good catalog design and flexibility on the algorithm side? >> Totally yes it's a complicated thing with governance and consumption right. There's people who are concerned with keeping the data safe, and there are people concerned with turning that data into real value, and these can seem to be at odds. What we find is actually a catalog as a foundation for both, and they are not as opposed as they seem. What Alation fundamentally does is we make a map of where the data is, who's using what data, when, how. And that can actually be helpful if your goal is to say let's follow in the footsteps of the best analyst and make more insights generated or if you want to say, hey this data is being used a lot, let's make sure it's being used correctly. >> And by the right people. >> And by the right people exactly >> Equifax they were fishing that pond dry months, months before it actually happened. With good tools like this they might have seen this right? Am I getting it right? >> That's exactly right, how can you observe what's going on to make sure it's compliant and that the answers are correct and that it's happening quickly and driving results. >> So in a way you're taking the collective intelligence of the user behavior and using that into understanding what to do with the data modeling? >> That's exactly right. 
We want to make each person in your organization as knowledgeable as all of their peers combined. >> So the benefit then for the customer would be if you see something that's developing you can double down on it. And if the users are using a lot of data, then you can provision more technology, more software. >> Absolutely, absolutely. It's sort of like when I was going to Stanford, there was a place where the grass was all dead, because people were riding their bikes diagonally across it. And then somebody smart was like, we're going to put a real gravel path there. So the infrastructure should follow the usage, instead of being something you try to enforce on people. >> It's a classic design meme that goes around. Good design is here, the more effective design is the path. >> Exactly. >> So let's get into the integration. So one of the hot topics here this year, obviously besides cloud and AI, with cloud really being more the driver, the tailwind for the growth, AI being more the futuristic headroom, is integration. You guys have some partnerships that you announced with integration, what are some of the key ones, and why are they important? >> Absolutely, so, there have been attempts in the past to centralize all the data in one place, have one warehouse or one lake, have one BI tool. And those generally fail, for different reasons; different teams pick different stacks that work for them. What we think is important is the single source of reference: one hub with spokes out to all those different points. If you think about it, it's like Google, it's one index of the whole web even though the web is distributed all over the place. To make that happen it's very important that we have partnerships to get data in from various sources. So we have partnerships with database vendors, with Cloudera and Hortonworks, with different BI tools. What's new are a few things. One is with Cloudera Navigator: they have great technical metadata around security and lineage over HDFS, and that's a way to bolster our catalog to go even deeper into what's happening in the files before things get surfaced, and higher for places where we have a deeper offering today. >> So it's almost a connector to them in a way, you kind of share data. >> That's exactly right, we have a lot of different connectors, this is one new one that we have. Another, go ahead. >> I was going to go ahead, continue. >> I was just going to say another place that is exciting is data prep tools, so Trifacta and Paxata are both places where you can find and understand data in Alation and then begin to manipulate it in those tools. We announced with Paxata yesterday the ability to click to profile, so if you want to actually see what's in some raw compressed Avro file, you can see that in one click. >> It's interesting, Paxata has really been almost lapping Trifacta, because they were the leader in my mind, but now you've got like a Nascar race going on between the two firms, because data wrangling is a huge issue. Data prep is where everyone is stuck right now, they just want to do the data science, it's interesting. >> They are both amazing companies and I'm happy to partner with both. And actually Trifacta and Alation have a lot of joint customers we're psyched to work with as well. I think what's interesting is that data prep, and this is beginning to happen with analyst definitions of that field.
It isn't just preparing the data to be used, getting it cleaned and shaped, it's also preparing the humans to use the data, giving them the confidence, the tools, the knowledge to know how to manipulate it. >> And it's great progress. So the question I wanted to ask is now the other big trend here, I mean it's kind of a subtext in this show, it's not really front and center but we've been seeing it kind of emerge as a concept, we see it in the cloud world: on premise vs cloud. On premise a lot of people bring the dev ops model in, and say I may move to the cloud for bursting and some native applications, but at the end of the day there is a lot of work going on on premise. A lot of companies are kind of cleaning house, retooling, replatforming, whatever you want to call it, resetting. They are kind of getting their house in order to do on-prem cloud ops, meaning a business model of cloud operations on site. A lot of people are doing that, that will impact the story, it's going to impact some of the server modeling, that's a hot trend. How do you guys deal with the on premise cloud dynamic? >> Totally, so we just want to do what's right for the customer, so we deploy both on prem and in the cloud, and then from wherever the Alation server is it will point to usually a mix of sources, some that are in the cloud, like Redshift or S3, often with Amazon today, and also sources that are on prem. I do think I'm seeing a trend more and more toward the cloud, and people migrating from HDFS to S3 is one thing we hear a lot about at Strata, with the sort of Hadoop interest here. But I think what's happening is people are realizing, as each Equifax in turn happens, that this old wild west model of, oh, you surround your bank with people on horseback and it's physically in one place, doesn't hold. With data it isn't like that; most people are saying I'd rather have the A+ teams at Salesforce or Amazon or Google be responsible for my security than the people I can get over in the midwest. >> And the Paxata guys have loved the term Data Democracy, because that is really democratization, making the data free but also having the governance thing. So tell me about the data lake governance, because I've never loved the term data lake, I think it's more of a data ocean, but now you see data lake, data lake, data lake. Are they just silos of data lakes happening now? Are people trying to connect them? That's key, so that's been a key trend here. How do you handle the governance across multiple data lakes? >> That's right, so the key is to have that single source of reference, so that regardless of which lake or warehouse, or little siloed SQL Server somewhere, you can search in a single portal and find that thing no matter where it is. >> John: Can you guys do that? >> We can do that, yeah. I think the metaphor for people who haven't seen it really is Google: if you think about it, you don't even know what physical server a webpage is hosted from. >> Data lakes should just be invisible. >> Exactly. >> So you're interfacing with multiple data lakes, that's a value proposition for you. >> That's right, so it could be on prem or in the cloud, multi-cloud. >> Can you share an example of a customer that uses that and kind of how it's laid out? >> Absolutely, so one great example of an interesting data environment is eBay. They have the biggest Teradata warehouse in the world.
They also have I believe two huge data lakes, they have hive on top of that, and Presto is used to sort of virtualize it across a mixture of teradata, and hive and then direct Presto query It gets very complicated, and they have, they are a very data driven organization, so they have people who are product owners who are in jobs where data isn't in their job title and they know how to look at excel and look at numbers and make choices, but they aren't real data people. Alation provides that accessibility so that they can understand it. >> We used to call the Hadoop world the car show for the data world, where for a long time it was about the engine what was doing what, and then it became, what's the car, and now how's it drive. Seeing that same evolution now where all that stuff has to get done under the hood. >> Aaron: Exactly. >> But there are still people who care about that, right. They are the mechanics, they are the plumbers, whatever you want to call them, but then the data science are the guys really driving things and now end users potentially, and even applications bots or what nots. It seems to evolve, that's where we're kind of seeing the show change a little bit, and that's kind of where you see some of the AI things. I want to get your thoughts on how you or your guys are using AI, how you see AI, if it's AI at all if it's just machine learning as a baby step into AI, we all know what AI could be, but it's really just machine learning now. How do you guys use quote AI and how has it evolved? >> It's a really insightful question and a great metaphor that I love. If you think about it, it used to be how do you build the car, and now I can drive the car even though I couldn't build it or even fix it, and soon I don't even have to drive the car, the car will just drive me, all I have to know is where I want to go. That's sortof the progression that we see as well. There's a lot of talk about deep learning, all these different approaches, and it's super interesting and exciting. But I think even more interesting than the algorithms are the applications. And so for us it's like today how do we get that turn by turn directions where we say turn left at the light if you want to get there And eventually you know maybe the computer can do it for you The thing that is also interesting is to make these algorithms work no matter how good your algorithm is it's all based on the quality of your training data. >> John: Which is a historical data. Historical data in essence the more historical data you have you need that to train the data. >> Exactly right, and we call this behavior IO how do we look at all the prior human behavior to drive better behavior in the future. And I think the key for us is we don't want to have a bunch of unpaid >> John: You can actually get that URL behavioral IO. >> We should do it before it's too late (Both laugh) >> We're live right now, go register that Patrick. >> Yeah so the goal is we don't want to have a bunch of unpaid interns trying to manually attack things, that's error prone and that's slow. I look at things like Luis von Ahn over at CMU, he does a thing where as you're writing in a CAPTCHA to get an email account you're also helping Google recognize a hard to read address or a piece of text from books. 
>> John: If you shoot the arrow forward, you just take this kind of forward, you almost think augmented reality is a pretext to what we might see for what you're talking about and ultimately VR are you seeing some of the use cases for virtual reality be very enterprise oriented or even end consumer. I mean Tom Brady the best quarterback of all time, he uses virtual reality to play the offense virtually before every game, he's a power user, in pharma you see them using virtual reality to do data mining without being in the lab, so lab tests. So you're seeing augmentation coming in to this turn by turn direction analogy. >> It's exactly, I think it's the other half of it. So we use AI, we use techniques to get great data from people and then we do extra work watching their behavior to learn what's right. And to figure out if there are recommendations, but then you serve those recommendations, either it's Google glasses it appears right there in your field of view. We just have to figure out how do we make sure, that in a moment of you're making a dashboard, or you're making a choice that you have that information right on hand. >> So since you're a technical geek, and a lot of folks would love to talk about this, so I'll ask you a tough question cause this is something everyone is trying to chase for the holy grail. How do you get the right piece of data at the right place at the right time, given that you have all these legacy silos, latencies and network issues as well, so you've got a data warehouse, you've got stuff in cold storage, and I've got an app and I'm doing something, there could be any points of data in the world that could be in milliseconds potentially on my phone or in my device my internet of thing wearable. How do you make that happen? Because that's the struggle, at the same time keep all the compliance and all the overhead involved, is it more compute, is it an architectural challenge how do you view that because this is the big challenge of our time. >> Yeah again I actually think it's the human challenge more than the technology challenge. It is true that there is data all over the place kind of gathering dust, but again if you think about Google, billions of web pages, I only care about the one I'm about to use. So for us it's really about being in that moment of writing a query, building a chart, how do we say in that moment, hey you're using an out of date definition of profit. Or hey the database you chose to use, the one thing you chose out of the millions that is actually is broken and stale. And we have interventions to do that with our partners and through our own first party apps that actually change how decisions get made at companies. >> So to make that happen, if I imagine it, you'd have to need access to the data, and then write software that is contextually aware to then run, compute, in context to the user interaction. >> It's exactly right, back to the turn by turn directions concept you have to know both where you're trying to go and where you are. And so for us that can be the from where I'm writing a Sequel statement after join we can suggest the table most commonly joined with that, but also overlay onto that the fact that the most commonly joined table was deprecated by a data steward data curator. So that's the moment that we can change the behavior from bad to good. 
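(Aaron's turn-by-turn example, suggesting the table most commonly joined next but demoting anything a steward has deprecated, can be sketched roughly as below. The join counts and the deprecation list are hypothetical, and this is a paraphrase of the idea, not Alation's actual recommender.)

```python
# Rough sketch of a join suggestion that overlays curation metadata:
# rank candidate tables by how often they were historically joined with the
# current one, but push deprecated tables to the bottom with a warning.
# Counts and the deprecation list are hypothetical.
JOIN_COUNTS = {               # (table_being_queried, candidate) -> past joins
    ("orders", "customers_v1"): 120,
    ("orders", "customers_v2"): 45,
    ("orders", "shipments"): 30,
}
DEPRECATED = {"customers_v1": "replaced by customers_v2 (stale addresses)"}

def suggest_joins(current_table: str, top_n: int = 3):
    candidates = [(cand, n) for (tbl, cand), n in JOIN_COUNTS.items()
                  if tbl == current_table]
    # Sort by popularity, but rank deprecated tables after healthy ones.
    candidates.sort(key=lambda c: (c[0] in DEPRECATED, -c[1]))
    suggestions = []
    for cand, n in candidates[:top_n]:
        note = f"DEPRECATED: {DEPRECATED[cand]}" if cand in DEPRECATED else "ok"
        suggestions.append((cand, n, note))
    return suggestions

for cand, n, note in suggest_joins("orders"):
    print(f"{cand:<14} joined {n:>3} times  [{note}]")
```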
>> So a chief data officer out there, we've got to wrap up, but I wanted to ask one final question, There's a chief data officer out there they might be empowered or they might be just a CFO assistant that's managing compliance, either way, someone's going to be empowered in an organization to drive data science and data value forward because there is so much proof that data science works. From military to play you're seeing examples where being data driven actually has benefits. So everyone is trying to get there. How do you explain the vision of Alation to that prospect? Because they have so much to select from, there's so much noise, there's like, we call it the tool shed out there, there's like a zillion tools out there there's like a zillion platforms, some tools are trying to turn into something else, a hammer is trying to be a lawnmower. So they've got to be careful on who the select, so what's the vision of Alation to that chief data officer, or that person in charge of analytics to scale operational analytics. >> Absolutely so we say to the CDO we have a shared vision for this place where your company is making decisions based on data, instead of based on gut, or expensive consultants months too late. And the way we get there, the reason Alation adds value is, we're sort of the last tool you have to buy, because with this lake mentality, you've got your tool shed with all the tools, you've got your library with all the books, but they're just in a pile on the floor, if you had a tool that had everything organized, so you just said hey robot, I need an hammer and this size nail and this text book on this set of information and it could just come to you, and it would be correct and it would be quick, then you could actually get value out of all the expense you've already put in this infrastructure, that's especially true on the lake. >> And also tools describe the way the works done so in that model tools can be in the tool shed no one needs to know it's in there. >> Aaron: Exactly. >> You guys can help scale that. Well congratulations and just how far along are you guys in terms of number of employees, how many customers do you have? If you can share that, I don't know if that's confidential or what not >> Absolutely, so we're small but growing very fast planning to double in the next year, and in terms of customers, we've got 85 customers including some really big names. I mentioned eBay, Pfizer, Safeway Albertsons, Tesco, Meijer. >> And what are they saying to you guys, why are they buying, why are they happy? >> They share that same vision of a more data driven enterprise, where humans are empowered to find out, understand, and trust data to make more informed choices for the business, and that's why they come and come back. >> And that's the product roadmap, ethos, for you guys that's the guiding principle? >> Yeah the ultimate goal is to empower humans with information. >> Alright Aaron thanks for coming on the Cube. Aaron Kalb, co-founder head of product for Alation here in New York City for BigData NYC and also Strata Data I'm John Furrier thanks for watching. We'll be right back with more after this short break.

Published Date : Sep 28 2017



Gus Horn, NetApp | Big Data NYC 2017


 

>> Narrator: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hello everyone. Welcome back to our CUBE coverage here in New York City, live in Manhattan for theCUBE's coverage of Big Data NYC, our event we've had five years in a row. Eight years covering Big Data: Hadoop World originally in 2010, then it moved to the Strata Conference, Strata Hadoop, now called Strata Data. In conjunction with that event we have our Big Data NYC event. SiliconANGLE Media's CUBE. I'm John Furrier, your cohost, with Jim Kobielus, analyst at wikibon.com for Big Data. Our next guest is Gus Horn, who is the global Big Data analytics and CTO ambassador for NetApp, a machine learning and AI guru who gives talks all around the world. Great to have you, thanks for coming in and spending the time with us. >> Thanks, John, appreciate it. >> So we were talking before the camera came on, you're doing a lot of jet setting, really around evangelizing but also educating a lot of folks on the impact of machine learning and AI in particular. Obviously AI we love, we love the hype. It motivates young kids getting into software development, computer science, makes it kind of real for them. But still, there's a lot more ways to go in terms of what AI really is. And that's good, but what is really going on with AI? Machine learning is where the rubber hits the road. That seems to be the hot area, that's your wheelhouse. Give us the update, where is AI now? Obviously machine learning is super important, it's one of the hot topics here in New York City. >> Well, I think it's super important globally, and it's going to be disruptive. So before we were talking, I said how this is going to be a disruptive technology for all of society. But regardless of that, what machine learning is bringing is a methodology to deal with this influx of IOT data, whether it's autonomous vehicles, active safety in cars, or even looking at predictive analytics for complex manufacturing processes like an automotive assembly line. Can I predict when a welding machine is going to break, and can I take care of it during a scheduled maintenance cycle so I don't take the whole line down? Because the impacts are really cascading and dramatic when you have a failure that you couldn't predict. And what we're finding is that Hadoop and the Big Data space is uniquely positioned to help solve these problems, both from quality control and process management and how you can get better uptime, better quality, and then we take it full circle: how can I build an environment to help automotive manufacturers do test and dev and retest and retraining and learning of the AI modules and the AI engines that have to exist in these autonomous vehicles. And the only way you can do that is with data, and managing data like a data steward, which is what we do at NetApp. So for us, it's not just about the solution, but the underlying architecture is going to be absolutely critical in setting up the agility you'll need in this environment, and the flexibility you need. Because the other thing that's happening in the space right now is that technology's evolving very quickly. You see this with the DGX from NVIDIA, you see P100 cards from NVIDIA. So I have an architecture that we have in Germany right now where we have multiple NVIDIA cards in our Hadoop cluster that we've architected. But I don't make NVIDIA cards. I don't make servers. I make really good storage.
And I have an ecosystem that helps manage where that data is when it needs to be there, and especially when it doesn't need to be there, so we can get new data. >> Yeah, Gus, we were talking also before camera, for the folks watching, that you were involved with AI going way back to your days at MIT, and that's super important. Because a lot of people, the pattern that we're seeing across all the events that we go to, and we'll be at the NetApp event next week, Insight, in Vegas, but the pattern is pretty clear. You have one camp: oh, AI is just the same thing that was going on in the late '70s, '80s, and '90s, but it now has a new dynamic with the cloud. So a lot of people are saying okay, there have been some concepts that were developed in AI, in computer science, but now with the evolution of hyperconverged infrastructure, with cloud computing, with now a new architecture, it seems to be turbocharging and accelerating. So I'd like to get your thoughts on why is it so hot now? Obviously machine learning, everyone should be on that, no doubt, but you got the dynamic of the cloud. And NetApp's in the storage business, so that stores data, I get that. What's the dynamic with the cloud? Because that seems to be the accelerant right now with open source and with AI. >> Yeah, I think you got to stay focused. The cloud is going to be playing an integral role in everything. And what we do at NetApp as a data steward, and what George Kurian said, our CEO, is that data is the currency of today, actually, right? It's really fundamentally what drives business value, it's the data. But there's one little slight attribute change that I'd like to add to that, and that is it's a perishable commodity. It has a certain value at T-sub-zero when you first get it. And that's especially true when you're trying to do machine learning and you're trying to learn new events and new things, but it rapidly degrades and becomes less valuable. You still need to keep it because it's historical, and if we forget historical data, we're doomed to repeat mistakes. So you need to keep it and you have to be a good steward. And that's where we come into play with our technologies. Because we have a portfolio of different kinds of products and management capabilities that move the data where it needs to be, whether you're in the cloud, whether you're near the cloud, like in an Equinix colo, or even on prem. And the key attribute there, especially in automotive, is they want to keep the data forever, because of liability, because of intellectual property and privacy concerns. >> Hold on, one quick question on that. 'Cause I think you bring up a good point. The perishability's interesting because realtime, we see this now, realtime is the buzzword in the industry, but you're talking about something that's really important. That the value of the data when you get it fast, in context, is super important. But then the historical piece where you store it also plays into the machine learning dynamics of how deep learning and machine learning have to use the historical perspective. So in a way, it's perishable in the realtime piece, in the moment. If you're a self-driving car you want the data in milliseconds 'cause it's important, but then again, the historical data will then come back. Is that kind of what you're getting at with that? >> Yeah, because the way that these systems operate, the paradigm is like deep learning. You want them to learn the way a human learns, right?
The only reason we walk on our feet is 'cause we fell down a lot. But we remember falling down, we remember how we got up and could walk. So if you don't have the historical context, you're just always falling down, right? So you have to have that to build up the proper machine learning neural network, the kind of connections you need to do the right things. And then as you get new data and varieties of data, and I'll stick with automotive, because it can almost be thought of as an intractable amount of data. Because most people will keep cars for measured in decades. The quality of the car is incredible now, and they're all just loaded with sensors, right? High definition cameras, radars, GPS tracking. And you want to make sure you get improvements there because you have liability issues coming as well with these same technologies, so. >> Yeah, so we talk about the perishability of the data, that's a given. What is less perishable, it seems to me and Wikibon, is that what you derive from the data, the correlations, the patterns, the predictive models, the meat of machine learning and deep learning, AI in general, is less perishable in the sense that it has a validity over time. What are your thoughts at NetApp about how those data derived assets should be stored, should be managed for backup and recovery and protected? To what extent do those requirements need to be reflected in your storage retention policies if you're an enterprise doing this? >> That's a great question. So I think what we find is that that first landing zone, and everybody talks about that being the cloud. And for me it's a cloudy day, although in New York today it's not. There are lots of clouds and there are lots of other things that come with that data like GDPR and privacy, and what are you allowed to store, what are you allowed to keep? And how do you distinguish one from the other? That's one part. But then you're going to have to ETL it, you're going to have to transform that data. Because like everything, there's a lot of noise. And the noise is really fundamentally not that important. It's those anomalies within the stream of noise that you need to capture. And then use that as your training data, right? So that you learn from it. So there's a lot of processing, I think, that's going to have to happen in the cloud regardless of what cloud, and it has to be kind of ubiquitous in every cloud. And then from there you decide, how am I going to curate the data and move it? And then how am I going to monetize the data? Because that's another part of the equation, and what can I monetize? >> Well that's a question that we hear a lot on theCUBE. On day one we were ripping at some of the concepts that we see, and certainly we talk to enterprise customers. Whether it's a CIO, CVO, chief data officer, chief security officer. There's a huge application development going on in the enterprise right now. You see the opensource booming. This huge security practice is being built up and then it's got this governance with the data. Overlay that with IOT, it's kind of an architectural, I don't want to say reset, but a retrenching for a lot of enterprises. So the question I have for you guys as a critical part of the infrastructure of storage, storage isn't going away, there's no doubt about that, but now the architecture's changing. How are you guys advising your customers? What's your position on when you come into CXO and you give a talk and I said, hey, Gus, the house is on fire, we got so much going on. 
Bottom line me, what's the architecture? What's best for me, but don't lose the headroom. I need to have some headroom to grow, that's where I see some machine learning, what do I do? >> I think you have to embrace the cloud, and that's one of the key attributes that NetApp brings to the table. We have our core software, our ONTAP software, is in the cloud now. And for us, we want to make sure we make it very easy for our customers to both be in the cloud, be very protected in the cloud with encryption and protection of the data, and also get the scale and all of the benefits of the cloud. But on top of that, we want to make it easy for them to move it wherever they want it to be as well. So for us it's all about the data mobility and the fact that we want to become that data steward, that data engine that helps them drive to where they get the best business value. >> So it's going to be on prem, on cloud. 'Cause I know just for the record, you guys if not the earliest, one of the earliest in with AWS, when it wasn't fashionable. I interviewed you guys on that many years ago. >> And let me ask a related question. What is NetApp's position, or your personal thinking, on what data should be persisted closer to the edge in the new generation of IOT devices? So IOT, edge devices, they do inference, they do actuation and sensing, but they also do persistence. Now should any data be persisted there longterm as part of your overall storage strategy, if you're an enterprise? >> It could be. The question is durability, and what's the impact if for some reason that edge was damaged, destroyed or the data lost. So a lot of times when we start talking about opensource, one of the key attributes we always have to take into account is data durability. And traditionally it's been done through replication. To me that's a very inefficient way to do it, but you have to protect the data. Because it's like if you've got 20 bucks in your wallet, you don't want to lose it, right? You might split it into two 10s, but you still have 20, right? You want that durability and if it has that intrinsic value, you've got to take care of it and be a good steward. So if it's in the edge, it doesn't mean that's the only place it's going to be. It might be in the edge because you need it there. Maybe you need what I call reflexive actions. This is like when a car is well, you have deep learning and machine learning and vision and GPS tracking and all these things there, and how it can stay in the lane and drive, but the sensors themself that are coming from Delphi and Bosch and ZF and all of these companies, they also have to have this capability of being what I call a reflex, right? The reason we can blink and not get a stone in our eye is not because it went to our cerebral cortex. Because it went to the nerve stem and it triggered the blink. >> Yeah, it's cache. And you have to do the same thing in a lot of these environments. So autonomous vehicles is one. It could be using facial recognition for restricting access to a gate. And all the sudden this guy's on a blacklist, and you've stopped the gate. >> Before we get into some of the product questions I have for you, Hadoop in-place analytics, as well as some of the regulations around GDPR, to end the trend segment here is what's your thoughts on decentralization? You see a lot of decentralized apps coming out, you see blockchain getting a lot of traction. Obviously that's a tell sign, certainly in the headroom category of what may be coming down. 
Not really on the agenda for most enterprises today, but it does kind of indicate that the wave is coming for a lot more decentralization on top of distributed computing and storage. So how do you look at that, as someone who's out on the cutting edge? >> For me it's just yet another industry trend where you have to embrace it. I'm constantly astonished at the people who are trying to push back from things that are coming. To think that they're going to stop the train that's going to run 'em over. And the key is how can we make even those trends better, more reliable, and do the right thing for them? Because if we're the trusted advisor for our customers, regardless of whether or not I'm going to sell a lot of storage to them, I'm going to be the person they're going to trust to give 'em good advice as things change, 'cause that's the one thing that's absolutely coming is change. And oftentimes when you lock yourself into these quote, commodity approaches with a lot of internal storage and a lot of these things, the counterpart to that is that you've also locked yourself in probably for two to four years now, in a technology that you can't be agile with. And this is one of the key attributes for the in-place analytics that we do with our ONTAP product and we also have our E series product that's been around for six plus years in the space, is the defacto performance leader in the space, even. And by decoupling that storage, in some cases very little but it's still connected to the data node, and in other cases where it's shared like an NFS share, that decoupling has enormous benefits from an agility perspective. And that's the key. >> That kind of ties up with the blockchain thing as kind of a tell sign, but you mentioned the in-place analytics. That decoupling gives you a lot more cohesiveness, if you will, in each area. But tying 'em together's critical. How do you guys do that? What's the key feature? Because that's compelling for someone, they want agility. Certainly DevOps' infrastructure code, that's going mainstream, you're seeing that now. That's clearly cloud operation, whatever you want to call it, on prem, off prem. Cloud ops is here. This is a key part of it, what's the unique features of why that works so well? >> Well, some of the unique features we have, so if we look at your portfolio products, so I'll stick with the ONTAP product. One of the key things we have there is the ability to have incredible speed with our AFF product, but we can also Dedoop it, we can clone it, and snapshot it, snapshotting it into, for example, NPS or NetApp Private Storage, which is in Equinox. And now all the sudden I can now choose to go to Amazon, or I can go to Azure, I can go to Google, I can go to SoftLayer. It gives me options as a customer to use whoever has got the best computational engine. Versus I'm stuck there. I can now do what's right for my business. And I also have a DR strategy that's quite elegant. But there's one really unique attribute too, and that's the cloning. So a lot of my big customers have 1000 plus node traditional Hadoop clusters, but it's nearly impossible for them to set up a test DEV environment with production data without having an enormous cost. But if I put it in my ONTAP, I can clone that. I can make hundreds of clones very efficiently. >> That gets the cost of ownership down, but more importantly gets the speed to getting Sandboxes up and running. 
>> And the Sandboxes are using true production data so that you don't have to worry about oh, I didn't have it in my test set, and now I have a bug. >> A lot of guys are losing budget because they just can't prove it and they can't get it working, it's too clunky. All right, cool, I want to get one more thing in before we run out of time. The role of machine learning we talked about, that's super important. Algorithms are going to be here, it's going to be a big part of it, but as you look at that policy, where the foundational policy governance thing is huge. So you're seeing GDPR, I want to get your comments on the impact of GDPR. But in addition to GDPR, there's going to be another Equifax coming, they're out there, right? It's inevitable. So as someone who's got code out there, writing algorithms, using machine learning, I don't want to rewrite my code based upon some new policy that might come in tomorrow. So GDPR is one we're seeing that you guys are heavily involved in. But there might be another policy I might want to change, but I don't want to rewrite my software. How should a CXO think about that dynamic? Not rewriting code if a new governance policy comes in, and then the GDPR's obvious. >> I don't think you can be so rigid to say that you don't want to rewrite code, but you want to build on what you have. So how can I expand what I already have as a product, let's say, to accommodate these changes? Because again, it's one of those trains. You're not going to stop it. So GDPR, again, it's one of these disruptive regulations that's coming out of EMEA. But what we forget is that it has far reaching implications even in the United States. Because of their ability to reach into basically the company's pocket and fine them for violations. >> So what's the impact of the Big Data system on GDPR? >> It can potentially be huge. The key attribute there is you have to start when you're building your data lakes, when you're building these things, you always have to make sure that you're taking into account anonymizing personal identifying information or obfuscating it in some way, but it's like with everything, you're only as strong as your weakest link. And this is again where NetApp plays a really powerful role because in our storage products, we actually can encrypt the data at rest, at wire speed. So it's part of that chain. So you have to make sure that all of the parts are doing that because if you have data at rest in a drive, let's say, that's inside your server, it doesn't take a lot to beat the heck out of it and find the data that's in there if it's not encrypted. >> Let me ask you a quick question before we wrap up. So how does NetApp incorporate ML or AI into these kinds of protections that you offer to customers? >> Well for us it's, again, we're only as successful as our customers are, and what NetApp does as a company, we'll just call us the data stewards, that's part of the puzzle, but we have to build a team to be successful. So when I travel around the world, the only reason a customer is successful is because they did it with a team. Nobody does it on an island, nobody does it by themself, although a lot of times they think they can. So it's not just us, it's our server vendors that work with us, it's the other layers that go on top of it, companies like Zaloni or BlueData and BlueTalon, people we've partnered with that are providing solutions to help drive this for our customers. >> Gus, great to have you on theCUBE. Looking forward to next week. 
I know you're super busy at NetApp Insight. I know you got like five major talks you're doing, but if we can get some time I think you'd be great. My final question, a personal one. We were talking about how you do search and rescue in Tahoe in case there's an avalanche or a lost skier. A lot of enterprises feel lost right now. So you kind of come in a lot, and the avalanche is coming, the waves or whatever are coming, so you've probably seen situations. You don't need to name names, but talk about what should someone do if they're lost? You come in, you can do a lot of consulting. What's the best advice you could give someone? A lot of CXOs and CEOs, their heads are spinning right now. There's so much on the table, so much to do, they got to prioritize. >> It's a great question. And here's the one thing: don't try to boil the ocean. You got to be hyper-focused. If you're not seeing a return on investment within 90 days of setting up your data lake, something's going wrong. Either the scope of what you're trying to do is too large, or you haven't identified the use case that will give you an immediate ROI. There should be no hesitation to go down this path, but you got to do it in a manner where you're tackling the biggest problems that have the best hit value for you. Whether it's ETLing data into your plan-of-record systems or your enterprise data warehouses, you got to get started, but you want to make sure you have measurable, tangible success within 90 days. And if you don't, you have to reset and say okay, why is that not happening? Am I reinventing the wheel because my consultant said I have to write all this Sqoop and Flume code and get the data in? Or maybe I should have chosen another company to be a partner that's done this 1000 times. And it's not a science experiment. We got to move away from science experiments to solving business problems. >> Well, science experiments and boiling the ocean: don't try to overreach, build a foundational building block. >> The successful guys are the ones who are very disciplined and they want to see results. >> Some call it baby steps, some call it building blocks, but ultimately the foundation right now is critical. >> Gus: Yeah. >> All right, Gus, thanks for coming on theCUBE. Great day, great to chat with you. Great conversation about machine learning's impact on organizations. theCUBE bringing you the data here live in Manhattan. I'm John Furrier, with Jim Kobielus of Wikibon. More after this short break. We'll be right back. (digital music) (synthesizer music)
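To make the anonymization point from this interview concrete, here is a minimal sketch of pseudonymizing personally identifying fields before records land in a data lake. It is not NetApp's (or anyone's) product code; the field names, the key handling, and the record shape are assumptions for illustration. A keyed hash keeps values joinable across tables while keeping raw PII out of the lake.

```python
# Hedged sketch: pseudonymize assumed PII fields before records land in the lake.
import hashlib
import hmac

PII_FIELDS = {"email", "phone", "national_id"}      # assumed sensitive columns
SECRET_KEY = b"example-key-keep-this-in-a-vault"    # placeholder; manage via a KMS/vault

def pseudonymize(value: str) -> str:
    # Deterministic keyed hash: joins still work, raw PII never lands in the lake.
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def scrub_record(record: dict) -> dict:
    return {
        key: pseudonymize(val) if key in PII_FIELDS and val is not None else val
        for key, val in record.items()
    }

if __name__ == "__main__":
    raw = {"customer_id": 42, "email": "jane@example.com", "phone": "555-0100", "ltv": 1234.5}
    print(scrub_record(raw))
```

Encryption at rest, as Gus describes, still matters for whatever does land on disk; the hashing above only addresses what leaves the source system in identifiable form.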

Published Date : Sep 28 2017

Sergei Rabotai, InData Labs | Big Data NYC 2017


 

>> Live from Midtown Manhattan, it's the CUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Fifth year of coverage of our own event Big Data NYC where we cover all the action in New York City. For this week in big data, in conjunction with Strata Data which was originally Hadoop World in 2010. We've been covering it for eight years. It became Strata Conference, Strata Hadoop, now called Strata Data. Will probably called Strata AI tomorrow. Who knows, but certainly the trends are going in that direction. I'm John Furrier, your co-host. Our next guest here in New York City is Sergei Rabotai, who is the Head of Business Development at InData Labs from Belarus. In town, doing some biz dev in the big data ecosystem. Welcome to theCUBE. >> Yeah. Good morning. >> Great to have you. So, obviously Belarus is becoming known as the Silicon Valley of Eastern Europe. A lot of great talent. We're seeing that really explode. A lot of great stuff going on globally, even though there's a lot of stuff, you know GDPR and all these other things happening. It's clearly a global economy with tech. Silicon Valley still is magical. I live there in Palo Alto but you're starting to see peering points within these ecosystems of entrepreneurship and now big companies are taking advantage of it as well. What do you guys do? I mean you're in the middle of that. What is InData Labs do in context of all this? >> Well, InData Labs is a full stack data science company. Which means that we provide professional services for data strategy, big data engineering and the data science. So, yeah, like you just said, we are based - my team is based in Minsk, Belarus. We are about 40 people strong at the moment. And in our recent years we have been very successful starting this business and we have been getting customers from all over the world, including United States, Great Britain, and European Union. The company was launched about four years ago and very important thing, that it was launched by two tech leaders who come from very data-driven industries. Our CEO, Ilya Kirillov, has been running several EdTech companies for many years. Our second founder, Marat Karpeko, has been holding C-Level positions in one of the most successful gaming companies in the world. >> John: So they know data. They're data guys. >> Yeah they're data guys. They know data from different aspects and that brings synergy to our business. >> You guys bring that expertise now into professional services for us. Give me an example of some of the things someone might want to call you up on, because the thing we're hearing here in New York City this week is look, we need more data sciences and they got to be more productive. They're spending way too much time wrangling and doing stuff that they shouldn't be doing. In the old days, sysadmins were built to let people be productive and they ran the infrastructure. That's not what data scientists should be doing. They're the users. There's a level of setting things up and then there's a level of provisioning, it's actually data assets, but then the data scientists just want to do their job. How do you help companies do that? >> Well I would probably, if I take all of our activities, I would split them into two big parts. First of all, we are helping big companies, who already have a lot of data. We help them in managing this data more effectively. We help them with predictive analytics. 
We help them with, helping them build the churn prediction and user segmentation solutions. We have been recently involved into several natural language processing projects. In one of our successful key studies we helped one of the largest gaming companies to automate their customer feedback processing. So, like, a couple years ago they were working manually with their customer feedback and we built them a tool that allows them to instantly get the sentiment of what the user says. It's kind of like a voice of a customer, which means they can be more effective in developing new things for their games. So, we-- >> So what would someone engage? I'm just trying to peg a order of magnitude of the levels of engagements you do. Startups come in? Is it big companies? What kind of size scoped work do you do? >> So I would say at the moment we work with startups, but it's a bit of a different approach than we have with big or well-established companies. When startups typically approach us with asking to help them implement some brand new technologies like neural networks or deep learning. So they want to be effective from the start. They want to use the cutting edge technology to be more attractive, to provide a better value on the market and just to be effective and to be a successful business from the start. The other part, the well-established companies, who already have the data but they understand that so far their data might not be used that effectively as it should have been used. Therefore, they approach us with a request to help them to get more insights out of the data. Let's say, implement some machine learning that can help them. >> How about larger companies? What kind of projects do you work for them? >> It could be a typical project like churn prediction, that is very actual for the companies who have got a lot of customer data. Then it could be companies from such industries like betting industry, where churn is a very big issue. And, the same probably applies to companies who do trading. >> So is scale one of the things you differentiate around? It sounds like your founders have an EdTech background obviously must be a larger, large data set. Is your profile of engagements large scale? Is it ... I'm just trying to get a handle of if someone's watching who, what is the kind of engagements people should be calling you for? Give us an example of that. >> Like, let's say there is a company who has got a lot of customer data, has got some products and they have a problem of churn, or they have a problem of segmenting their customers so they can later address the specific segments of the customers with the right offers at the right time and through the right marketing channel. Then it could be customers or requests where natural text processing is required where we have to automate some understanding of the written or spoken text. Then I should say that we have been getting recently some requests where computer vision skills are required. I think the first stage of AI being really intelligent was the speech recognition and I think nowadays we manage to reach to the level of what we earlier saw in fantastic movies or sci-fi movies. Computer vision is going to be the next leap in all that AI buzz we're having at the moment. >> So you solve, the problem that you solve for customers is data problems. If they're swimming in a lot of data, you can help them. >> Sergei: Yep. >> If they actually want to make that data do things that are cutting edge, you guys can help them. >> Sergei: Yeah. 
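As an aside, the customer-feedback automation Sergei describes can be pictured with a toy sketch like the one below. This is not InData Labs' implementation; the training examples are invented, and a real system would be trained on labeled historical feedback, but it shows the shape of a "voice of the customer" sentiment scorer.

```python
# Toy sketch of automated feedback sentiment scoring (illustrative data only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "love the new update, matchmaking feels fair",
    "great event, the rewards were generous",
    "game crashes on launch since the patch",
    "support never answered my ticket, very frustrating",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

print(model.predict(["the latest patch broke my saved progress"]))  # likely ['negative']
```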
That's-- >> Alright, so here's a question for you. I mean, Belarus has obviously got good things going on. I've heard the press that you guys have been getting, the whole area, and you guys in particular. So I'm a buyer, one of the questions I might ask is "Hey Sergei, how do I know that you'll keep that talent because the churn is always a big problem. I've dealt with outsourcing before and in the US it's hard to keep talent but I've heard there's a churn." How do you guys keep the talent in the country? How do you keep talent on the projects? Is there certain economic rules over there? What's happening in Belarus? Give us the economical. >> Yeah, so, basically what you're saying. The churn problem has always been known for companies who have their development teams in Asian regions. That's a known problem because I have a lot of meetings with clients in the UK and the US, potential prospects, I would say. So they say it is a problem for them. With Belarus, I don't think we have that because from what I know, we have an average churn of under 10 percent. That's the figures across the industry. In smaller companies, the churn is even less and there are specific reasons for that. First of all, that due to Belarusian mentality, we always try to keep to a job that we're having. Yeah? So we do not-- >> John: That's a cultural thing. >> That's just the cultural thing. We do not ... >> You honor, you honor a code, if you will. >> Yeah. >> Okay. >> So, that's one of the things. Another thing is that Belarusian IT industry is very small. We have, I would say, no more than 40 thousand people being involved in different IT companies. The community is very small, so if somebody is hopping jobs from one job to another, it is going to be known and this person is not likely to have like, a good career. >> So job hoppers is kind of like a code of community, honor. Silicon Valley works that way too, by the way. >> Yeah. >> You get identified, that's who you are. >> Yeah. And so nowadays-- >> Economic tax breaks going on over there? What's the government to get involved? >> One of the key things is, the special tax and legal regulations that Belarus has got at the moment. I can definitely say that there is no country in the world that has got the same tax preferences, and the same support from the government. If a Belarusian company, IT company, becomes a part of Belarusian High Tech Park it means the company becomes automatically exempt from BET tax, corporate income tax. The employees of that company having the reliefs on their income, personal income tax rate, and there are a lot more reliefs that make the talent stay in the country. Having this relief for the IT business allows the companies to provide better working conditions for the employees and stop the people from migrating to other parts of the world. That's what we have. >> Sort of created an environment where there's not a lot of migration out of the area. The tech community kind of does it's own policing of behavior for innovation. >> Yeah but I think before those initiatives were adopted there was a certain percentage of people migrating but I think that nowadays even if it happens, yes, you're right, it's not that substantial. >> Great. Tell us ... Great overview of the company and congratulations, it's a good opportunity for folks watching to explore new areas of talent, especially ones that have the work ethic and knowledge you guys have over there. New York here, there's codes here too. Get the job done. Be on time. 
What's your experience like in New York here? What's your goal this week? What's some of the meetings you're having? Share with the folks kind of your game plan for Big Data NYC. >> Well, yeah, I've really enjoyed my stay here. It, so far, has been a very enjoyable experience. From the business perspective, I had over 10 meetings with the prospective customers. And we are likely to have follow-ups coming in the next couple of weeks. I can definitely say there is a great demand for professional services. You can see that if you go to whichever center you can see there's a lot of jobs being posted on the job boards. It means that there is lack of knowledge here in the US, yeah? One more important thing that I wanted to share with you from my personal observations that USA, UK and maybe Nordic countries, they have very, very strong background for creating the business ideas but Eastern Europe or Eastern European countries and Belarus in particular, they are very strong in actually implementing those ideas. >> Building them. >> Yes, building them. I think we have lots of synergies and we can ... we can ... >> John: Great. >> We can work together. I also got some meetings with our existing customers here in the US and so far we had good experiences. I can see that New York is moving fast. I travel a lot. I've been to over 40 countries in the previous five years and I just ... New York is different. >> It's fun. >> Different. Even different from many other cities in the US. >> Lot of banks are here. Lot of business in New York. New York is a great town. Love New York City. It's one of my favorites. Love coming here as I grew up right across the river in New Jersey. >> Yeah. But, great town, obviously California, Palo Alto, >> Yeah. >> Is a little more softer in terms of weather, but they have a culture there too. Sounds a lot like what's going on in Belarus, so congratulations. If we get some business for you, should we give them theCUBE discount, tell them John sent you and you get 10 percent off? Alright? >> Alright, yes. Sounds great. We can make it a good deal. (laughter) >> Tell them John sent you, you get 10% off. No I'm only kidding because it's services. Congratulations. Final question. What's the number one thing that people are buying for service from you guys? Number one thing. What's the most requested service you provide? >> The most requested services ... First of all, many customers they understand that they have got a lot of data. They want to do something with their data. But before you actually do some implementation you have to do a lot of discovery or preparatory work. I would say, no matter how we end up with a customer, this stage is basically ... The idea of that stage is to identify the ways data science can be implemented and can provide benefits to the business. That's the most important. I think that, like, 95 percent of the customers they approach us with this thing in the first place. And based on the results of that preparatory stage we can then advise the customers. What can they do? Or how they can actually benefit from the existing data? Or what other things they should collect in order to make their business more effective. >> Sergei, thanks for coming on. Belarus has got a lot of builders there. Check 'em out. >> Thanks a lot. >> Builders are critical in this new world. Lots of them with clout, a lot of great opportunities. A lot of builders in Belarus. This is theCUBE, bringing you all the action from New York City. More after this short break. 
We'll be right back. (theme music) (no audio) >> Hi, I'm John Furrier, the co-founder of SiliconANGLE Media and co-host of theCUBE. I've been in the tech ...
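Since churn prediction came up several times in this conversation, here is a rough, hedged sketch of what such an engagement often boils down to: train a classifier on behavioral features and score current customers. The features and data below are synthetic placeholders, not anything from a real client.

```python
# Hedged churn-prediction sketch on synthetic behavioral features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.poisson(5, n),        # sessions in the last 30 days
    rng.exponential(20, n),   # days since last purchase
    rng.uniform(0, 500, n),   # lifetime spend
])
# Synthetic ground truth: in this toy world, inactivity drives churn (plus noise).
y = (X[:, 1] > 30).astype(int) ^ (rng.random(n) < 0.1).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("holdout AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```

In practice the hard part is the one Sergei points to: the discovery work of deciding which signals to collect and which business action the churn score should trigger.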

Published Date : Sep 28 2017

Arun Murthy, Hortonworks | BigData NYC 2017


 

>> Coming back when we were a DOS spreadsheet company. I did a short stint at Microsoft and then joined Frank Quattrone when he spun out of Morgan Stanley to create what would become the number three tech investment (upbeat music) >> Host: Live from mid-town Manhattan, it's theCUBE covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC in conjunction with Hadoop World, which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year it will be called Strata AI, but we're still theCUBE, we'll always be theCUBE, and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago, >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been really the first of the Hadoop players, before even Cloudera, on this notion of data in flight, or, as I call it, real-time data, but I think you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update. >> Absolutely, you know, we've obviously been in this space, personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to allow you to manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, calls it, his vision's always been, we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look, enterprises want to be able to get insights from data, but they also want, increasingly, they want to get insights and they want to deal with it in real-time. You know, while you're in your shopping cart. They want to make sure you don't abandon your shopping cart. If you were sitting at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey.
We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this batching NiFi thing, huge community. Its just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing in the whole streaming data style. >> There's a real, I don't mean to tangent but I do since you brought up community I wanted to bring this up. It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders before us, you know. And Linux Foundation showing code numbers hitting up from $64 million to billions in the next five, ten years, exponential growth of new code coming in. So open-source certainly blew me. But now community is translating to things you start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not, that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys compare to, Cloudera and others have always been community driven. >> Yeah our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over course of six years we've been around. Super-excited to have someone like IBM join hands, it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly >> Coding is pretty obvious, you code with your friends. You code with people who are good, they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side where one plus one equals more than two. One plus one equals five or three. >> You know IBM has been a great example. They've decided to focus on their strengths which is around Watson and machine learning and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their data science work experience, along with Hortonworks is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing, we were super excited about it. And now to get IBM to say, we'll get in our technologies and our IP, big data, whether it's big Quality or big Insights or big SEQUEL, and the word has been phenomenal. >> Well the Dataplane announcement, finally people who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption because now the data lakes, now it's admitting it's a horrible name but just saying stitching together the data lakes, Which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We start off with this notion of data lake and say, hey, you got too many silos inside the enterprise in one data center, you want to put them together. 
But then increasingly, as Hadoop has become more and more mainstream, I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, a couple things have happened. One is, we talked about streaming data. We see it all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smart phone and stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California and process it? You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the Edge as well. >> Exactly, right. So you want to have something in Japan that collects all of the security cameras in Tokyo, and you do analysis and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is, with data sovereignty rules, especially things like GDPR, there are now regulatory reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France or vice versa, right. >> Data governance is a huge issue and this is the problem I have with data governance. I am really looking for a solution, so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see, certainly on the Wikibon side, I see it personally, is that you can almost see that something else will happen that'll force some policy regulation or governance. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential by not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense, given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, a hundred data lakes, would it be better, not saying lakes, data ponds, >> [Host] Any data. >> Any data >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric, I like the term, you know, Forrester uses, they call it the fabric. >> Host: Data fabric. >> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were based in Germany but for GDPR they had to stand up something in France now. They had French customers, and for a bunch of new reasons, regulation reasons, they had to stand up something in France. So they bring their own data center, then they had only the cloud provider, right, who I won't name. And they were great, things are working well. Now they want to expand the similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or they were not available in a time frame which made sense for the offering.
So they had to go with cloud vendor two. So now although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you are to manage it three ways, one for OnFrame, one for cloud vendor A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane. And that, by the way, is based by all the open source technologies we love like Atlas and Ranger. By the way, that is also what IBM is betting on and what the entire ecosystem, but it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane and obviously couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is what's driving the product decisions at Hortonworks? And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one we see is being in the community, also we see the fact that people are starting to solve the problems. So it's another elementary for us. So you have one as the enterprise side, we see what the enterprises are facing which is kind of where Dataplane came in, but we also saw in the community where people are starting to ask us about hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and say there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last not least is when went to friends like IBM and say hey we're doing this. This is where we can position this, right. So we can actually bring in IGSC, you can bring big Quality and bring all these type, >> [Host} So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the power system and the horsepower. >> Exactly, yep. We announced something, for example, we have been working with the power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough, it feels like the two of them click. And that's frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in. So I've got to ask you, the big waves are coming in and you're seeing people starting to get hip with the couple of key things that they got to get their hands on. They need to have the big surfboards, metaphorically speaking. They got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, I buy, okay, AI Wash, and people can see right through that. Alright, that's clear. But AI's great. 
We all cheer for AI but the reality is, everyone knows that's pretty much b.s. except for core machine learning is on the front edge of innovation. So that's cool, but value. [Laughs] Hey I've got the integrate and operationalize my data so that's the big wave that's coming. Comment on the community piece because enterprises now are realizing as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be like the emerging enterprises that knew open source. The guys will volunteer and they may not go deeper in the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, the end of the day, my best advice is to say look, the community is defined by the people who contribute. So, you get advice if you contribute. Which means, if that's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can actually go then recruit the best talent, because the best talent wants to stand out. Github is a resume now. It is not a word doc. If you don't allow them to build that resume they're not going to come by and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right and we see that over and over again. It's taken time but it as with things, the flywheel has changed enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You, know 20 years ago that wouldn't happen because the Googles of the world won't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know they know already but, This is changing some of the how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If is wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow TheCube or our analysts inside their venue. They've excluded us and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks and shout out to Rob Bearden. Good luck and CPO, it's a fun job, you know, not the pressure. I got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More Cube coverage after this short break. (upbeat electronic music)
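The edge pattern Arun sketches (a MiNiFi agent on a camera or phone that processes locally and only ships back what matters) can be illustrated with a small stand-in like the one below. It is not MiNiFi or HDF itself; the broker address, topic name, and the toy "motion score" rule are assumptions for illustration, using the kafka-python client.

```python
# Hedged stand-in for an edge agent: read locally, forward only the interesting events.
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="regional-kafka:9092",  # assumed regional-cluster endpoint
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def read_sensor() -> dict:
    # Placeholder for a real camera/sensor reading at the edge.
    return {"device_id": "cam-tokyo-042", "ts": time.time(), "motion_score": random.random()}

for _ in range(60):                          # one minute of one-second samples
    event = read_sensor()
    if event["motion_score"] > 0.95:         # only anomalies leave the edge
        producer.send("edge-anomalies", event)
    time.sleep(1)

producer.flush()
```

The physics argument in the interview is exactly this: the raw stream stays in Tokyo, and only the filtered slice crosses the wire to the regional or central cluster.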

Published Date : Sep 28 2017

Amit Walia, Informatica | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay welcome back everyone, live here in New York City it's theCUBE's coverage of Big Data NYC. It's our event we've been doing for five years in conjunction with Strata Hadoop now called Strata Data right around the corner, separate place. Every year we get the best voices tech. Thought leaders, CEO's, executives, entrepreneurs anyone who's bringing the signal, we share that with you. I'm John Furrier, the co-host of theCUBE. Eight years covering Big Data, since 2010, the original Hadoop world. I'm here with Amit Walia, who's the Executive Vice President, Chief Product Officer for Informatica. Welcome back, good to see you. >> Good to be here John. >> theCUBE alumni, always great to have you on. Love product we had everyone on from Hortonworks. >> I just saw that. >> Product guys are great, can share the road map and kind of connect the dots. As Chief Product Officer, you have to have a 20 mile stare into the future. You got to know what the landscape is today, where it's going to be tomorrow. So I got to ask you, where's it going to be tomorrow? It seems that the rubber's hit the road, real value has to be produced. The hype of AI is out there, which I love by the way. People can see through that but they get it's good. Where's the value today? That's what customers want to know. I got hybrid cloud on the table, I got a lot of security concerns. Governance is a huge problem. The European regulations are coming over the top. I don't have time to do IoT and these other things, or do I? I mean this is a lot of challenges but how do you see it playing out? >> I think, to be candid, it's the best of times. The changing times are the best of times because people can experiment. I would say if you step back and take a look, we've been talking for such a long time. If there was any time, where forget the technology jargon of infrastructure, cloud, IoT, data has become the currency for every enterprise right? Everybody wants data. I say like you know, business users want today's data yesterday to make a decision tomorrow. IT has always been in the business of data, everybody wants more data. But the point you're making is that while that has become more relevant to an enterprise, it brings into the lot of other things, GDPR, it brings governance, it brings security issues, I mean hybrid clouds, some data on-prem, some data on cloud but in essence, what I think every company has realized that they will live and die by how well do they predict the future with the data they have on all their customers, products, whatever it is, and that's the new normal. >> Well hate to say it, admit pat myself on the back, but we in theCUBE team and Wikibon saw this early. You guys did too, and I want to bring up a comment we've talked about a couple of years ago. One, you guys were in the data business, Informatica. You guys went private but that was an early indicator of the trend that everyone's going private now. And that's a signal. For the first time, private equity finance have had trumped bigger venture capital asset class financing. Which is a signal that the waves are coming. We're surfing these little waves right now, we think they're big but they big ones are coming. The indicator is everyone's retrenching. Private equity's a sign of undervaluation. 
They want to actually also transform maybe some of the product engineering side of it or go to market. Basically get the new surfboard. >> Yeah. >> For the big waves. >> I mean that was the premise for us too because we saw as we were chatting right. We knew the new world, which was going towards predictive analytics or AI. See data is the richest thing for AI to be applied to but the thing is that it requires some heavy lifting. In fact that was our thesis, that as we went private, look we can double down on things like cloud. Invest truly for the next four years which being in public markets sometimes is hard. So we step back and look where we are as you were acting from my cover today. Our big believers look, there's so much data, so many varying architecture, so many different places. People are in Azure, or AWS, on-prem, by the way, still on mainframe. That hasn't gone away, you go back to the large customers. But ultimately when you talk about the biggest, I would say the new normal, which is AI, which clearly has been overtalked about but in my opinion has been barely touched because the biggest application of machine learning is on data. And that predicts things, whether you want to predict forecasting, or you predict something will come down or you can predict, and that's what we believe is where the world is going to go and that's what we double down on with our Claire technology. Just go deep, bring AI to data across the enterprise. >> We got to give you guys props, you guys are right on the line. I got to say as a product person myself, I see you guys executing great strategy, you've been very complimentary to your team, think you're doing a great job. Let's get back to AI. I think if you look at the hype cycles of things, IoT certainly has, still think there's a lot more hype to have there, there's so much more to do there. Cloud was overhyped, remember cloud washing? Pexus back in 2010-11, oh they're just cloud washing. Well that's a sign that ended up becoming what everyone was kind of hyping up. It did turn out. AI thinks the same thing. And I think it's real because you can almost connect the dots and be there but the reality is, is that it's just getting started. And so we had Rob Thomas from IBM on theCUBE and, you know we were talking. He made a comment, I want to get your reaction to, he said, "You can't have AI without IA." Information architecture. And you're in the information Informatica business you guys have been laying out an architecture specifically around governance. You guys kind of saw that early too. You can't just do AI, AI needs to be trained as data models. There's a lot of data involved that feeds AI. Who trains the machines that are doing the learning? So, you know, all these things come into play back to data. So what is the preferred information architecture, IA, that can power AI, artificial intelligence? >> I think it's a great question. I think of what typically, we recommend and we see large companies do look in the current complex architectures the companies are in. Hybrid cloud, multicloud, old architecture. By the way mainframe, client server, big data, you pick your favorite archit, everything exists for any enterprise right. People are not, companies are not going to move magically, everything to one place, to just start putting data in one place and start running some kind of AI on it. Our belief is that that will get organized around metadata. Metadata is data about data right? 
The organizing principle for any enterprise has to be around metadata. Leave your data wherever it is, organize your metadata, which is a much lighter footprint and then, that layer becomes the true central nervous system for your new next gen information architecture. That's the layer on which you apply machine learning too. So a great example is look, take GDPR. I mean GDPR is, if I'm a distributor, large companies have their GDPR. I mean who's touching my data? Where is my data coming from? Which database has sensitive data? All of these things are such complex problems. You will not move everything magically to one place. You will apply metadata approach to it and then machine learning starts to telling you gee I some anomaly detection. You see I'm seeing some data which does not have access to leave the geographical boundaries, of lets say Germany, going to, let's say UK. Those are kind of things that become a lot easier to solve once you go organize yourself at the metadata layer and that's the layer on which you apply AI. To me, that's the simplest way to describe as the organizing principle of what I call the data architecture or the information architecture for the next ten years. >> And that metadata, you guys saw that earlier, but how does that relate to these new things coming in because you know, one would argue that the ideal preferred infrastructure would be one that says hey no matter what next GDPR thing will happen, there'll be another Equifax that's going to happen, there'll be some sort of state sponsor cyber attack to the US, all these things are happening. I mean hell, all securities attacks are going up-- >> Security's a great example of that. We saw it four years ago you know, and we worked on a metadata driven approach to security. Look I've been on the security business however that's semantic myself. Security's a classic example of where it was all at the infrastructure layer, network, database, server. But the problem is that, it doesn't matter. Where is your database? In the cloud. Where is your network? I mean, do you run a data center anymore right? If I may, figuratively you don't. Ultimately, it's all about the data. The way at which we are going and we want more users like you and me access to data. So security has to be applied at the data layer. So in that context, I just talked about the whole metadata driven approach. Once you have the context of your data, you can apply governance to your data, you can apply security to your data, and as you keep adding new architectures, you do not have to create a paddle architecture you have to just append your metadata. So security, governance, hybrid cloud, all of those things become a lot easier for you, versus clearing one new architecture after another which you can never get to. >> Well people will be afraid of malware and these malicious attacks so auditing becomes now a big thing. If you look at the Equifax, it might take on, I have some data on that show that there was other action, they were fleeced out for weeks and months before the hack was even noticed. >> All this happens. >> I mean, they were ten times phished over even before it was discovered. They were inside, so audit trail would be interesting. >> Absolutely, I'll give you, typically, if you read any external report this is nothing tied to Equifax. It takes any enterprise three months minimum to figure out they're under attack. 
And now if a sophisticated attacker always goes to right away when they enter your enterprise, they're finding the weakest link. You're as secure as your weakest link in security. And they will go to some data trail that was left behind by some business user who moved onto the next big thing. But data was still flowing through that pipe. Or by the way, the biggest issue is inside our attack right? You will have somebody hack your or my credentials and they don't download like Snowden, a big fat document one day. They'll go drip by drip by drip by drip. You won't even know that. That again is an anomaly detection thing. >> Well it's going to get down to the firmware level. I mean look at the sophisticated hacks in China, they run their own DNS. They have certificates, they hack the iPhones. They make the phones and stuff, so you got to assume packing. But now, it's knowing what's going on and this is really the dynamic nature. So we're in the same page here. I'd love to do a security feature, come into the studio in our office at Palo Alto, think that's worthy. I just had a great cyber chat with Vidder, CTO of Vidder. Junaid is awesome, did some work with the government. But this brings up the question around big data. The landscape that we're in is fast and furious right now. You have big data being impacted by cloud because you have now unlimited compute, low latency storage, unlimited power source in that engine. Then you got the security paradigm. You could argue that that's going to slow things down maybe a little bit, but it also is going to change the face of big data. What is your reaction to the impact to security and cloud to big data? Because even though AI is the big talk of the show, what's really happening here at Strata Data is it's no longer a data show, it's a cloud and security show in my opinion. >> I mean cloud to me is everywhere. It was the, when Hadoop started it was on-prem but it's pretty much in the cloud and look at AWS and Azure, everyone runs natively there, so you're exactly right. To me what has happened is that, you're right, companies look at things two ways. If I'm experimenting, then I can look at it in a way where I'm not, I'm in dev mode. But you're right. As things are getting more operational and production then you have to worry about security and governance. So I don't think it's a matter of slowing down, it's a nature of the business where you can be fast and experiment on one side, but as you go prod, as you go real operational, you have to worry about controls, compliance and governance. By the way in that case-- >> And by the way you got to know what's going on, you got to know the flows. A data lake is a data lake, but you got the Niagara falls >> That's right. >> streaming content. >> Every, every customer of ours who's gone production they always want to understand full governance and lineage in the data flow. Because when I go talk to a regulator or I got talk to my CEO, you may have hundred people going at the data lake. I want to know who has access to it, if it's a production data lake, what are they doing, and by the way, what data is going in. The other one is, I mean walk around here. How much has changed? The world of big data or the wild wild west. Look at the amount of consolidation that has happened. I mean you see around the big distribution right? To me it's going to continue to happen because it's a nature of any new industry. 
I mean you look at security, cyber security, big data, AI, you know, massive investment happens, and then as customers want to truly go to scale they say, look, I can only bet on a few that can not only scale, but have the governance and compliance of what a large company wants. >> The waves are coming, there's no doubt about it. Okay so, let me get your reaction to end this segment. What's Informatica doing right now? I mean I've seen a whole lot 'cause we've covered you guys at the show and also we keep in touch, but I want you to spend a minute to talk about why you guys are better than what's out there on the floor. You have a different approach, why are customers working with you, and if the folks aren't working with you yet, why should they work with Informatica? >> Our approach in a way has changed but not changed. We believe we operate in what we call enterprise cloud data management. Our thing is, look, we embrace open source. Open source, Spark, Spark Streaming, Kafka, you know, Hive, MapReduce, we support them all. To us, that's not where customers are spending their time. They're spending their time on, once I've got all that stuff, what can I do with it? If I'm truly building a next gen predictive analytics platform, I need some level of ability to manage batch and streaming together. I want to make sure that it can scale. I want to make sure it has security, it has governance, it has compliance. So customers work with us to make sure that they can run a hybrid architecture. Whether it is cloud or on-prem, whether it is traditional or big data or IoT, all in one place, it is scalable and it has governance and compliance baked into it. And then they also look for somebody that can provide true things like, not only data integration, quality, cataloging, all of those things, so when we're working with large or small customers, whether you are in dev or prod, ultimately we're helping you, what I call taking you from an experiment stage to a large scale operational stage. You know, without batting an eyelid. That's the business we are in, and in that case-- >> So you are in the business of operationalizing data for customers who want to add scale. >> Our belief is, we want to help our customers succeed. And customers will only succeed, not just by experimenting, but by taking their experiments to production. So we have to think of the entire lifecycle of a customer. We cannot stop and say, great for experiments, sorry, don't go operational with us. >> So we've had a theme here in theCUBE this week called, I'm calling it, don't be a tool, and too many tools are out there right now. We call it the tool shed phenomenon. The tool shed phenomenon is customers aren't, they're tired of having too many tools, and they bought a hammer a couple years ago that wants to try to be a lawn mower now, and so you got to understand the nature of having great tooling, which you need, which defines the work, but don't confuse a tool with a platform. And this is a huge issue because a lot of these companies that are falling by the wayside are groping for platforms. >> So there are customers who tell us the same thing, which is why we-- >> But tools have to work in context. >> That's exactly, so that's why you heard, we talked about that for the last couple of years, it's the intelligent data platform. Customers don't buy a platform, but all of our products are like microservices on our platform. Customers want to build the next gen data management platform, which is the intelligent data platform.
A lot of little things are features or tools along the way, but if I am a large bank, if I'm a large airline, and I want to go at scale, operational, I can't stitch a hundred tools together and expect to run my IT shop from there. >> Yeah. >> I can't, I will never be able to do it. >> There's good tools out there that have a nice business model, lifestyle business or cashflow business, or even tools that are just highly focused and that's all they do, and that's great. It's the guys who try to become something that they're not. It's hard, it's just too difficult. >> I think you have to-- >> The tool shed phenomenon is real. >> I think companies have to realize whether they are a feature. I always say, are you a feature or are you a product? You have to realize the difference between the two, and in between sits a tool. (John laughing) >> Well that quote came, the tool comment came from one of our chief data officers, that kind of sparked the conversation, but people buy a hammer, everything looks like a nail, and you don't want to mow your lawn with a hammer, get a lawn mower, right? Get the right tool for the job. But you have to have a platform, the data has to have a holistic view. >> That's exactly right. The intelligent data platform, that's what we call it. >> What's new with Informatica, what's going on? Give us a quick update, we'll end the segment with a quick update on Informatica. What do you got going on, what events are coming up? >> Well we just came off a very big release, we call it 10.2, which had a lot of big data, hybrid cloud, AI and catalog and security and governance, all five of them. Big release, just came out, and basically customers are adopting it. Which obviously was all centered around the things we talked about at Informatica. Again, single platform, cloud, hybrid, big data, streaming and governance and compliance. And then right now, we are basically in the middle, after Informatica World, we go on a barrage of tours across multiple cities across the globe so customers can meet us there. Paris is coming up, I was in London a few weeks ago. And then separately, coming up, I will probably see you there at Amazon re:Invent. I mean we are obviously an all-in partner for-- >> Do you have anything in China? >> China is a- >> Alibaba? >> We're working with them, I'll leave it there. >> We'll be at Alibaba in two weeks for their cloud event. >> Excellent. >> So theCUBE is breaking into China, CUBE China. We need some translators, so if anyone out there wants to help us with our China blog. >> We'll be at Dreamforce. We were obviously, so you'll see us there. We were at Amazon Ignite, obviously very close to- >> re:Invent will be great. >> Yeah we will be there, and Amazon obviously is a great partner and by the way a great customer of ours. >> Well congratulations, you guys are doing great, Informatica. Great to see the success. We'll see you at re:Invent and keep in touch. Amit Walia, the Executive Vice President, EVP, Chief Product Officer, Informatica. They get the platform game, they get the data game, check 'em out. It's theCUBE ending day two coverage. We've got a big event tonight. We're going to be streaming live our research that we are going to be rolling out here at Big Data NYC, our event that we're running in conjunction with Strata Data. They run their event, we run our event. Thanks for watching and stay tuned, stay with us.
At five o'clock, live Wikibon coverage of their new research and then Party at Seven, which will not be filmed, that's when we're going to have some cocktails. I'm John Furrier, thanks for watching. Stay tuned. (techno music)

Published Date : Sep 28 2017



Chuck Yarbough, Pentaho | Big Data NYC 2017


 

>> Announcer: Live from Midtown Manhattan it's theCUBE. Covering Big Data New York City 2017 brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Hey, welcome back everyone live here in New York City it's theCUBE's special presentation Big Data NYC. This is our fifth year doing our own event here in New York City, our eighth year covering the Hadoop World ecosystem from the beginning. Through eight years, it's had a lot evolutions, Hadoop World, Strata Conference, Strata Hadoop, now it's called Strata Data happening right around the corner. We run our own event here, talk about thought leaders and the expert CEO's, entrepreneurs. Getting the data for you, sharing that with you. I'm John Furrier co-host theCUBE with my co-host here Jim Kobielus who's the Lead Analyst at Wikibon Big Data. And Chuck Yarbough who's the Vice President at Pentaho Solutions part of Hitachi's new Vantara. A new company created just announced last week. Hitachi in a variety of their portfolio technologies into a new company, out to bring in a lot of those integrated solutions. Chuck great to see you again, theCUBE alumni. We chatted multiple times at Pentaho World, going back 2015. >> Always he always great to be at theCUBE. >> What a couple of years it's been. Give us quickly hard news, it's pretty awesome you guys have a variety of things at Pentaho you know with Hitachi, that happened, now the market's evolved, what's this new entity, this new company they're bringing together? >> Yes, so the big news Hitachi Vantara. So what that is, two years ago Hitachi Data Systems acquired Pentaho and so fast forward two years. A new company gets created from Hitachi Data Systems. Pentaho, in a third organization at Hitachi called the Insight Group so Hitachi Insight Group. Those three groups come together to form Hitachi Vantara >> What's the motivation behind that. I mean, I go connect the dots but I want to hear your perspective because it really is about pulling things together. The trend this year the show is as Jim calls it, hybrid data, integrated data. Things seem to be coming together, is that part the purpose? What's the reason behind pulling this together? >> Yeah, I think there's a lot of reasons. One of them is what we're seeing not just in our own business, but in our customers business, and that is digital transformation. Right, this this need to evolve So Hitachi Vantara is all about data and analytics. And a big focus of what we do is what Pentaho's been doing for years which is driving in all kinds of data, big data, all data. I think we're getting on the cusp of closing out the big data term, but you know, it's all data right. >> Data everywhere, every application. >> And applying analytics across the board. One of the big initiatives, part of why Pentaho was originally acquired we were actually Hitachi Data Systems was a customer of Pentaho when we got acquired, so we we knew each other pretty well. And part of the reason for that acquisition was to drive analytics in around internet of things. The IoT space, which is something that Hitachi being a very large IT and operational technology, OT, company probably does as well as anybody if not better. >> So going back couple of years, I'm just looking at my notes here from our our video index. You visited theCUBE in 2015, but really the concepts have evolved significantly. I want to just highlight a few of them. What data warehouse optimizations, we talk about that. Data refinery concepts, 360 view as applied to big data. 
Again that was foundational concepts that all are in play right now. >> Absolutely. >> What is the update in those areas? Because refinery, everyone talks about data refinery, you know, oil, the easy oil example but I mean, come on, data is everywhere it is most important, you can use it multiple times unlike oil, as you were pointing out. >> So interesting you bring that up. So to me data refinery in a digital transformation really in an IoT world where lots of data is is streaming through in fact, yesterday I read something by IDC that 95% of all data in the future and the data growth is dramatic it's 10x what it is today in just a few years. 95% of the that growth of data's IoT related. The question is how are you using most of that, right, and what what are you going to do with it. So that data's is streaming through, there's a lot happening, we can do things at the edge, we can apply analytics and filtering and do things. But ultimately that data is going to land somewhere and that's where that refinery, think of it as the big data center refinery, right, where I'm going to take that large amount of data and do the things that Jim does, you know and apply machine learning and deep algorithms too really. >> I had some thoughts on the IoT Jim and I were arguing, not arguing, discussing, with others in theCube about the role. >> We were bickering. >> The role of the edge because I was saying the refiner of the data can come back depending on what kind of data or you push compute to the edge, kind of known concepts, people been discussing that. But the issue is been, how do you view the edge? I'd love to get your reaction to that question because a lot of people are saying you have to think of IoT as a completely different category, than just cloud, than just data center, because the way some people are looking at IoT I know this can be semantics whether it's industrial or just straight internet of things device, or person, that is a different animal when it comes to like what you call it and how it gets put into a bucket. I mean most people put a lot of the IT bucket but. Some are saying IT edge should be completely different category of how you look at those problems. Your thoughts on how that IoT conversation shape. >> The question I always ask when I'm talking to somebody about the edge is, well what do you mean? Because it is something that can be defined a little bit differently but in an industrial IoT context I think, you know we look at it as one, you you have to know what those things are you have to really understand them. And part of understanding those things is having a digital representation of what those things are. >> A digital twin? >> A digital twin. Right, or asset avatar, as we call it at Hitachi. >> Oh I like that. >> So this idea of really managing those assets, understanding what they are and then being able to know what the current state, what the previous state, things are like that are. And then that refinery we just talked about is sort of where that information goes to so you can do other kinds of analytics right. But when you're talking about the edge, typically what we're seeing is the kinds of analytics might happen at the edge, are probably more around filtering you know, it's not quite as complex of analytics that's what we're seeing today. Now, the future I don't know. >> Sort of tiered analytics from the edge on in with more minimal, I mean, not minimal that's the wrong term, with a more narrowly scoped inference. 
Like predictions and so forth being handled at the edge, with larger, more complex models, like deep learning, whatever, being processed in the cloud, is that it? >> Yeah that's exactly the way that I see it. Now the other thing about the edge, depends on who you're talking to, again, but what is an edge device, or the gateways, or the compute, right, so part of IoT in my mind, it's not cloud, it's not on-prem or it's not, I mean it's a little bit of everything, right, it depends on the use case and what you're operating. We have a customer who does trains as a service in England, in Europe, and so they don't sell the trains anymore, they actually manufacture trains, and they sell the service of getting a passenger from here to there. But for them, edge is everything that happens on those trains. And tracking, as a digital representation, the train and then being able to drill down deeper and deeper, and you know, one of the things that I understand is one of the major delays for train service is doors opening and closing or being delayed, so maybe that comes down to a small part and the vibration of it and tracking that. So you've got to be able to track that appropriately. Now, on a train you might have a lot of extra space so you could put compute devices that have a lot of power. >> What's interesting, you said the edge, in this context, is everything that happens on that train. In other words, it sounds like all the real world outcomes that are enabled, perhaps optimized, by embedding the analytics in those physical devices or in that entire vehicle. That is essentially one way that you're describing the edge, which is not a single device but a complete assembly of devices that play together, amongst themselves and with the services in the cloud. Is that a logical sort of framework? >> That's why I said I usually ask what do we mean by edge. If you've got millions, thousands, whatever, devices out there feeding sensors, whatever, feeding this data, collecting, processing, you know there's some level of edge computing, gateways, processes that are going to happen. >> Well, my question for ya, I'd like to get your thoughts, as we, again we're having a, we love the hyperbole, we think it's completely legit, and it's going to continue to be hyped because it's obvious what you see with IoT standing on the edge. But a lot of customers we talked to are like, look, I got a lot going on, I got application development, I got to break out my security, got to build that up. I've got data governance issues, and now you throw in IoT over the top. They're like, I'm choking on projects. So they come down to a selection criteria. How do they define a working IoT project? And the trend that we're seeing is that it has to do with their industrial equipment or something related to their business. Call it industrial IoT, because if they have something in their business, say trains, as a critical part of what they do, that's easy to say let's justify this. Everything else then tends to go on the back burner, if they don't have clear visibility of what they're instrumenting. That's kind of weird, do you agree with that? Do you see a pattern as well in what customers are doing, by saying I'm going to bring this project in and we're going to connect our IoT? >> That's exactly what I see. Industrial internet of things is where I see the biggest value today, when you have trains or mining equipment or you know whatever. >> John: Whatever your business runs. >> Your manufacturing line right.
and being able to a fine tune those lines to either predicts failures, maybe improve quality. Those are those are impactful and they can be done right now today and that's what we're seeing is kind of the big emerging thing. IoT's interesting to talk about, the reality is it's really digital transformation that we're seeing. Companies transforming into new business models, doing things significantly different to grow into the future. And IoT is an enabler of that. So you're not going to see IoT everywhere today. >> The low hanging fruit is where it gets to the real business. >> Yeah, but it's going to go across all verticals, right, no doubt. >> So what solutions does Pentaho have for digital twins, or managing digital twins, the objects, the data itself, within and IoT context, is this something you're engaged in already? >> So within the Hitachi Vantara, the larger company. Bigger company, we have, we have what we call our Lumada IoT Platform and in that there is this asset avatar technology that that does exactly what you're describing. Now I'm going to throw quick plug out if you don't mind. Pentaho World in a couple, in about a month. >> John: theCUBE will be there. >> theCUBE will be there, and we're excited to have theCUBE and we're going to we're going to give you complete information about asset avatar with all the right people. >> There's a movie in there somewhere I could feel it, Avatar two. There's a lot of great representations of data I want to get your thoughts on how the new firm's going to solve customer problems. Because now as the customer see this new entity from you guys, Vantara's been doing real well, we covered the acquisition and you were kind of left alone Pentaho was integrating in, but it wasn't like a radical shift. Now there's some movement, what does it mean to the customer, what's the story to the customer. >> You know I think it's great news for the customer because Pentaho's always been very customer focused. But when you look at Hitachi Vantara the wealth of technology and expertise. Everything from all of the the great IT oriented stuff that Hitachi Data Systems has done and been well known for in the past still exists. But this broader focus of taking data and processing it in a variety of ways to solve real business problems. All the way to orchestrating machine learning in applying algorithms and then with the Hitachi. >> What specifically in Hitachi is coming into this? Because again this is again a focused solution company now with data, so Hitachi Data Centers, >> Yeah, so Hitachi Data Systems, think of it as the the infrastructure company. Hitachi Insight was the really focused largely on the IoT platform development, with some Pentaho assets and then the Pentaho business. But here's the thing about Hitachi, very large company, builds everything. Mining equipment and and all kinds of stuff. So nobody understands how all those things fit together better, I believe, than Hitachi. But some of the things that we have at that organization is this idea of the Hitachi labs. And data scientists that are really doing interesting things Jim you'd love to get more embedded into what some of those things are, and making that available to customers is a huge opportunity for customers to now be able to embrace a lot of the technologies we've been talking about. I said last year that this year was going to be the year of machine learning. And if you look through the expo hall that's what everybody's talking about. Right, it's AI or machine learning. 
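To picture the "filtering at the edge" point from the train example above, here is a hedged sketch of the sort of lightweight check an edge gateway might run, forwarding only unusual readings to the central refinery; the sensor values, window size, and threshold are all invented for illustration and are not Hitachi Vantara's or Pentaho's actual logic.

```python
# Illustrative edge-filtering sketch: a gateway watches door-actuator vibration
# readings and ships only out-of-band values upstream for deeper analysis in the
# central "refinery." All values and limits here are made up for the example.
from statistics import mean, stdev

class EdgeFilter:
    def __init__(self, window=50, sigmas=3.0):
        self.window = window
        self.sigmas = sigmas
        self.history = []

    def observe(self, value):
        """Return True when the reading should be forwarded upstream."""
        if len(self.history) >= self.window:
            mu, sd = mean(self.history), stdev(self.history)
            anomalous = sd > 0 and abs(value - mu) > self.sigmas * sd
        else:
            anomalous = False          # still warming up, keep everything local
        self.history.append(value)
        self.history = self.history[-self.window:]   # bounded memory on the gateway
        return anomalous

gateway = EdgeFilter()
for reading in [0.9, 1.1, 1.0, 1.05, 0.95] * 12 + [4.8]:
    if gateway.observe(reading):
        print("forwarding anomalous vibration reading:", reading)
```

The heavier work, training models over the full history, would happen centrally; the gateway's job in this sketch is only to decide what is worth sending.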
>> I'm wondering if you're commercializing R&D that's coming straight out of Hitachi labs already, or whether the Vantara combination will enable that. In other words, more innovation straight out of the labs, into the commercial arena. >> That's something that we are absolutely trying to do, right, because there's great things coming out of these lab organizations, and at Hitachi they're big labs. They're really legit, I kind of joke about that. The kinds of stuff that they're able to bring about now, Pentaho is part of the engine to help actually commercialize those things. >> Chuck, I know you're looking forward to Pentaho World, I'll give you the final word here in this segment on how you see the big data world evolve. Take your Pentaho hat off and put your industry guru hat on. What's happening, I mean this AI wave, that's pretty obvious, not a lot of blockchain discussion, which is going to completely open up some things as we get into the decentralized application market, which is going to complement the distributed nature of how we see data analytics flow, and certainly the immutability of it is interesting. But that's kind of down the road. But here you're starting to see the swim lanes in the industry, you've seen people who've been successful and the ones who have fallen by the wayside. But now the customers, they want real solutions. They don't want more hype, they don't want another eighth year of hype, they want, OK, let's get into the real meat and potatoes of data impact to my organization, call it digital transformation. What's happening, what is going on in the landscape? >> So you know, I mentioned before, to me it's digital transformation, which is a big huge thing. But that's what companies are interested in, that's what they're beginning to think about. If they're not thinking about those things they're falling behind. Five or six, seven years ago we talked about the same exact thing with big data. It's like, big data is really, you know, a big opportunity, and they're like, well, I don't know. Those that didn't adopt it aren't necessarily in a position now to transform digitally and to do some of the things that they're going to need to evolve into new business opportunities. >> And the big data examples of winners are the ones who actually made it valuable. Whether it's insight that converted to a new customer or changed an outcome in a positive way, where they go, that wouldn't have been possible without data. The proof points kind of hit the table. >> That's right. The other thing is, you know, who's going to win, who's going to lose. I think people that are implementing technology for technology's sake are going to lose. People that are focused on the outcomes are going to win. That's what it is, technology enables all that, but you've really got to be focused on that. >> I want to get one more quick thing before we go, I know we're tight on time, but I want to get thoughts on the open ecosystem. Open source is going to a whole other level. The projections are code will be shipping at an exponential rate, there's going to be a lot of onboarding of new stuff, so open obviously works, community models work, partnering is critical. So we're seeing good partnerships, not fake deals or optical deals or Barney deals, whatever you want to call it. But real partnerships. You're starting to see technology partnerships. What's your view on that, how is the new Vantara going to go forward, are you going to continue to do partnerships, and what's the strategy?
>> Yeah, I think the opportunity with Hitachi Vantara is, one, we have a breadth that can touch many different aspects. So as Pentaho we had great partnerships, very meaningful, but it always comes down to what we're doing for the customer. How are we changing things for the customer. So I'm not a believer in those Barney kind of relationships, those are nice, but let's talk about what we're doing for customers. >> Yeah, real proof points. >> You guys will continue to partner. >> Yes, we will continue to do that. >> Okay great, Chuck, thank you so much. CUBE coverage live in New York City in Manhattan, it's theCUBE with Big Data NYC, our fifth year doing our own event in conjunction with Strata Data, now the new name of the show. It was Strata Hadoop, Hadoop World before that. But we're still theCUBE, covering eight years of the action here, back with more after this short break.

Published Date : Sep 27 2017



Yaron Haviv, iguazio | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017, brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, we're live in New York City, this is theCUBE's coverage of BigData NYC, this is our own event for five years now we've been running it, been at Hadoop World since 2010, it's our eighth year covering the Hadoop World which has evolved into Strata Conference, Strata Hadoop, now called Strata Data, and of course it's bigger than just Strata, it's about big data in NYC, a lot of big players here inside theCUBE, thought leaders, entrepreneurs, and great guests. I'm John Furrier, the cohost this week with Jim Kobielus, who's the lead analyst on our BigData and our Wikibon team. Our next guest is Yaron Haviv, who's with iguazio, he's the founder and CTO, hot startup here at the show, making a lot of waves on their new platform. Welcome to theCUBE, good to see you again, congratulations. >> Yes, thanks, thanks very much. We're happy to be here again. >> You're known in the theCUBE community as the guy on Twitter who's always pinging me and Dave and team, saying, "Hey, you know, you guys got to "get that right." You really are one of the smartest guys on the network in our community, you're super-smart, your team has got great tech chops, and in the middle of all that is the hottest market which is cloud native, cloud native as it relates to the integration of how apps are being built, and essentially new ways of engineering around these solutions, not just repackaging old stuff, it's really about putting things in a true cloud environment, with an application development, with data at the center of it, you got a whole complex platform you've introduced. So really, really want to dig into this. So before we get into some of my pointed questions I know Jim's got a ton of questions, is give us an update on what's going on so you guys got some news here at the show, let's get to that first. >> So since the last time we spoke, we had tons of news. We're making revenues, we have customers, we've just recently GA'ed, we recently got significant investment from major investors, we raised about $33 million recently from companies like Verizon Ventures, Bosch, you know for IoT, Chicago Mercantile Exchange, which is Dow Jones and other properties, Dell EMC. So pretty broad. >> John: So customers, pretty much. >> Yeah, so that's the interesting thing. Usually you know investors are sort of strategic investors or partners or potential buyers, but here it's essentially our customers that it's so strategic to the business, we want to... >> Let's go with GA of the projects, just get into what's shipping, what's available, what's the general availability, what are you now offering? >> So iguazio is trying to, you know, you alluded to cloud native and all that. Usually when you go to events like Strata and BigData it's nothing to do with cloud native, a lot of hard labor, not really continuous development and integration, it's like continuous hard work, it's continuous hard work. And essentially what we did, we created a data platform which is extremely fast and integrated, you know has all the different forms of states, streaming and events and documents and tables and all that, into a very unique architecture, won't dive into that today. And on top of it we've integrated cloud services like Kubernetes and serverless functionality and others, so we can essentially create a hybrid cloud. 
So some of our customers even deploy portions as an opex-based setting in the cloud, and some portions at the edge or in the enterprise as deployed software, or even a prepackaged appliance. So we're the only ones that provide a full hybrid experience. >> John: Is this a SaaS product? >> So it's a software stack, and it could be delivered in three different options. One, if you don't want to mess with the hardware, you can just rent it, and it's deployed in an Equinix facility, we have very strong partnerships with them globally. If you want to have something on-prem, you can get a software reference architecture, you go and deploy it. If you're a telco or an IoT player that wants it in a manufacturing facility, we have a very small 2U box, four servers, four GPUs, all the analytics tech you could think of. You just put it in the factory instead of, like, two racks of Hadoop. >> So you're not general purpose, it's just whatever the customer wants to deploy the stack, the flexibility is on them. >> Yeah. Now it is an appliance >> You have a hosting solution? >> It is an appliance even when you deploy it on-prem, it's a bunch of Docker containers inside that you don't even touch, you don't SSH to the machine. You have APIs and you have UIs, and just like the cloud experience when you go to Amazon, you don't open the kimono, you know, you just use it. So that's the experience we're telling customers about. No root access problems, no security problems. It's a hardened system. Give us servers, we'll deploy it, and you go through consoles and UIs. >> You don't host anything for anyone? >> We host for some customers, including >> So you do whatever the customer is interested in doing? >> Yes. (laughs) >> So you're flexible, okay. >> We just want to make money. >> You're pretty good, sticking to the product. So on the GA, so here essentially in the big data world you mentioned that there's data layers, like a data piece. So I got to ask you the question, so pretend I'm an idiot for a second, right. >> Yaron: Okay. >> Okay, yeah. >> No, you're a smart guy. >> What problem are you solving? So we'll just go to the simple one. I love what you're doing, I assume you guys are super-smart, which I can say you are, but what's the problem you're solving, what's in it for me? >> Okay, so there are two problems. One is the challenge everyone wants to transform. You know there is this digital transformation mantra. And it means essentially two things. One is, I want to automate my operation environment so I can cut costs and be more competitive. The other one is I want to improve my customer engagement. You know, I want to do mobile apps which are smarter, you know, get more direct content to the user, get more targeted functionality, et cetera. These are the two key challenges for every business, any industry, okay? So they go and they deploy Hadoop and Hive and all that stuff, and it takes them two years to productize it. And then they get to the data science bit. And by the time they're finished they understand that this Hadoop thing can only do one thing. It's queries, and reporting and BI, and data warehousing. How do you do actionable insights from that stuff, okay? 'Cause actionable insights means I get information from the mobile app, and then I translate it into some action. I have to enrich the vectors, the machine learning, all those details. And then I need to respond. Hadoop doesn't know how to do it.
So the first generation is people that pulled a lot of stuff into a data lake, and started querying it and generating reports. And the boss said >> Low cost data lake basically, is what you're saying. >> Yes, and the boss said, "Okay, what are we going to do with this report? "Is it generating any revenue for the business?" No. The only revenue generation if you take this data >> You're fired, exactly. >> No, not all fired, but now >> John: Look at the budget >> Now they're starting to buy our stuff. So now the point is, okay, how can I put all this data in, and at the same time generate actions, and also deal with the production aspects of, I want to develop in a beta phase, I want to promote it into production. That's cloud native architectures, okay? Hadoop is not cloud. How do I take a Spark, Zeppelin, you know, a notebook, and turn it into production? There's no way to do that. >> By the way, depending on which cloud you go to, they have a different mechanism and elements for each cloud. >> Yeah, so the cloud providers do address that because they are selling the package, >> Expands all the clouds, yeah. >> Yeah, so cloud providers are starting to have their own offerings, which are all proprietary, around, this is how you would, you know, forget about HDFS, we'll have S3, and we'll have Redshift for you, and we'll have Athena, and again you're starting to consume that as a service. Still doesn't address the continuous analytics challenge that people have. And if you're looking at what we've done with Grab, which is amazing, they started with using Amazon services, S3, Redshift, you know, Kinesis, all that stuff, and it took them about two hours to generate the insights. Now the problem is they want to do driver incentives in real time. So they want to incent the driver to go and make more rides or other things, so they have to analyze the event of the location of the driver, the event of the location of the customers, and just throw messages back based on analytics. So that's real time analytics, and that's not something that you can do >> They got to build that from scratch right away. I mean they can't do that with the existing. >> No, and Uber invested tons of energy around that and they don't get the same functionality. Another unique feature that we talk about in our PR >> This is for the use case you're talking about, this is the Grab, which is the car >> Grab is the number one ride-sharing company in Asia, which is bigger than Uber in Asia, and they're using our platform. By the way, even Uber doesn't really use Hadoop, they use MemSQL for that stuff, so it's not really using open source and all that. But the point is, for example, with Uber, when you have a, when they monetize the rides, they do it just based on demand, okay. And with Grab, now what they do, because of the capability that we can intersect tons of data in real time, they can also look at the weather, was there a terror attack or something like that. They don't want to raise the price >> A lot of other data points, could be traffic >> They don't want to raise the price if there was a problem, you know, and all the customers get aggravated. This is actually intersecting data in real time, and no one today can do that in real time beyond what we can do. >> A lot of people have semantic problems with real time, they don't even know what they mean by real time. >> Yaron: Yes. >> The data could be a week old, but they can get it to them in real time.
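As a purely illustrative sketch of the kind of real-time intersection described here, and not Grab's or iguazio's actual logic, an incentive decision might join a demand signal with a weather or incident flag before acting; every field name and threshold below is assumed.

```python
# Illustrative sketch of intersecting live signals before acting: raise driver
# incentives when demand outstrips supply, but hold prices flat if a weather or
# incident flag is set for the area. All names and thresholds are invented.

def incentive_decision(area_state):
    drivers = area_state["available_drivers"]
    riders = area_state["waiting_riders"]
    disrupted = area_state["severe_weather"] or area_state["incident_reported"]

    if riders > 2 * drivers and not disrupted:
        return {"surge_multiplier": 1.5, "driver_bonus": 5.0}
    if riders > 2 * drivers and disrupted:
        # Demand is high, but raising prices during a disruption aggravates riders;
        # incent drivers directly instead.
        return {"surge_multiplier": 1.0, "driver_bonus": 8.0}
    return {"surge_multiplier": 1.0, "driver_bonus": 0.0}

print(incentive_decision({"available_drivers": 10, "waiting_riders": 40,
                          "severe_weather": True, "incident_reported": False}))
```

The decision itself is trivial; the hard part the conversation keeps returning to is having all of those signals fresh and joined in one place at decision time.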
>> But every decision, if you think if you generalize round the problem, okay, and we have slides on that that I explain to customers. Every time I run analytics, I need to look at four types of data. The context, the event, okay, what happened, okay. The second type of data is the previous state. Like I have a car, was it up or down or what's the previous state of that element? The third element is the time aggregation, like, what happened in the last hour, the average temperature, the average, you know, ticker price for the stock, et cetera, okay? And the fourth thing is enriched data, like I have a car ID, but what's the make, what's the model, who's driving it right now. That's secondary data. So every time I run a machine learning task or any decision I have to collect all those four types of data into one vector, it's called feature vector, and take a decision on that. You take Kafka, it's only the event part, okay, you take MemSQL, it's only the state part, you take Hadoop it's only like historical stuff. How do you assemble and stitch a feature vector. >> Well you talked about complex machine learning pipeline, so clearly, you're talking about a hybrid >> It's a prediction. And actions based on just dumb things, like the car broke and I need to send a garage, I don't need machine learning for that. >> So within your environment then, do you enable the machine learning models to execute across the different data platforms, of which this hybrid environment is composed, and then do you aggregate the results of those models, runs into some larger model that drives the real time decision? >> In our solution, everything is a document, so even a picture is a document, a lot of things. So you can essentially throw in a picture, run tensor flow, embed more features into the document, and then query those features on another platform. So that's really what makes this continuous analytics extremely flexible, so that's what we give customers. The first thing is simplicity. They can now build applications, you know we have tier one now, automotive customer, CIO coming, meeting us. So you know when I have a project, one year, I need to have hired dozens of people, it's hugely complex, you know. Tell us what's the use case, and we'll build a prototype. >> John: All right, well I'm going to >> One week, we gave them a prototype, and he was amazed how in one week we created an application that analyzed all the streams from the data from the cars, did enrichment, did machine learning, and provided predictions. >> Well we're going to have to come in and test you on this, because I'm skeptical, but here's why. >> Everyone is. >> We'll get to that, I mean I'm probably not skeptical but I kind of am because the history is pretty clear. If you look at some of the big ideas out there, like OpenStack. I mean that thing just morphed into a beast. Hadoop was a cost of ownership nightmare as you mentioned early on. So people have been conceptually correct on what they were trying to do, but trying to get it done was always hard, and then it took a long time to kind of figure out the operational model. So how are you different, if I'm going to play the skeptic here? You know, I've heard this before. How are you different than say OpenStack or Hadoop Clusters, 'cause that was a nightmare, cost of ownership, I couldn't get the type of value I needed, lost my budget. Why aren't you the same? >> Okay, that's interesting. 
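Those four kinds of data map naturally onto a small structure. The sketch below is a generic illustration of assembling such a feature vector; every field name and the in-memory stores are assumed for the example rather than taken from iguazio's APIs.

```python
# Generic sketch of assembling a "feature vector" from four sources: the incoming
# event, the previous state, a time aggregation, and enrichment data. The field
# names and these simple in-memory stores are assumptions for illustration.

state_store = {"car-17": {"status": "up"}}                                     # previous state
aggregates  = {"car-17": {"avg_temp_last_hour": 78.2}}                         # time aggregations
reference   = {"car-17": {"make": "Acme", "model": "X", "driver": "d-303"}}    # enrichment

def build_feature_vector(event):
    key = event["car_id"]
    return {
        # 1. context: the event itself
        "event_type": event["type"],
        "event_value": event["value"],
        # 2. previous state
        "prev_status": state_store.get(key, {}).get("status", "unknown"),
        # 3. time aggregation
        "avg_temp_last_hour": aggregates.get(key, {}).get("avg_temp_last_hour"),
        # 4. enriched reference data
        **reference.get(key, {}),
    }

vector = build_feature_vector({"car_id": "car-17", "type": "temp", "value": 91.4})
print(vector)   # this dict is what a model or rule engine would score
```

The argument being made in the interview is that each of those four pieces traditionally lives in a different system, which is what makes assembling the vector at decision time hard.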
I don't know if you know but I ran a lot of development for OpenStack when I was in Matinox and Hadoop, so I patched a lot of those >> So do you agree with what I said? That that was a problem? >> They are extremely complex, yes. And I think one of the things that first OpenStack tried to bite on too much, and it's sort of a huge tent, everyone tries to push his agenda. OpenStack is still an infrastructure layer, okay. And also Hadoop is sort of a something in between an infrastructure and an application layer, but it was designed 10 years ago, where the problem that Hadoop tried to solve is how do you do web ranking, okay, on tons of batch data. And then the ecosystem evolved into real time, and streaming and machine learning. >> A data warehousing alternative or whatever. >> So it doesn't fit the original model of batch processing, 'cause if an event comes from the car or an IoT device, and you have to do something with it, you need a table with an index. You can't just go and build a huge Parquet file. >> You know, you're talking about complexity >> John: That's why he's different. >> Go ahead. >> So what we've done with our team, after knowing OpenStack and all those >> John: All the scar tissue. >> And all the scar tissues, and my role was also working with all the cloud service providers, so I know their internal architecture, and I worked on SAP HANA and Exodata and all those things, so we learned from the bad experiences, said let's forget about the lower layers, which is what OpenStack is trying to provide, provide you infrastructure as a service. Let's focus on the application, and build from the application all the way to the flash, and the CPU instruction set, and the adapters and the networking, okay. That's what's different. So what we provide is an application and service experience. We don't provide infrastructure. If you go buy VMware and Nutanix, all those offerings, you get infrastructure. Now you go and build with the dozen of dev ops guys all the stack above. You go to Amazon, you get services. Just they're not the most optimized in terms of the implementation because they also have dozens of independent projects that each one takes a VM and starts writing some >> But they're still a good service, but you got to put it together. >> Yeah right. But also the way they implement, because in order for them to scale is that they have a common layer, they found VMs, and then they're starting to build up applications so it's inefficient. And also a lot of it is built on 10-year-old baseline architecture. We've designed it for a very modern architecture, it's all parallel CPUs with 30 cores, you know, flash and NVMe. And so we've avoided a lot of the hardware challenges, and serialization, and just provide and abstraction layer pretty much like a cloud on top. >> Now in terms of abstraction layers in the cloud, they're efficient, and provide a simplification experience for developers. Serverless computing is up and coming, it's an important approach, of course we have the public clouds from AWS and Google and IBM and Microsoft. There are a growing range of serverless computing frameworks for prem-based deployment. I believe you are behind one. Can you talk about what you're doing at iguazio on serverless frameworks for on-prem or public? >> Yes, it's the first time I'm very active in CNC after Cloud Native Foundation. 
I'm one of the authors of the serverless white paper, which tries to normalize the definitions of all the vendors and come with a proposal for interoperable standard. So I spent a lot of energy on that, 'cause we don't want to lock customers to an API. What's unique, by the way, about our solution, we don't have a single proprietary API. We just emulate all the other guys' stuff. We have all the Amazon APIs for data services, like Kinesis, Dynamo, S3, et cetera. We have the open source APIs, like Kafka. So also on the serverless, my agenda is trying to promote that if I'm writing to Azure or AWS or iguazio, I don't need to change my app. I can use any developer tools. So that's my effort there. And we recently, a few weeks ago, we launched our open source project, which is a sort of second generation of something we had before called Nuclio. It's designed for real time >> John: How do you spell that? >> N-U-C-L-I-O. I even have the logo >> He's got a nice slick here. >> It's really fast because it's >> John: Nuclio, so that's open source that you guys just sponsor and it's all code out in the open? >> All the code is in the open, pretty cool, has a lot of innovative ideas on how to do stream processing and best, 'cause the original serverless functionality was designed around web hooks and HTTP, and even many of the open source projects are really designed around HTTP serving. >> I have a question. I'm doing research for Wikibon on the area of serverless, in fact we've recently published a report on serverless, and in terms of hybrid cloud environments, I'm not seeing yet any hybrid serverless clouds that involve public, you know, serverless like AWS Lambda, and private on-prem deployment of serverless. Do you have any customers who are doing that or interested in hybridizing serverless across public and private? >> Of course, and we have some patents I don't want to go into, but the general idea is, what we've done in Nuclio is also the decoupling of the data from the computation, which means that things can sort of be disjoined. You can run a function in Raspberry Pi, and the data will be in a different place, and those things can sort of move, okay. >> So the persistence has to happen outside the serverless environment, like in the application itself? >> Outside of the function, the function acts as the persistent layer through APIs, okay. And how this data persistence is materialized, that server separate thing. So you can actually write the same function that will run against Kafka or Kinesis or Private MQ, or HTTP without modifying the function, and ad hoc, through what we call function bindings, you define what's going to be the thing driving the data, or storing the data. So that can actually write the same function that does ETL drop from table one to table two. You don't need to put the table information in the function, which is not the thing that Lambda does. And it's about a hundred times faster than Lambda, we do 400,000 events per second in Nuclio. So if you write your serverless code in Nuclio, it's faster than writing it yourself, because of all those low-level optimizations. >> Yaron, thanks for coming on theCUBE. We want to do a deeper dive, love to have you out in Palo Alto next time you're in town. Let us know when you're in Silicon Valley for sure, we'll make sure we get you on camera for multiple sessions. >> And more information re:Invent. >> Go to re:Invent. We're looking forward to seeing you there. 
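To picture the function-binding idea described above, here is a minimal sketch in the style of a Nuclio Python handler: the same handler body can be fed by an HTTP trigger or a stream trigger because triggers and data bindings are declared in the function's configuration rather than in code. The parsing, field names, and local test stubs below are assumptions for illustration, not iguazio's reference example.

```python
# Minimal sketch in the style of a Nuclio Python handler. Where the event comes
# from (HTTP, Kafka, Kinesis, ...) and where output lands is configured outside
# the code, so the handler itself stays trigger-agnostic.
import json

def handler(context, event):
    record = json.loads(event.body)            # event.body carries the raw payload
    context.logger.info(f"processing record {record.get('id', 'unknown')}")

    # A trivial "ETL" step: reshape the record. The destination (table, stream)
    # would be decided by the function's bindings/configuration, not hard-coded here.
    return json.dumps({"id": record.get("id"), "value": record.get("value", 0) * 2})

# Local smoke test with stand-in context/event objects (outside any serverless
# runtime, purely so the sketch runs on its own).
class _FakeLogger:
    def info(self, msg): print(msg)

class _FakeContext:
    logger = _FakeLogger()

class _FakeEvent:
    body = b'{"id": "sensor-7", "value": 21}'

print(handler(_FakeContext(), _FakeEvent()))
```

The design point being made is that keeping data sources and sinks out of the function body is what lets the same code move between providers and between dev and production.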
Love the continuous analytics message, I think continuous integration is going through a massive renaissance right now, you're starting to see new approaches, and I think things that you're doing is exactly along the lines of what the world wants, which is alternatives, innovation, and thanks for sharing on theCUBE. >> Great. >> That's very great. >> This is theCUBE coverage of the hot startups here at BigData NYC, live coverage from New York, after this short break. I'm John Furrier, Jim Kobielus, after this short break.

Published Date : Sep 27 2017



Christian Rodatus, Datameer | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to by SiliconANGLE Media and its ecosystem sponsors. >> Coverage to theCUBE in New York City for Big Data NYC, the hashtag is BigDataNYC. This is our fifth year doing our own event in conjunction with Strata Hadoop, now called Strata Data, used to be Hadoop World, our eighth year covering the industry, we've been there from the beginning in 2010, the beginning of this revolution. I'm John Furrier, the co-host, with Jim Kobielus, our lead analyst at Wikibon. Our next guest is Christian Rodatus, who is the CEO of Datameer. Datameer, obviously, one of the startups now evolving on the, I think, eighth year or so, roughly seven or eight years old. Great customer base, been successful blocking and tackling, just doing good business. Your shirt says show him the data. Welcome to theCUBE, Christian, appreciate it. >> So well established, I barely think of you as a startup anymore. >> It's kind of true, and actually a couple of months ago, after I took on the job, I met Mike Olson, and Datameer and Cloudera were sort of founded the same year, I believe late 2009, early 2010. Then, he told me there were two open source projects with MapReduce and Hadoop, basically, and Datameer was founded to actually enable customers to do something with it, as an entry platform to help getting data in, create the data and doing something with it. And now, if you walk the show floor, it's a completely different landscape now. >> We've had you guys on before, the founder, Stefan, has been on. Interesting migration, we've seen you guys grow from a customer base standpoint. You've come on as the CEO to kind of take it to the next level. Give us an update on what's going on at Datameer. Obviously, the shirt says "Show me the data." Show me the money kind of play there, I get that. That's where the money is, the data is where the action is. Real solutions, not pie in the sky, we're now in our eighth year of this market, so there's not a lot of tolerance for hype even though there's a lot of AI watching going on. What's going on with you guys? >> I would say, interesting enough I met with a customer, prospective customer, this morning, and this was a very typical organization. So, this is a customer that was an insurance company, and they're just about to spin up their first Hadoop cluster to actually work on customer management applications. And they are overwhelmed with what the market offers now. There's 27 open source projects, there's dozens and dozens of other different tools that try to basically, they try best of reach approaches and certain layers of the stack for specific applications, and they don't really know how to stitch this all together. And if I reflect from a customer meeting at a Canadian bank recently that has very successfully deployed applications on the data lake, like in fraud management and compliance applications and things like this, they still struggle to basically replicate the same performance and the service level agreements that they used from their old EDW that they still have in production. And so, everybody's now going out there and trying to figure out how to get value out of the data lake for the business users, right? There's a lot of approaches that these companies are trying. There's SQL-on-Hadoop that supposedly doesn't perform properly. 
There are other solutions, like OLAP on Hadoop, that try to emulate what they've been used to from the EDWs, and we believe these are the wrong approaches, so we want to stay true to the stack and be native to the stack and offer a platform that really operates end-to-end, from ingesting the data into the data lake to curation and preparation of the data, and ultimately, building the data pipelines for the business users, and this is certainly something-- >> Here's more of a play for the business users now, not the data scientists and statistical modelers. I thought the data scientists were your core market. Is that not true? >> So, our primary user base at Datameer used to be, until last week, the data engineers in the companies, or basically the people that built the data lake, that curated the data and built these data pipelines for the business user community no matter what tool they were using. >> Jim, I want to get your thoughts on this for Christian's interest. Last year, so these guys can fix your microphone, his earpiece there, but I want to get a question to Chris, and I ask to redirect through you. Gartner, another analyst firm. >> Jim: I've heard of 'em. >> Not a big fan personally, but you know. >> Jim: They're still in business? >> The magic quadrant, they use that tool. Anyway, they had a good intro stat. Last year, they predicted that through 2017, 60% of big data projects will fail. So, the question for both you guys is did that actually happen? I don't think it did, I'm not hearing that 60% have failed, but we are seeing the struggle around analytics and scaling analytics in a way that's like a dev ops mentality. So, thoughts on this 60% of data projects failing. >> I don't know whether it's 60%, there was another statistic that said only 14% of Hadoop deployments are in production, or something, >> They said 60, six zero. >> Or whatever. >> Define failure, I mean, you've built a data lake, and maybe you're not using it immediately for any particular application. Does that mean you've failed, or does it simply mean you haven't found the killer application yet for it? I don't know, your thoughts. >> I agree with you, it's probably not a failure to that extent. It's more like, so they dump the data into it, right, they build the infrastructure, now it's about the next step, data lake 2.0, to figure out how do I get value out of the data, how do I go after the right applications, how do I build a platform and tools that basically promote the use of that data throughout the business community in a meaningful way. >> Okay, so what's going on with you guys from a product standpoint? You guys have some announcements. Let's get to some of the latest and greatest. >> Absolutely. I think we were very strong in data curation, data preparation and the entire data governance around it, and we are using, as a user interface, this spreadsheet-like user interface called a workbook; it really looks like Excel, but it's not. It operates at a completely different scale. It's basically an Excel spreadsheet on steroids. Our customers build data pipelines, so this is the data engineers that we discussed before, but we also have a relatively small power user community in our client base that use that spreadsheet for deep data exploration.
Now, we are lifting this to the next level, and we put up a visualization layer on top of it that runs natively in the stack, and what you get is basically a visual experience not only in the data curation process but also in deep data exploration, and this is combined with two platform technologies that we use, it's based on highly scalable distributed search in the backend engine of our product, number one. We have also adopted a columnar data store, Parquet, for our file system now. In this combination, the data exploration capabilities we bring to the market will allow power analysts to really dig deep into the data, so there's literally no limits in terms of the breadth and the depth of the data. It could be billions of rows, it could be thousands of different attributes and columns that you are looking at, and you will get a response time of sub-second as we create indices on demand as we run this through the analytic process. >> With these fast queries and visualization, do you also have the ability to do semantic data virtualization roll-ups across multi-cloud or multi-cluster? >> Yeah, absolutely. We, also there's a second trend that we discussed right before we started the live transmission here. Things are also moving into the cloud, so what we are seeing right now is the EDW's not going away, the on prem is data lake, that prevail, right, and now they are thinking about moving certain workload types into the cloud, and we understand ourselves as a platform play that builds a data fabric that really ties all these data assets together, and it enables business. >> On the trends, we weren't on camera, we'll bring it up here, the impact of cloud to the data world. You've seen this movie before, you have extensive experience in this space going back to the origination, you'd say Teradata. When it was the classic, old-school data warehouse. And then, great purpose, great growth, massive value creation. Enter the Hadoop kind of disruption. Hadoop evolved from batch to do ranking stuff, and then tried to, it was a hammer that turned into a lawnmower, right? Then they started going down the path, and really, it wasn't workable for what people were looking at, but everyone was still trying to be the Teradata of whatever. Fast forward, so things have evolved and things are starting to shake out, same picture of data warehouse-like stuff, now you got cloud. It seems to be changing the nature of what it will become in the future. What's your perspective on that evolution? What's different about now and what's same about now that's, from the old days? What's the similarities of the old-school, and what's different that people are missing? >> I think it's a lot related to cloud, just in general. It is extremely important to fast adoptions throughout the organization, to get performance, and service-level agreements without customers. This is where we clearly can help, and we give them a user experience that is meaningful and that resembles what they were used to from the old EDW world, right? That's number one. Number two, and this comes back to a question to 60% fail, or why is it failing or working. I think there's a lot of really interesting projects out, and our customers are betting big time on the data lake projects whether it being on premise or in the cloud. And we work with HSBC, for instance, in the United Kingdom. They've got 32 data lake projects throughout the organization, and I spoke to one of these-- >> Not 32 data lakes, 32 projects that involve tapping into the data lake. 
>> 32 projects that involve various data lakes. >> Okay. (chuckling) >> And I spoke to one of the chief data officers there, and they said their data center infrastructure, just by having kick-started these projects, will explode. And they're not in the business of operating all the hardware and things like this, and so, a major bank like them, they made an announcement recently, a public announcement, you can read about it, started moving the data assets into the cloud. This is clearly happening at a rapid pace, and it will change the paradigm in terms of breathability and being able to satisfy peak workload requirements as they come up, when you run a compliance report at quarter end or something like this, so this will certainly help with adoption and creating business value for our customers. >> We talk all the time about real-time, and there's so many examples of how data science has changed the game. I mean, I was talking about, from a cyber perspective, how data science helped capture Bin Laden, to how I can get increased sales, to better user experience on devices. Having real-time access to data, and you put in some quick data science around things, really helps things at the edge. What's your view on real-time? Obviously, that's super important, you got to kind of get your house in order in terms of base data hygiene and foundational work, building blocks. At the end of the day, real-time seems to be super hot right now. >> Real-time is a relative term, right, so there's certainly applications, like IOT applications, or machine data that you analyze, that require real-time access. I would call it right-time, so what's the increment of data load that is required for certain applications? We are certainly not a real-time application yet. We can possibly load data through Kafka and stream data through Kafka, but in general, we are still a batch-oriented platform. We can do-- >> Which, by the way, is not going away any time soon. It's like super important. >> No, it's not going away at all, right. It can do many batches at relatively frequent increments, which is usually enough for what our customers demand from our platform today, but we're certainly looking at more streaming types of capability as we move this forward. >> What do the customer architectures look like? Because you brought up the good point, we talk about this all the time, batch versus real-time. They're not mutually exclusive; obviously, good architectures would argue that you decouple them, and obviously you'll have good software elements all through the life cycle of data. >> Through the stack. >> And have the stack, and the stack's only going to get more robust. Your customers, what's the main value that you guys provide them, the problem that you're solving today and the benefits to them? >> Absolutely, so our true value is that there's no breakages in the stack. We enter, and we can basically satisfy all requirements, from ingesting the data, from blending and integrating the data, preparing the data, building the data pipelines, and analyzing the data. And all this we do in a highly secure and governed environment, so if you stitch it together as a customer... the customer this morning asked me, "Whom do you compete with?" I keep getting this question all the time, and we really compete with two things. We compete with build-your-own, which customers still opt to do nowadays, while our things are really point and click and highly automated, and we compete with a combination of different products.
You need to have at least three to four different products to be able to do what we do, but then you get security breaks, you get lack of data lineage and data governance through the process, and this is the biggest value that we can bring to the table. And secondly now with visual exploration, we offer capability that literally nobody has in the marketplace, where we give power users the capability to explore with blazing fast response times, billion rows of data in a very free-form type of exploration process. >> Are there more power users now than there were when you started as a company? It seemed like tools like Datameer have brought people into the sort of power user camp, just simply by the virtue of having access to your tool. What are your thoughts there? >> Absolutely, it's definitely growing, and you see also different companies exploiting their capability in different ways. You might find insurance or financial services customers that have a very sophisticated capability building in that area, and you might see 1,000 to 2,000 users that do deep data exploration, and other companies are starting out with a couple of dozen and then evolving it as they go. >> Christian, I got to ask you as the new CEO of Datameer, obviously going to the next level, you guys have been successful. We were commenting yesterday on theCUBE about, we've been covering this for eight years in depth in terms of CUBE coverage, we've seen the waves come and go of hype, but now there's not a lot of tolerance for hype. You guys are one of the companies, I will say, that stay to your knitting, you didn't overplay your hand. You've certainly rode the hype like everyone else did, but your solution is very specific on value, and so, you didn't overplay your hand, the company didn't really overplay their hand, in my opinion. But now, there's really the hand is value. >> Absolutely. >> As the new CEO, you got to kind of put a little shiny new toy on there, and you know, rub the, keep the car lookin' shiny and everything looking good with cutting edge stuff, the same time scaling up what's been working. The question is what are you doubling down on, and what are you investing in to keep that innovation going? >> There's really three things, and you're very much right, so this has become a mature company. We've grown with our customer base, our enterprise features and capabilities are second to none in the marketplace, this is what our customers achieve, and now, the three investment areas that we are putting together and where we are doubling down is really visual exploration as I outlined before. Number two, hybrid cloud architectures, we don't believe the customers move their entire stack right into the cloud. There's a few that are going to do this and that are looking into these things, but we will, we believe in the idea that they will still have to EDW their on premise data lake and some workload capabilities in the cloud which will be growing, so this is investment area number two. Number three is the entire concept of data curation for machine learning. This is something where we've released a plug-in earlier in the year for TensorFlow where we can basically build data pipelines for machine learning applications. This is still very small. We see some interest from customers, but it's growing interest. >> It's a directionally correct kind of vector, you're looking and say, it's a good sign, let's kick the tires on that and play around. >> Absolutely. >> 'Cause machine learning's got to learn, too. 
You got to learn from somewhere. >> And quite frankly, deep learning, machine learning tools for the rest of us, there aren't really all that many for the rest of us power users, they're going to have to come along and get really super visual in terms of enabling visual modular development and tuning of these models. What are your thoughts there in terms of going forward about a visualization layer to make machine learning and deep learning developers more productive? >> That is an area where we will not engage in a way. We will stick with our platform play where we focus on building the data pipelines into those tools. >> Jim: Gotcha. >> In the last area where we invest is ecosystem integration, so we think with our visual explorer backend that is built on search and on a Parquet file format is, or columnar store, is really a key differentiator in feeding or building data pipelines into the incumbent BRE ecosystems and accelerating those as well. We've currently prototypes running where we can basically give the same performance and depth of analytic capability to some of the existing BI tools that are out there. >> What are some the ecosystem partners do you guys have? I know partnering is a big part of what you guys have done. Can you name a few? >> I mean, the biggest one-- >> Everybody, Switzerland. >> No, not really. We are focused on staying true to our stack and how we can provide value to our customers, so we work actively and very important on our cloud strategy with Microsoft and Amazon AWS in evolving our cloud strategy. We've started working with various BI vendors throughout that you know about, right, and we definitely have a play also with some of the big SIs and IBM is a more popular one. >> So, BI guys mostly on the tool visualization side. You said you were a pipeline. >> On tool and visualization side, right. We have very effective integration for our data pipelines into the BI tools today we support TD for Tableau, we have a native integration. >> Why compete there, just be a service provider. >> Absolutely, and we have more and better technology come up to even accelerate those tools as well in our big data stuff. >> You're focused, you're scaling, final word I'll give to you for the segment. Share with the folks that are a Datameer customer or have not yet become a customer, what's the outlook, what's the new Datameer look like under your leadership? What should they expect? >> Yeah, absolutely, so I think they can expect utmost predictability, the way how we roll out the division and how we build our product in the next couple of releases. The next five, six months are critical for us. We have launched Visual Explorer here at the conference. We're going to launch our native cloud solution probably middle of November to the customer base. So, these are the big milestones that will help us for our next fiscal year and provide really great value to our customers, and that's what they can expect, predictability, a very solid product, all the enterprise-grade features they need and require for what they do. And if you look at it, we are really enterprise play, and the customer base that we have is very demanding and challenging, and we want to keep up and deliver a capability that is relevant for them and helps them create values from the data lakes. >> Christian Rodatus, technology enthusiast, passionate, now CEO of Datameer. Great to have you on theCUBE, thanks for sharing. >> Thanks so much. >> And we'll be following your progress. 
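Christian's point about the Visual Explorer sitting on a Parquet columnar store is worth grounding: because Parquet lays data out column by column, an engine can read only the columns and row groups a query actually touches instead of scanning whole rows. The sketch below is a minimal, generic illustration of that access pattern using the open-source pyarrow library; it is not Datameer's engine, and the file name and column names are hypothetical.

```python
# Minimal sketch of why a columnar format like Parquet speeds up exploration:
# only the requested columns (and the row groups that can pass the filter)
# are read, instead of every field of every row. File and column names are
# hypothetical.
import pyarrow.parquet as pq

# Read just two columns, and only the rows where region == "NY".
table = pq.read_table(
    "transactions.parquet",            # hypothetical file
    columns=["region", "amount"],      # column pruning
    filters=[("region", "=", "NY")],   # predicate pushdown
)

print(table.num_rows)                            # rows that survived the filter
print(table.column("amount").to_pylist()[:10])   # peek at the first values
```

On a wide table this is the difference between touching two columns out of hundreds and reading everything, which is what makes free-form exploration over billions of rows feasible.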
Datameer here inside theCUBE live coverage, hashtag BigDataNYC, our fifth year doing our own event here in conjunction with Strata Data, formerly Strata Hadoop, Hadoop World, eight years covering this space. I'm John Furrier with Jim Kobielus here inside theCUBE. More after this short break. >> Christian: Thank you. (upbeat electronic music)
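One more note on the "right-time" idea from this conversation: loading through Kafka in frequent, small batches rather than record-by-record streaming can be sketched as a simple micro-batching consumer. This is a generic illustration using the kafka-python client; the topic, broker address and batch settings are made-up placeholders, and the load step is only a stub.

```python
# Toy "right-time" loader: consume a Kafka topic in small batches on a fixed
# cadence instead of record-by-record streaming. Topic, broker and batch
# settings are hypothetical; uses the kafka-python client.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "sensor-readings",                   # hypothetical topic
    bootstrap_servers="localhost:9092",  # hypothetical broker
    auto_offset_reset="earliest",
    enable_auto_commit=False,
)

def load_batch(records):
    # Stand-in for the real work: land the increment in the data lake so the
    # preparation and analytics steps downstream can pick it up.
    print(f"loaded a batch of {len(records)} records")

while True:
    # poll() returns whatever arrived within the timeout, grouped by partition.
    polled = consumer.poll(timeout_ms=5000, max_records=1000)
    records = [rec for batch in polled.values() for rec in batch]
    if records:
        load_batch(records)
        consumer.commit()  # commit offsets only after the batch has landed
```

The cadence, here up to 1,000 records or five seconds, whichever comes first, is the knob that turns "batch" into whatever increment a given application considers close enough to real time.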

Published Date : Sep 27 2017


Emma McGrattan, Actian | Big Data NYC 2017


 

>> Announcer: Live from midtown Manhattan it's theCUBE covering Big Data New York City 2017. Brought to you by Silicon Angle Media and it's ecosystem sponsors. (upbeat techno music) >> Hello, everyone. Welcome back to theCUBE's exclusive coverage of Big Data NYC for all the access. It's our fifth year doing our own event in New York City. The hashtag is BigDataNYC. Also, in conjunction with Strata Hadoop, used to be called Hadoop World, then Strata Hadoop. Now, it's called Strata Data as they try to grope to where the future's going to be. A lot of hype over there. A lot of action. But here as where we do the intimate interviews and the stories. I'm John Furrier, co-host of theCUBE with Emma McGrattan who is the Senior Vice President of Engineering at Actian. Great to have you on. >> Thanks for having me. >> We love having everyone from Ireland cause the accidents great traction. So, I appreciate you coming on. Have a beer later at the pub. New York's got to lot of great Irish pubs. In all seriousness, we've had Actian on before. Mike Hoskins has been on. We had Jeff Veis on yesterday giving us the marketing angle of hybrid data that you guys are doing. What's under the hood? Because Actian has a lot of technology in their portfolio through how you guys had your growth strategy. But now as the world wants to bring it together you're seeing some real critical trends. >> Emma: Right. >> A lot of application development where data's important. Huge amount of security challenges. People are trying to build out and bring security out of IT. And then you've got all this data covering stuff. That's just on the top line. Then you got IOT. So, people are busy. Their plates are full, and data's the center of it. So, what are you guys doing to bring all of Actian together? >> Emma: That's a great question, perfect question for Actian. So, we have in Actian a number of products in the portfolio. And we believe that best fit product. So, if you're doing something like graph database, it doesn't make sense to put a Vector in Hadoop solution against that. And we've got the right fit technology for what we're doing. And for IOT we've got an embedded database that's as small as 30 megs. So, I've got PowerPoint files that are bigger than this database. You put it in a device, set it, it can run for 20 years. You never have to touch it. But all that data that's being generated typically you're generating it because you want, at some point, to be able to analyze it. And we've gone in the portfolio and Vector in Hadoop has the ability to take that data from the IOT sources and perform very high-speed analytics on that. So, the products that we have within the portfolio are focused around data integration, so pulling data into an environment where you're going to perform analysis or otherwise operationalize that data, data management. A lot of our customers are just doing CRM, ERP applications on our product platforms. And then the analytics is where I get really excited cause there's so much happening in the analytics world in terms of new types of applications being built, in terms of real time requirements, in terms of security and governance that you're talking about in reference in your question. And we've got a unique solution that can address all of those areas in our Vector in Hadoop products. 
So, it's interesting that we see the name Hadoop coming out of the show this week because we see that the focus on Hadoop kind of moving to the background and where the real focus is around the data and not so much-- >> And the business value. >> I hate to sound cliché about outcomes but we were joking on theCUBE yesterday and kind of can't coin the term, "Outcomes as a service." Which is kind of a goof on the whole, "It's about the outcomes." Which is a cliché in tech. But that really is the truth. At the end of the day, you've got a business goal. But the role of data now in real time is key. You're seeing people want real time. Not real time response with old data, they want the real data. So, people are starting to look at data as a really instrumental part of the development process. Similar with DevOps did with infrastructure as code, people want data to be like code. >> Emma: Exactly. >> And that is a hard >> Architectural challenge. So, if you go into your customer base what do you guys tell them? And I was going to the hybrid cloud as the marketing message. But I have challenged, I'm the CXO. I'm the CDO. I'm the CIO. I'm the CFO, COO, whatever the person making these huge, sweeping operational cost decisions. What's the architecture? Cause that's what people are working on right now. And how do you present that? >> Right. So, we recognize the fact that everybody's got a very distributed environment. And part of the message around hybrid data is that data can be generated pretty much any place. You may be generating data in the cloud with your own custom applications. You may be using salesforce.com or NetSuite or whatever. And you've got your on-premise sources of data generation. And what we provide in Actian is the ability to access all of that data in real time, and make it part of the applications that you're deploying that is going to be able to react in real time to changes. You don't want to be acting on yesterday's data because things have happened, things have moved on. So, the importance of real time is not lost on Actian. And all of these solutions that we bring together enable that real time analysis of what's happening in every part of the environment. So, it's hybrid in terms of the type of data that you're working with. It's hybrid in terms of it could be generated in the cloud, in any cloud or on-premise, and being able to pull all of that together an perform real time analysis is incredibly important to generate value from the data. >> Emma, I want to get your thoughts on a comment that I heard last night and then multiple times but the same pattern, they don't get it. "They" could be the venture capitalists as part of the startup. Or the customer has, "Oh, this is the way we do it." There's definitely things that are out there Silo's Legacy things that are-- Still not going away, and we know that. But how do you go into a customer saying look, there's a whole new way of doing things right now. It's not necessarily radical lift and shift or rip and replace. Whatever word you want to use. There's always a word that, you don't like rip and replace, we'll say lift and shift. It's the same thing, right? >> Right. >> You don't want to do a lot of incremental operational wholesale changes. >> Right. >> But you want to do incremental value now. How do you go in and say, "Look, this is the way you want to think about real time in your architecture." 
Because I don't necessarily want to change my operational mindset for the sake of Salesforce and all these different data sources. How do you guys have that conversation? >> So, Actian is unique in that we have a customer base that goes back 20, 30 years. I personally will be at Actian 25 years in December. So, we've got customers that are running what I'd like to call our legacy products, but they're products that are powering their business every day of the week. And we've also got incredibly innovative products where we're on the bleeding edge. And what we've done in our recent release of Actian X is to combine bleeding-edge technology with this more mature and proven technology. So, in Actian X you've got the OLTP database that was Ingres, now rebranded because it's got new capabilities. And then we've taken the engine from the Actian Vector product, and brought that into Actian X so that you can do real-time analysis of your OLTP data, and react in real time to changes in the data. And it's interesting that you talk about real time because it means different things to different people. So, if you're talking to somebody doing risk analysis, real time is milliseconds. If you're talking to some customers, real time is yesterday's data and that's fine. And what we've done with Actian X is to provide that ability to determine for yourself what real time means to you and to provide a solution that enables you to respond in real time. Now, bringing analytics into what is a more traditional OLTP database, and kind of demonstrating for them some of the new capabilities, it enables and opens up other opportunities, as far as we can have conversations about maybe backing up that dataset to the cloud. Somebody that may have been risk averse and not looking at cloud all of a sudden is looking at cloud, looking at analytics, and then kind of opening up new opportunities for us. And new opportunities for them, cause the data, as they say, is the new oil. >> That's great, great. And you guys have a good customer base to draw from. So, you've got to bring in the shiny new toy but make it work with existing. So, it sounds like you've been building an abstraction layer on tech that was very useful and is useful, decoupling it with new software that adds value. Is it an abstraction layer of sorts? >> We don't think of it as an abstraction layer, but certainly one could think of it that way because it's ... Well, yeah it's-- >> John: It's a product. You basically take the old product and bring new stuff to it. >> Exactly. >> Okay, so I got to ask you about the trend around IOT. Because IOT is one of those things right now that's super hype. And I think it's going to be even more hype. But security has been a big problem and I hear a lot honestly, certainly IOT's on the agenda. Industrial IOT is kind of the low-hanging fruit. They go to that first. But no one wants to be the next Equifax. So, there's a lot of security stuff that causes pause, plus there's other things going on they got to take care of. How do you guys talk about the security equation where you can come in and put in a reliable, workable solution and still make the customers feel like they're moving the ball down the field? >> So, that's one of the benefits that we have of being in the industry for as long as we have. We have a very deep understanding of what security requirements are, in terms of providing capabilities within the product to do things like control who can access what data and to what degree. Can they update it?
Can they only read it? Providing the ability to encrypt the data. So, for many use cases the data is so sensitive that you'd always want to encrypt it when it's stored. You'd want any traffic coming in and out of the environment to be encrypted. Being able to audit everything that's happening in the environment, who's issuing what queries and from where, and to set alarms if somebody attempts to access data that they shouldn't be attempting to access. So, taking all of those capabilities together, we're then able to look at things like GDPR. What are the requirements for securing the data? And we've got all the capabilities within the product. And we've got the credibility, cause we've been doing this for 30 years, that we can secure these environments. We can conform to the various standards and mandates that are put in place for data security. So, we have a very strong story to tell--
Cause that's where things are really going. But people want to choose the best application whether it's in the cloud or on-premise, it doesn't matter. It's the best application for their need. And being able to pull all of that data together, and for operational purposes, and for analytics purposes is incredibly important. And Actian enables all of that. >> And that's where the hybrid is really clever and smart because you got the consumption side and the creation side, and data integration isn't a project, it's real. It just happens. >> Emma: Right. >> So, you want to enable that. I can see that would be a key benefit. Certainly as, whether these decentralized apps get more traction, you're going to start to see more immutable things transactions happening. Blockchain clearly points to that direction of the market where that's cool. Distributed computing has been around for awhile but now decentralized we know how to behave there. So, we're seeing some apps that will probably be rewritten for that. But again, if architected properly that should be a problem. >> Right, exactly. And we don't want anybody to have to rewrite apps. What we want to be able to do is to provide a platform where the data that you need is available. >> John: Yeah, they're called Dapps for decentralized apps. It's a whole new wave coming, it's not being talked about here at the show. We are on, obviously, at Silicon Angle and Wikibon are those trends as we're riding the big wave. Okay, Em, I want to ask you a final question. Kind of take your Actian hat off, put your Irish techie hat on, and let's get down and dirty on what the main problem in the industry is right now. If you look back and kind of go to the balcony if you will, look at the stage of the industry, obviously Hadoop is now in the background. It's an element of the bigger picture. We're seeing, we were commenting yesterday that these customers have these tool sheds of all these tools they've bought. They bought a hammer that wants to be a lawnmower, right? It's just like they have their tool platforms are being pitched at them. There's a lot of confusion. What's the main problem that the industry's trying to solve? If you look at it, if you can put the dots together. What is the big problem that needs to be solved, that the industry should be solving? >> So, I think data is every place, right? And there's not a whole lot of discipline around corralling that and putting security around it. Being able to deploy security policies across data regardless of where it's deployed or sourced. So, I think that's probably the biggest challenge is bringing compute to the data and pulling all of that together. And that's the challenge that we're addressing. >> And so, the unification, if you will, people use that word, all unifying data. What does that actually mean? You guys call it hybrid data which means you have some flexibility if you need it. >> Emma: Right. >> All right, cool. Emma, thanks so much for coming on theCUBE. Really appreciate it. Congratulations on your success. And again, you guys got to a good spot. You got a broad portfolio, you're bringing together with hybrid data. Best of luck. We'll keep in touch. Emma McGrattan here, the Senior Vice President of Engineering at Actian here on theCUBE. More live coverage here in New York City from theCUBE's coverage of Big Data NYC after this short break. (upbeat techno music)

Published Date : Sep 27 2017


Basil Faruqui, BMC Software | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> His name is Jim Kobielus. >> Jim: That's right, John Furrier is actually how I pronounce his name, for the record. But he is Basil Faruqui. >> Basil Faruqui, who's the solutions marketing manager at BMC, welcome to theCUBE. >> Basil: Thank you, good to be back on theCUBE. >> So, first of all, I heard you guys had a tough time in Houston, so hope everything's getting better, and best wishes. >> Basil: Definitely in recovery mode now. >> Hopefully that can get straightened out. What's going on at BMC? Give us a quick update, and in the context of BigData NYC, what's happening, what is BMC doing in the big data space now? The AI space now, the IoT space now, the cloud space? >> Like you said, you know, the data space, the IoT space, the AI space. There are four components of this entire picture that literally haven't changed since the beginning of computing. If you look at those four components of a data pipeline: ingestion, storage, processing and analytics. What keeps changing around it is the infrastructure, the types of data, the volume of data and the applications that surround it. The rate of change has picked up immensely over the last few years, with Hadoop coming into the picture, public cloud providers pushing it. It's obviously created a number of challenges, but one of the biggest challenges that we are seeing in the market, and we're helping customers address, is the challenge of automating this. And obviously the benefit of automation is in scalability as well as reliability. So when you look at this rather simple data pipeline, which is now becoming more and more complex: how do you automate all of this from a single point of control? How do you continue to absorb new technologies and not re-architect your automation strategy every time, whether it's Hadoop, whether it's bringing in machine learning from a cloud provider? And that is the issue we've been solving for customers. >> All right, let me jump into it. So first of all you mentioned some things that never change: ingestion, storage, and what was the third one? >> Ingestion, storage, processing and eventual analytics. >> So OK, so that's cool, totally buy that. Now if you move and say hey, okay, so you believe that's standard, but now in the modern era that we live in, which is complex, you want breadth of data, and also you want the specialization when you get down to the machine learning. That's highly bound; that's where the automation is right now. We see the trend essentially making that automation broader as it goes into the customer environments. >> Basil: Correct. >> How do you architect that? If I'm a CXO, or I'm a CDO, what's in it for me? How do I architect this? Because that's really the number one thing: I know what the building blocks are, but they've changed in their dynamics in the marketplace. >> So the way I look at it is that what defines success and failure, particularly in big data projects, is your ability to scale. If you start a pilot and you spend, you know, three months on it and you deliver some results, but you cannot roll it out worldwide, nationwide, whatever it is, essentially the project has failed. The analogy I often give is Walmart has been testing the pickup tower, I don't know if you've seen it; this is basically a giant ATM for you to go pick up an order that you placed online.
They're testing this at about a hundred stores today. Now that's a success and Walmart wants to roll this out nationwide. How much time do you think their IT departments can have? Is this a five-year project, a ten-year project? No, the management's going to want this done in six months, ten months. So essentially, this is where automation becomes extremely crucial, because it is now allowing you to deliver speed to market, and without automation you are not going to be able to get to an operational stage in a repeatable and reliable manner. >> You're describing a very complex automation scenario. How can you automate in a hurry without sacrificing, you know, the details of what needs to be done? In other words, you seem to call for repurposing or reusing prior automation scripts and rules and so forth. How can the Walmarts of the world do that fast, but also do it well? >> So we go about it in two ways. One is that out of the box we provide a lot of pre-built integrations to some of the most commonly used systems in an enterprise, all the way from the mainframes to the Oracles, SAPs, Hadoops and Tableaus of the world. They're all available out of the box for you to quickly reuse these objects and build an automated data pipeline. The other challenge we saw, particularly when we entered the big data space four years ago, was that the automation was something that was considered close to the project becoming operational. And that's where a lot of rework happened, because developers had been writing their own scripts, using point solutions. So we said all right, it's time to shift automation left and allow companies to build automation as an artifact very early in the development lifecycle. About a month ago we released what we call Control-M Workbench, which is essentially a Community Edition of Control-M targeted towards developers, so that instead of writing their own scripts they can use Control-M in a completely offline manner without having to connect to an enterprise system. As they build and test and iterate, they're using Control-M to do that. So as the application progresses through the development lifecycle, all of that work can then translate easily into an Enterprise Edition of Control-M. >> So quickly, just explain what shift-left means for the folks that might not know software methodologies, left political or left alt-right, this is software development, so please take a minute to explain what shift-left means, and the importance of it. >> Correct, so if you think of software development as a straight-line continuum, you start with building some code, you will do some testing, then unit testing, then user acceptance testing. As it moves along this chain, there was a point right before production where all of the automation used to happen. You know, developers would come in and deliver the application to ops, and ops would say, well, hang on a second, all this cron tab and all these other point solutions you've been using for automation, that's not what we use in production, and we need you to, now-- >> To test early and often. >> Test early and often. The challenge was that the tools the developers use were not the tools that were being used on the production end of the cycle. And there was good reason for it, because developers don't need something really heavy, with all the bells and whistles, early in the development lifecycle.
Control-M Workbench is a very light version which is targeted at developers and focuses on the needs that they have when they're building and developing as the application progresses through its life cycle. >> How much are you seeing Waterfall and then people shifting-left becoming more prominent now. What percentage of your customers have moved to Agile and shifting-left percentage wise? >> So we survey our customers on a regular basis. In the last survey showed that 80% of the customers have either implemented a more continuous integration delivery type of framework, or are in the process of doing it. And that's the other. >> And getting upfront costs as possible, a tipping point is reached. >> What is driving all of that is the need from the business, you know, the days of the five year implementation timelines are gone. This is something that you need to deliver every week, two weeks, and iteration. And we have also innovated in that space and the approach we call Jobs-as-Code where you can build entire, complex data pipelines in code formats so that you can enable the automation in a continuous integration and delivery framework. >> I have one quick question, Jim, and then I'll let you take the floor and got to learn to get a word in soon. But I have one final question on this BMC methodology thing. You guys have a history obviously BMC goes way back. Remember Max Watson CEO, and then in Palm Beach back in 97 we used to chat with him. Dominated that landscape, but we're kind of going back to a systems mindset, so the question for you is how do you view the issue of the this holy grail, the promised land of AI and machine learning. Where, you know, end-to-end visibility is really the goal, right. At the same time, you want bounded experiences at root level so automation can kick in to enable more activity. So it's a trade off between going for the end-to-end visibility out of the gate, but also having bounded visibility and data to automate. How do you guys look at that market because customers want the end-to-end promise, but they don't want to try to get there too fast as a dis-economies of scale potentially. How do you talk about that? >> And that's exactly the approach we've taken with Control-M Workbench the Community Edition. Because early on you don't need capabilities like SLA management and forecasting and automated promotion between environments. Developers want to be able to quickly build, and test and show value, OK. And they don't need something that, as you know, with all the bells and whistles. We're allowing you to handle that piece in that manner, through Control-M Workbench. As things progress, and the application progresses, the needs change as well. Now I'm closer to delivering this to the business, I need to be able to manage this within an SLA. I need to be able to manage this end-to-end and connect this other systems of record and streaming data and click stream data, all of that. So that we believe that there it doesn't have to be a trade off. That you don't have to compromise speed and quality and visibility and enterprise grade automation. >> You mention trade-offs so the Control-M Workbench the developer can use it offline, so what amount of testing can they possibly do on a complex data pipeline automation, when it's when the tool is off line? I mean it simply seems like the more development they do off line, the greater the risk that it simply won't work when they go into production. Give us a sense for how they mitigate that risk. 
>> Sure, we spent a lot of time observing how developers work and very early in the development stage, all they're doing is working off of their Mac or their laptop and they're not really connecting to any. And that is where they end up writing a lot of scripts because whatever code, business logic, that they've written the way they're going to make it run is by writing scripts. And that essentially becomes a problem because then you have scripts managing more scripts and as the the application progresses, you have this complex web of scripts and CRON tabs and maybe some open source solutions. trying to make, simply make, all of this run. And by doing this I don't know offline manner that doesn't mean that they're losing all of the other controlling capabilities. Simply, as the application progresses whatever automation that they've built in Control-M can seamlessly now flow into the next stage. So when you are ready take an application into production there is essentially no rework required from an automation perspective. All of that that was built can now be translated into the enterprise grade Control-M and that's where operations can then go in and add the other artifacts such as SLA management forecasting and other things that are important from an operational perspective. >> I'd like to get both your perspectives because you're like an analyst here. So Jim, I want you guys to comment, my question to both of you would be you know, looking at this time in history, obviously on the BMC side, mention some of the history. You guys are transforming on a new journey and extending that capability in this world. Jim, you're covering state of the art AI machine learning. What's your take of the space now? Strata Data which is now Hadoop World, which is, Cloudera went public, Hortonworks is now public. Kind of the big, the Hadoop guys kind of grew up, but the world has changed around them. It's not just about Hadoop anymore. So I want to get your thoughts on this kind of perspective. We're seeing a much broader picture in BigData NYC versus the Strata Hadoop, which seems to be losing steam. But, I mean, in terms of the focus, the bigger focus is much broader horizontally scalable your thoughts on the ecosystem right now. >> Let Basil answer first unless Basil wants me to go first. >> I think the reason the focus is changing is because of where the projects are in their life cycle. You know now what we're seeing is most companies are grappling with how do I take this to the next level. How do I scale, how do I go from just proving out one or two use cases to making the entire organization data driven and really inject data driven decision making in all facets of decision making. So that is, I believe, what's driving the change that we're seeing, that you know now you've gone from Strata Hadoop to being Strata Data, and focus on that element. Like I said earlier, these difference between success and failure is your ability to scale and operationalize. Take machine learning for example. >> And really it's not a hype market. Show me the meat on the bone, show me scale, I got operational concerns of security and whatnot. >> And machine learning you know that's one of the hottest topics. A recent survey I read which polled a number of data scientists, it revealed that they spent about less than 3% of their time in training the data models and about 80% of their time in data manipulation, data transformation and enrichment. 
That is obviously not the best use of the data scientist's time, and that is exactly one of the problems we're solving for our customers around the world. >> And it needs to be automated to the hilt to help them be more productive and deliver fast results. >> Ecosystem perspective, Jim, what's your thoughts? >> Yes, everything that Basil said, and I'll just point out that many of the core use cases for AI are automation of the data pipeline. You know, it's driving machine-learning-driven predictions, classifications, you know, abstractions and so forth, into the data pipeline, into the application pipeline, to drive results in a way that is contextually and environmentally aware of what's going on. The path, the historical data, what's going on in terms of current streaming data to drive optimal outcomes, you know, using predictive models and so forth, in line with applications. So really, fundamentally then, what's going on is that automation is an artifact that needs to be driven into your application architecture as a repurposable resource for a variety of jobs. >> How would you even know what to automate? I mean that's the question. >> You're automating human judgment, you're automating effort. Like the judgments that a working data engineer makes to prepare data for modeling and whatever. More and more, that need can be automated, because those are patterned, structured activities that have been mastered by smart people over many years. >> I mean we just had a customer on here, GSK, with that scale, and his attitude is, we see the results from the users, then we double down, pay for it, and automate it. So the automation question, it's a rhetorical question, but this begs the question, which is, you know, who's writing the algorithms as machines get smarter and start throwing off their own real-time data? What are you looking at, how do you determine you're going to need machine learning for machine learning? You're going to need AI for AI? Who writes the algorithms for the algorithms? >> Automated machine learning is a hot, hot research focus, and not only research: we're seeing more and more solution providers like Microsoft and Google and others doubling down on investments in exactly that area. That's a productivity play for data scientists. >> I think the data market's going to change radically in my opinion, so you're starting to see some things with blockchain and some other things that are interesting. Data sovereignty, data governance are huge issues. Basil, just give your final thoughts for this segment as we wrap this up. Final thoughts on data and BMC, what should people know about BMC right now, because people might have a historical view of BMC. What's the latest, what should they know, what's the new Instagram picture of BMC? What should they know about you guys? >> I think what I would say people should know about BMC is that, you know, all the work that we've done over the last 25 years, on virtually every platform that came before Hadoop, we have now innovated to take into things like big data and cloud platforms. So when you are choosing Control-M as a platform for automation, you are choosing a very, very mature solution. An example of which is Navistar, and their CIO is actually speaking at the keynote tomorrow. They've had Control-M for 15, 20 years and have automated virtually every business function through Control-M.
And when they started their predictive maintenance project, where they're ingesting data from about 300 thousand vehicles today to figure out when a vehicle might break and do predictive maintenance on it, when they started that journey they said that they always knew that they were going to use Control-M for it, because that was the enterprise standard, and they knew that they could simply now extend that capability into this area. And when they started, about three, four years ago, they were ingesting data from about a hundred thousand vehicles; that has now scaled to over 325 thousand vehicles, and they have not had to re-architect their strategy as they grow and scale. So, I would say that is one of the key messages that we are taking to market, is that we are bringing innovation that has spanned over 25 years and evolving it. >> Modernizing it. >> Modernizing it and bringing it to newer platforms. >> Congratulations, I wouldn't call that a pivot, I'd call it an extensibility issue, kind of modernizing the core things. >> Absolutely. >> Thanks for coming and sharing the BMC perspective inside theCUBE here. On BigData NYC, this is theCUBE. I'm John Furrier, with Jim Kobielus here in New York City, more live coverage over the three days we will be here, today, tomorrow and Thursday at BigData NYC. More coverage after this short break.
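To make the Jobs-as-Code idea from this conversation a bit more concrete, here is a minimal Python sketch of a pipeline described as a versionable artifact that a continuous integration stage can validate and hand to the next environment. This is an illustration only, not Control-M's actual interface; the job names, field names, and the validate/export helpers are assumptions invented for the sketch.

```python
import json
from pathlib import Path

# Illustrative only: a data pipeline described as a plain, versionable artifact.
# The schema below is an assumption for this sketch, not Control-M's format.
FLOW = {
    "name": "daily_vehicle_telemetry",
    "jobs": {
        "ingest_telemetry": {"command": "python ingest.py", "depends_on": []},
        "enrich_events":    {"command": "python enrich.py", "depends_on": ["ingest_telemetry"]},
        "score_failures":   {"command": "python score.py",  "depends_on": ["enrich_events"]},
    },
}

def validate_flow(flow: dict) -> None:
    """Fail fast in CI if a job depends on something that is not defined."""
    jobs = flow["jobs"]
    for name, spec in jobs.items():
        for dep in spec["depends_on"]:
            if dep not in jobs:
                raise ValueError(f"{name} depends on undefined job {dep}")

def export_flow(flow: dict, path: str) -> None:
    """Write the validated definition as an artifact a deployment stage can pick up."""
    validate_flow(flow)
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(flow, indent=2))

if __name__ == "__main__":
    export_flow(FLOW, "build/daily_vehicle_telemetry.json")
```

The point is the workflow rather than the format: because the definition lives in source control, the same validation runs for a developer on a laptop and in the release pipeline, which is the "no rework on the way to production" property described above.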

Published Date : Sep 27 2017

SUMMARY :

John Furrier and Jim Kobielus talk with Basil Faruqui, solutions marketing manager at BMC, about automating big data workflows with Control-M. Basil explains the Jobs-as-Code approach, which lets teams define complex data pipelines as code and plug them into continuous integration and delivery, and Control-M Workbench, a lightweight Community Edition aimed at developers building and testing early in the life cycle. He cites a customer survey in which roughly 80% of customers have adopted or are adopting CI/CD practices, notes that data scientists still spend the bulk of their time on data manipulation and enrichment rather than model training, and points to examples ranging from Walmart's pick-up towers to Navistar's predictive maintenance program, which has scaled from about 100 thousand to more than 325 thousand connected vehicles on Control-M without re-architecting.
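The survey figure cited in the interview, that data scientists spend far more time on manipulation, transformation, and enrichment than on training models, shows up even in a toy workflow. The sketch below uses pandas and scikit-learn; the file name, column names, and label are invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Most of the effort: acquiring, cleaning, and enriching the data (hypothetical schema).
raw = pd.read_csv("sensor_readings.csv")
raw = raw.dropna(subset=["engine_temp", "mileage"])
raw["temp_per_mile"] = raw["engine_temp"] / raw["mileage"].clip(lower=1)
raw["is_weekend"] = pd.to_datetime(raw["reading_ts"]).dt.dayofweek >= 5
features = raw[["engine_temp", "mileage", "temp_per_mile", "is_weekend"]]
labels = raw["failed_within_30d"]

# The comparatively small part: fitting and checking a model.
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.2)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```

Most of the lines, and in practice most of the hours, sit above the model fit, which is the imbalance the automation discussion is aimed at.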


Rob Bearden, Hortonworks & Rob Thomas, IBM | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE media, and its ecosystem sponsor. >> Okay, welcome back, everyone. We're here live in New York City for BigData NYC, our annual event with SiliconANGLE Media, theCUBE, and Wikibon, in conjunction with Strata Hadoop, which is now called Strata Data as that show evolves. I'm John Furrier, cohost of theCUBE, with Peter Burris, head of research for SiliconANGLE Media, and General Manager of Wikibon. Our next two guests are two legends in the big data industry, Rob Bearden, the CEO of Hortonworks, really one of the founders of the big data movement, you know, you've got Cloudera and Hortonworks, really kind of built that out, and Rob Thomas, General Manager of IBM Analytics. Big-time investments have been made by both of them. Congratulations on your success, guys. Welcome back to theCUBE, great to see you guys! >> Great to see you. >> Great, yeah. >> And you've got an exciting partnership to talk about, as well. >> So, but let's do a little history, you guys, obviously, I want to get to that, and get clarified on the news in a second, but you guys have been there from the beginning, kind of looking at the market, developing it, almost from the embryonic state to now. I mean, what a changeover. Give a quick comparison of where we've come from and what's the current landscape now, because it has evolved into so much more. You got IOT, you got AI, you have a lot of things in the enterprise. You've got cloud computing. A lot of tailwinds for this industry. It's gotten bigger. It's become big and now it's huge. What's your thoughts, guys? >> You know, so you look at the arcs, and really all this started with Hadoop, and Rob and I met early in the days of that. You've kind of gone from the early few years being about optimizing operations. Hadoop is a great way for a company to become more efficient, take out costs in their data infrastructure, and so that put huge momentum into this area, and now we've kind of fast-forwarded to the point where now it's about, "So how am I actually going to extract insight?" So instead of just getting operational advantages, how am I going to get competitive advantage, and that's about bringing the world of data science and machine learning, running it natively on Hadoop. That's the next chapter, and that's what Rob and I are working closely together on. >> Rob, your thoughts, too? You know, we've been talking about data in motion. You guys were early on in that, seeing that trend. Real time is still hot. Data is still the core asset people are trying to figure out and move from wrangling to actually enabling that data. >> Right. Well, you know, in the early days of Big Data, it was, to Rob's point, it was very much about bringing operational leverage and efficiency and being able to aggregate very siloed data sets, and unlocking that data and bringing it into a central platform.
In the early days, the resources went into making Hadoop an enterprise-viable data platform, with security, governance, operations, and management capability that mirrored any of the proprietary transactional or EDW platforms, and the lessons learned in that were that by bringing all that data together in a central data set, we now can understand what's happening with our customers, and with our other assets, pre-transaction, and so they can become very prescriptive in engaging in new business models. And so what we've learned now is, the further upstream we can get in the world of IOT and bring that data under management from the point of origination, and be able to manage that all the way through its life cycle, we can create new business models with higher velocity of engagement and a lot more rapid value that gets created. It, though, creates a number of new challenges in all the areas of how you secure that data, how you bring governance across that entire life cycle from a common stream set. >> Well, let's talk about the news you guys have. Obviously, the partnership. Partnerships have become the new normal in the open source era that we're living in. We're seeing open source software grow really exponentially in the forecasts for the next five years and ten years, and exponential growth in new code. Just new people coming on board, new developers, dev ops is mainstream. Partnerships are key for communities. 90% of the code is going to be open source, 10% the rest, as they say, the code sandwich, as Jim Zemlin, the executive director of the Linux Foundation, calls it, and you're seeing that work. You guys have worked together with Apache Atlas. What's the news, what's the relationship with Hortonworks and IBM? Share the news. >> So, a lot of great work's been happening there, and generally in the open source community, around Apache Atlas, and making sure that we're bringing mission-critical governance capabilities across the big data sets and environments. As we then get into the complexity of now multiple data lakes, multiple tiers of data coming from multiple sources, that brings a higher level of requirement in both the security and governance aspects, and that's where the partnership with IBM is continuing to drive Apache Atlas into mission-critical enterprise viability, but then when we get into the distributed models and enterprise requirements, the IBM platforms leveraging Atlas and what we're doing together then take that into the mission-critical enterprise capability. >> You got the open source, and now you got the enterprise. Rob, we've talked many times about the enterprise as a hard, hard environment to crack for, say, a startup, but even now, they're becoming reliant on open source, but yet, they have a lot of operational challenges. How does this relate to the challenge of, you know, the CIO and his staff, now new personas coming in, you're seeing the data science role, you see it expanding from analytics to dev ops. A lot of challenges. >> Look, enterprises are getting better at this. Clearly we've seen progress over the last five years on that, but to kind of go back and link the points, there's a phrase I heard that I like. It says, "There's no AI without IA," meaning information architecture. Fundamentally, what our partnership is about is delivering the right information architecture. So it's Hadoop federated with whatever you have in terms of warehouses and databases. We partner around IBM Common SQL for that.
It's metadata for your core governance, because without governance you don't have compliance, you can't offer self-service analytics, so we are forming what I would call the fluid data layer for an enterprise that enables them to get to this future of AI, and my view is there's a stop in between, which is data science, machine learning, applications that are ready today that clients can put into production and improve the outcomes they're getting. That's what we're focused on right now: how do we take the information architecture we've been able to establish, and then help clients on this journey? That's what enterprises want, because that's how they're going to build differentiation in their businesses. >> But the definition of an information architecture is closest to applications, and maybe this informs your perspective, it's close to the applications that the business is running on. Goes back to your observation about, "We used to be focusing on optimizing operations." As you move away from those applications, your information architecture becomes increasingly diffuse. It's not as crystal clear. How do you drive that clarity as the data moves to derive new applications? >> Rob and I have talked about this. I think we're at the dawn of probably a new era in application development. Much more agile, flexible applications that are taking advantage of data wherever it resides. We are really early in that. Right now we are in the, let's actually put machine learning and data science into practice, let's extract value from the data we've got, and that will then inform a new set of applications, which is related to the announcements that Hortonworks made this week around data plane, which is looking at multi-cloud environments and how would you manage applications and data across those? Rob, you can speak to that better than I can, I think. >> Well, the data plane thing, this information architecture, I think you're 100% right on. What we're hearing from customers in the enterprise is, they see the IOT buzz, oh, of course they're going to connect with IOT devices down the road, but when they see the security challenges, when they see the operational challenges around hiring people to actually run the dev ops, they have to then re-architect. So there's certainly a conversation we see on what is the architecture for the data, but also a little bit bigger than that, the holistic architecture of, say, cloud. So a lot of people are, like, trying to clean up their house, if you will, to be ready for this new era, and I think Wikibon, your private cloud report you guys put out really amplified that by saying, "Yeah, they see these trends, but they got to kind of get their act together." They got to look at who the staff is, what the data architecture's going to be, what apps are being developed, so doing a lot more retrenching. Given that, if we agree, what does that mean for the data plane, and then your vision of having that data architecture so that this will be a solid foundational transition? >> I think we all hit on the same point, which is it is about enabling a next generation IT architecture, of which, sort of, the X and the Y axes are the network, and generally what Big Data's been able to do, and Hadoop specifically, was, over the last five years, enabling the existing applications as architected, and I like the term that's been coined by you: they were known processes with known technology, and that's how applications in the last 20 years have been enabled.
Big Data and Hadoop generally have unlocked that ability to now be able to move all the way out to the edge and incorporate IOT, data at rest, data in motion, on-prem and cloud hybrid architecture. What that's done is said, "Now we know how to build an application that takes advantage of an event or an occurrence and then can drive an outcome in a variety of ways. We don't have to wait for a static programming model to automate a function." >> And in fact, if we wait, we're going to fail. That's one of the biggest challenges. I mean, IBM, I will tell you guys, or I'll tell you, Rob, that one of the craziest days I've ever spent is I flew from Japan to New York City for the IBM Information Architecture Announcement back in like 1994, and it was the most painful two days I've ever experienced in my entire life. That's a long time ago. It's ancient history. We can't use information architecture as a way of slowing things down. What we need to be able to do is we need to be able to introduce technology that again, allows the clarity of information architecture close to these core applications to move, and that may involve things like machine learning itself being embedded directly into how we envision data being moved, how we envision optimization, how we envision the data plane working. So, as you guys think about this data plane, everybody ends up asking themselves, "Is there a natural place for data to be?" What's going to be centralized, what's going to be decentralized, and I'm asking you, is the data increasingly going to be decentralized, but the governance and security and policies that we put in place going to be centralized, and is that what's going to inform the operation of the data plane? What do you guys think? >> It's our view, very specifically from Hortonworks' perspective, that we want to give the ability for the data to exist and reside wherever the physics dictate, whether that be on-prem, whether that be in the cloud, and we want to give the ability to process and take action on an event or an occurrence, or drive an outcome, as early in the cycle as possible. >> Describe what you mean by "early in the cycle." >> So, as we see conditions emerge. A machine part breaking down. A customer taking an action. A supply chain inventory outage. >> So as close as possible to the event that's generating the data. >> As it's being generated, or as the processes are leading up to the natural outcome and we can maybe disintermediate for a better outcome, and so, that means that we have to be able to engage with the data irrespective of where it is in its cycle, and that's where we've enabled, with data plane, the ability to abstract out the requirement of where that data is, and to be able to have a common plane, pun intended, for the operations and managing and provisioning of the environment, for being able to govern that and secure it, which are increasingly becoming intertwined, because you have to deal with it from point of origin through point at rest. >> The new phrase, "the single plane of glass." All joking aside, I want to just get your thoughts on this, Rob, too. "What's in it for me? I'm the customer. Right now I have a couple challenges." This is what we hear from the market.
"I need data consistency because things are happening in "real time; whatever events are going on with data, we know "more data's going to be coming out from the edge and "everywhere else, faster and more volume, so I need "consistency of my data, and I don't want "to have multiple data silos," and then they got to integrate the data, so on the application developer side, a dev ops-like ethos is emerging where, "Hey, if there's data being done, I need to integrate that "into my app in real time," so those are two challenges. Does the data plane address that concern for customers? That's the question. >> Today it enables the ops world. >> So I can integrate my apps into the data plane. >> My apps and my other data assets, irrespective of where they reside, on-prem, cloud, or out to the edge, and all points in between. >> Rob, for enterprise, is this going to be the single pane of glass for data governance? Is that how the vision that you guys see this, because that's a benefit. If that could happen, that's essentially one step towards the promised land, if you will, for more data flowing through apps and app developers. >> So let me reshape a little bit. There's two main problems that collectively we have to address for enterprises: one is they want to apply machine learning and data science at scale, and they're struggling with that, and two is they want to get the cloud, and it's not talked about nearly enough, but most clients are really struggling with that. Then you fast forward on that one, we are moving to a multi-cloud world, absolutely. I don't think any enterprise is going to standardize on a single cloud, that's pretty clear. So you need things like data plane that acknowledge it's a multi-cloud world, and even as you move to multi clouds, you want a single focus for your data governance, a single strategy for your data governance, and then what we're doing together with IBM Data Science Experience with Hortonworks, let's say, whatever data you have in there, you can now do your machine learning right where that data is. You don't need to move it around. You can if you want, but you don't have to move it around, 'cause it's built in, and it's integrated right into the Hadoop ecosystem. That solves the two main enterprise pain points, which is help me get the cloud, help me apply data science and machine learning. >> Well we'll have to follow up and we'll have to do just a segment just on that. I think multi-cloud is clearly the direction, but what the hell does that mean? If I run 365 on Azure, that's one app. If I run something else on Amazon, that's multiple clouds, not necessarily moving workloads across. So the question I want to ask here is, it's clear from customers they want single code bases that run on all clouds seamlessly so I don't have to scale up on things on Amazon, Azure, and Google. Not all clouds are created equal in how they do things. Storage, through ever, inside the data factories of how they process. That's a challenge. How do you guys see that playing out of, you have on-premise activities that have been bootstrapped. Now you have multiple clouds with different ways of doing things, from pipelining, ingestion and processing, and learning. How do you see that playing out? Clouds just kind of standardizing around data plane? >> There's also the complexity of even within the multi-clouds, you're going to have multiple tiers within the clouds, if you're running in one data center in Asia, versus one in Latin America, maybe a couple across the Americas. 
>> But as a customer, do I need to know the cloud internals of Amazon, Azure, and Google? >> You do. In a stand-alone world, yes you do. That's where we have to bring and abstract the complexity of that out, and that's the goal with data plane, is to be able to extract, whether it's, which tier it's in, on-prem, or whether it's on, irrespective of which cloud platform. >> But Rob Thomas, I really like the way you put it. There may be some other issues that users have to worry about, certainly there are some that we think, but the two questions of, "Where am I going to run the machine learning," and "How am I going to get that to the cloud appropriately," I really like the way you put that. At the end of the day, what users need to focus on is less where the application code is, and more where the data is, so that they can move the application code or they can move the work to the data. That's fundamentally the perspective. We think that businesses don't take their business to the cloud, they bring the cloud to their business. So, when you think about this notion of increasingly looking at a set of work that needs to be performed, where the data exists, and what acts you're going to take in that data, it does suggest that data is going to become more of a centerpiece asset within the business. How does some of the things that you guys are doing lead customers to start to acknowledge data as an asset so they're making the appropriate investments in their data as their business evolves, and partly in response to data as an asset? What do you think? >> We have to do our job to build to common denominators, and that's what we're doing to make this easy for clients. So today we announced the IBM integrated analytics system. Same code base on private cloud as on a hardware system as on public cloud, all of it federates to Hortonworks through common sequel. That's what clients need, 'cause it solves their problem. Click of a button, they can get the cloud, and by the way, on private cloud it's based on Kubernetes, which is aligned with what we have on public cloud. We're working with Hortonworks to optimize Yarn and Kubernetes working together. These are the meaty issues that if we don't solve it, then clients have to deal with the bag of bolts, and so that's the kind of stuff we're solving together. So think about it: one single code base for managing your data, federates to Hadoop, machine learning is built into the system, and it's based on Kubernetes, that's what clients want. >> And the containers is just great, too. Great cloud-native trend. You guys been great, active in there. Congratulations to both of you guys. Final question, get you guys the last word: How does the relationship between Hortonworks and IBM evolve? How do you guys see this playing out? More of the same? Keep integrating in code? Is there any new thing you see on the horizon that you're going to be knocking down in the future? >> I'll take the first shot. The goal is to continue to make it simple and easy for the customer to get to the cloud, bring those machine learning and data science models to the data, and make it easy for the consumption of the new next generation of applications, and continue to make our customer successful and drive value, but to do it through transparently enabling the technology platforms together, and I think we've acknowledged the things that IBM is extraordinarily good at, the things that Hortworks is good at, and bring those two together with virtually no overlap. 
>> Rob, you've been very partner-centric. Your thoughts on this partnership? >> Look, it's what clients want. Since we announced this, the results and the response have been fantastic, and I think it's for one simple reason. So, Hortonworks' mission, we all know, is open source, and delivering in the community. They do a fantastic job of that. We also know that sometimes clients need a little bit more, and so, when you bring those two things together, that's what clients want. That's very different than what other people in the industry do that say, "We're going to create a proprietary wrapper around your Hadoop environment and lock your data in." That's the opposite of what we're doing. We're saying we're giving you full freedom of open source, but we're enabling you to augment that with machine learning and data science capabilities. This is what clients want. That's why the partnership's working. I think that's why we've gotten the response that we have. >> And you guys have been multiple years into the new operating model of being much more aggressive within the Big Data community, which has now morphed into a much larger landscape. Are you pleased with some of the results you're seeing on the IBM side, and more coding, more involvement in these projects on your end? >> Yeah, I mean, look, we were certainly early on Spark, created a lot of momentum there. I think it actually ended up helping both of our interests in the market. We built a huge community of developers at IBM, which is not something IBM had even a few years ago, but it's great to have a relationship like this where we can continue to augment our skills. We make each other better, and I think what you'll see in the future is more on the governance side; I think that's the piece that's still not quite been figured out by most enterprises yet. The need is understood. The implementation is slow, so you'll see more from us collectively there. >> Well, congratulations on the community work you guys have done. I think the community model's going mainstream as well. Open source will continue to grow. Congratulations. Rob Bearden and Rob Thomas here inside theCUBE, more coverage here in Big Data NYC with theCUBE, after this short break.
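One way to picture the "do your machine learning right where that data is" point made above is a Spark job that trains against a dataset already resident in the cluster instead of exporting it to a separate tool. The following is a generic PySpark sketch under stated assumptions: the Parquet path, column names, and a numeric 0/1 label are all made up, and this is not the IBM Data Science Experience or DataPlane API.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

# Train next to the data instead of moving the data to the model.
spark = SparkSession.builder.appName("train-in-place").getOrCreate()

# Hypothetical dataset already living in the cluster (path and schema assumed).
df = spark.read.parquet("hdfs:///data/telemetry/daily")
df = df.dropna(subset=["engine_temp", "mileage", "failed_within_30d"])

# Assemble features and fit the model where the data resides.
assembler = VectorAssembler(inputCols=["engine_temp", "mileage"], outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)
model = LogisticRegression(featuresCol="features", labelCol="failed_within_30d").fit(train)

print("test AUC:", model.evaluate(test).areaUnderROC)
spark.stop()
```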

Published Date : Sep 27 2017

SUMMARY :

John Furrier and Peter Burris sit down with Rob Bearden, CEO of Hortonworks, and Rob Thomas, General Manager of IBM Analytics, to discuss how the big data market has moved from optimizing infrastructure to extracting insight with data science and machine learning. They cover the partnership around Apache Atlas for mission-critical governance, IBM Common SQL federating Hadoop with warehouses and databases, running IBM Data Science Experience natively against data in Hortonworks so models train where the data resides, and the newly announced Hortonworks data plane for managing data and applications across on-prem, multi-cloud, and edge environments, with Kubernetes and YARN integration on the roadmap.
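The data plane idea described in this conversation, a single control point for operating, governing, and securing data wherever it physically resides, can be sketched as a thin abstraction. Everything below (the class names, tiers, and policy rule) is an assumption used purely to illustrate the "decentralized data, centralized governance" framing, not Hortonworks DataPlane's actual interface.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class DataLocation:
    """A dataset wherever the physics put it: on-prem, a cloud region, or the edge."""
    name: str
    tier: str            # e.g. "on-prem", "aws-us-east", "edge-plant-7"
    uri: str
    classification: str  # e.g. "public", "pii"

class DataPlane:
    """One registry and one policy check, applied across every tier."""
    def __init__(self) -> None:
        self.catalog: Dict[str, DataLocation] = {}

    def register(self, loc: DataLocation) -> None:
        self.catalog[loc.name] = loc

    def allowed(self, name: str, purpose: str) -> bool:
        # Centralized governance: the rule does not care which tier holds the data.
        loc = self.catalog[name]
        return not (loc.classification == "pii" and purpose != "approved-analytics")

plane = DataPlane()
plane.register(DataLocation("telemetry", "edge-plant-7", "s3://plant7/telemetry", "public"))
plane.register(DataLocation("customers", "on-prem", "hdfs:///warehouse/customers", "pii"))
print(plane.allowed("customers", "ad-hoc-export"))  # False: policy follows the data
```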


Day One Wrap | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE covering BigData New York City 2017. Brought to you by SiliconANGLE Media, and its ecosystem sponsors. >> Hello everyone, welcome back to day one at Big Data NYC, of three days of wall-to-wall coverage. This is theCUBE. I'm John Furrier, with my co-hosts Jim Kobielus and Peter Burris. We do this event every year, this is theCUBE's BigData NYC. It's our event that we run in New York City. We have a lot of great content, we have theCUBE going live, we don't go to Strata anymore. We do our own event in conjunction, they have their own event. You can go pay over there and get the booth space, but we do our media event and attract all the influencers, the VIPs, the executives, the entrepreneurs. We've been doing it for five years, we're super excited, and we thank our sponsors for allowing us to get here, and really appreciate the community for continuing to support theCUBE. We're here to wrap up day one and what's going on in New York. Certainly we've had a chance to check out the Strata situation, Strata Data, which is Cloudera and O'Reilly, mainly O'Reilly Media, they run that, kind of an old-school event, guys. Let's kind of discuss the impact of the event in context to the massive growth that's going on outside of their event. And their event is a walled garden, you got to pay to get in, they're very strict. They don't really let a lot of people in, but, okay. Outside of that, the event is going global, the activity around big data is going global. It's more than Hadoop, we certainly know that's old news, but what's the big trend this year, as the horizontally scalable cloud enters the equation? >> I think the big trend, John, and we've talked about it in our research, is that we have finally moved away from big data being associated with a new type of infrastructure. The emergence of AI, deep learning, machine learning, cognitive, all these different names for relatively common things, are an indication that we're starting to move up into people thinking about applications, people thinking about services they can use to get access, or they can get access to, to build their applications. There's not enough skills. So I think that's probably the biggest thing, is that the days of failure being measured by whether or not you can scale your cluster up are finally behind us. We're using the cloud, other resources, we have enough expertise, the technologies are becoming simpler and more straightforward to do that. And now we're thinking about how we're going to create value out of all of this, which is how we're going to use the data to learn something new about what we're doing in the organization, combine it with advanced software technologies that actually dramatically reduce the amount of work that's necessary to make a decision. >> And the other trend I would say, on top of that, just to kind of put a little cherry on top of that, is kind of the business focus, which is again, not the speeds and feeds, although under the hood there's a lot of great innovation going on, from deep learning on down, there's a ton of stuff. However, the conversation is the business value, how it's transforming work, but the one thing that nobody's talking about is, and this is why I'm not bullish on these one-show-meets-all kind of things like O'Reilly Media does, is that there are multiple personas in a company now in the ecosystem. There are now a variety of buyers of some products. At least in the old days, you'd go talk to the IT CIO and you're in.
Not anymore. You have an analytics person, a Chief Data Officer, you might have an IT person, you might have a cloud person. So you're seeing a completely broader set of potential buyers that are driving the change. We heard Paxata talk about that. And this is a dynamic. >> Yeah, definitely. What I'm sensing about Strata, and how these big-top shows around data are evolving, is that they're evolving around addressing a broader, what we call maker, culture. It's more than software developers. It's business analysts, it's the people who build the hardware for the internet of things into which AI and machine learning models are being containerized and embedded. You know, one of the takeaways from today so far, and the keynotes are tomorrow at Strata, but I've been walking the atrium at the Javits Center having some interesting conversations, in addition, of course, to the ones we've been having here at theCUBE. And what I'm notic-- >> John: What are those hallway conversations that you're having? >> Yeah. >> What's going on over there? >> Yeah, the conversations I've had today have been focused on, the chief trend that I'm starting to sense here is that the productionization of the machine learning development process, or pipeline, is super hot. It spans multiple data platforms, of course. You've got a bit of Hadoop in the refinery layer, you've got a bit of in-memory columnar databases, like Actian discussed at their own event, but the more important, not more important, but just as important thing is that what users are looking at is how can we build these DevOps pipelines for continuous management of releases of machine learning models, for productionization, but also for ongoing evaluation and scoring and iteration and redeployment into business applications. You know, I had conversations with MapR, I had conversations with IBM, I mean, these were atrium conversations about things that they are doing. IBM had an announcement today on the wires and so forth with some relevance to that. And so I'm seeing, I'm hearing, I'm sensing a fair amount of "it's the apps," it's more than just Hadoop. But it's very much the flow of these, these are the core pieces, like AI, core pieces of intellectual property in the most disruptive applications that are being developed these days, in all manner of business and industry and in the consumer space. >> So I did not go over to the show floor yet, I've not been over to the atrium. But I'll bet you dollars to donuts this is indicative of something that always happens in a complex technology environment. And again, this is something we've thought about and particularly talked about here on theCUBE, in fact we talked to Paxata about it a little bit as well. And that is, as an organization gains experience, it starts to specialize. But there are always moments, there are always inflection points in the process of gaining that experience. And one of the indications of that is that you end up with some people starting to specialize, but not quite sure what they're specializing in yet. And I think that's one of the things that's happening right now: the skills gap is significant. At the same time that the skills gap is significant, we're seeing people start to declare specializations that they don't necessarily have the skills to perform yet. And the tools aren't catching up. So there's still this tension model, open source, not necessarily focusing on the core problem.
Skills looking for tools, an explosion in the number of tools out there, not focused on how you simplify, streamline, and put into operation how all these things work together. It's going to be an interesting couple of years, but the good news, ultimately, is that we are starting to see for the first time, even on theCUBE interviews today, the emergence of a common language about how we think about the characteristics of the problem. And I think that heralds a new round of experience and a new round of thinking about what the business analyst, the data scientist, the developer, the infrastructure person, and the business person each do. >> You know, you bring up that comment, those comments, about the specialists and the skills. Jim and I talked on the segment this morning about the tool shed. We're talking about how there are so many tools out there, and everyone loves a good tool, a hammer. But the old expression is, if you're a hammer, everything looks like a nail, that's cliche. But what's happened is there are a plethora of tools, right, and tools are good. Platforms are better. As people start to replatformize everything, they could have too many tools. So we asked the Chief Data Officer, and he goes, yeah, I try to manage the tool tsunami, but his biggest issue was he buys a hammer, and it turns into a lawnmower. That's a vendor mentality of-- >> What a truck. Well, but that's a classic example of what I'm talking about. >> Or someone's trying to use a hammer to mow the lawn, right? Again, so this is what you're getting at. >> Yeah! >> The companies out there are groping for relevance, and that's how you can see the pretenders from the winners. >> Well, a tool, fundamentally, is pedagogical. A tool describes the way work is going to be performed, and that's been a lot of what's been happening over the course of the past few years. Now, businesses that get more experience, they're describing their own way of thinking through a problem. And they're still not clear on how to bring the tools together, because the tools are being generated and put into the marketplace by an expanding array of folks and companies, and they're now starting to shuffle for position. But I think ultimately, what we're going to see happen over the next year, and I think this is an inflection point, going back to this big tent notion, is the idea that ultimately we are going to see greater specialization over the next few years. My guess is that this year it probably should get better, or should get bigger, but I'm not certain it will, because it's focused on the problems that we already solved and not moving into the problems that we need to focus on. >> Yeah, I mean, a lot of the problems I have with the O'Reilly show is that they try to throw thought leadership out there, and there's some smart people that go to that, but the problem is that it's too much about monetization; they try to make too much money from the event while all this action's happening. And this is where the tool becomes, the hammer becomes a lawnmower, because what's happening is that the vendor's trying to stay alive. And you mentioned this earlier, to your point, the customers that are buyers of the technology don't want to have something that's not going to be a fit, that's not going to be agile for them. They don't want the hammer that they bought to turn into something that they didn't buy it for. And sometimes, teams can't make that leap, skillset-wise, to literally pivot overnight. Especially as a startup.
So this is where the selection of the companies makes a big difference. And a lot of the clients, a lot of customers that we're serving on the end user side are reaching the conclusion that the tools themselves, while important, are clearly not where the value is. The value is in how they put them together for their business. And that's something that's going to have to, again, that's a maturation process, roles, responsibilities, the chief data officer, they're going to have a role in that or not, but ultimately, they're going to have to start finding their pipelines, their process for ingestion out to analysis. >> Let me get your reaction, you guys, your reactions to this tape. Because one of the things that I heard today, and I think this validates a bigger trend as we talk about the landscape of the markup from the event to how people are behaving and promoting and building products and companies. The pattern that I'm hearing, we said it multiple times on theCUBE today and one from the guy who's basically reading the script, is, in his interview, explaining 'cause it's so factual, I asked him the straight-up question, how do you deal with suppliers? What's happening is the trend is don't show me sizzle. I want to see the steak. Don't sell me hype, I got too many business things to work on right now, I need to nail down some core things. I got application development, I got security to build out big time, and then I got all those data channels that I need, I don't have time for you to sell me a hammer that might not be a hammer in the future! So I need real results, I need real performance that's going to have a business impact. That is the theme, and that trumps the hype. I see that becoming a huge thing right now. Your thoughts, reactions, guys-- >> Well I'll start-- >> What's your reaction then? True or false on the trend? Be-- >> Peter: True! >> Get down to business. >> I'll say that much, true, but go ahead. >> I'll say true as well, but let me just add some context. I think a show like O'Reilly Strata is good up to a point, especially to catalyze an industry, a growing industry like big data's own understanding of it, of the value that all these piece parts, Hadoop and Spark and so forth, can add, can provide when deployed in a unit according to some emerging patterns, whatever. But at a certain point where a space like this becomes well-established, it just becomes a pure marketing event. And customers, at a certain point say, you know, I come here for ideas about things that I can do in my environ, my business, that could actually many ways help me to do new things. You know, you can't get that at a marketing-oriented, you can get that, as a user, more at a research-oriented show. When it's an emerging market, like let's say Spark has been, like the Spark Summit was in the beginning, those are kind of like, when industries go through the phase those are sort of in the beginning, sort of research-focused shows where industry, the people who are doing the development of this new architecture, they talk ideas. Now I think in 2017, where we're at now, is what the idea is everybody's trying to get their heads around, they're all around AI, what the heck that is. For a show like an O'Reilly Ready show to have relevance in a market that's in this much ferment of really innovation around AI and deep learning, there needs to be a core research focus that you don't get at this point in the lifecycle of Strata, for example. So that's my take on what's going on. >> So, my take is this. 
And first of all, I agree with everything you said, so it's not in opposition to anything. Many years ago I had this thought that I think still is very true. And that is the value of industry, the value of infrastructure is inversely correlated with the degree to which anybody knows anything about it. So if I know a lot about my infrastructure, it's not creating a lot of business value. In fact, more often than not, it's not working, which is why people end up knowing more about it. But the problem is, the way that technology has always been sold is as a differentiated, some sort of value-add thing. So you end up with this tension. And this is an application domain, a very, very complex application domain like big data. The tension is, my tool is so great that, and it's differentiating all those other stuff, yeah but it becomes valuable to me if and only if nobody knows it exists. So I think, and one of the reasons why I bring this up, John, is many of the companies that are in the big data space today that are most successful are companies that are positioning themselves as a service. There's a lot of interesting SaaS applications for big data analysis, pipeline management, all the other things you can talk about, that are actually being rendered as a service, and not as a product. So that all you need to know is what the tool does. You don't need to know the tool. And I don't know that that's necessarily going to last, but I think it's very, very interesting that a lot of the more successful companies that we're talking to are themselves mere infrastructure SaaS companies. >> Because-- >> AtScale is interesting, though. They came in as a service. But their service has an interesting value proposition. They can allow you to essentially virtualize the data to play with it, so people can actually sandbox data. And if it gets traction, they can then double-down on it. So to me that's a freebie. To me, I'm a customer, I got to love that kind of environment because you're essentially giving almost a developer-like environment-- >> Peter: Value without necessarily-- >> Yeah, the cost, and the guy gets the signal from the marketplace, his customer, of what data resolves. To me that's a very cool scene. I don't, you saying that's bad, or? >> No, no, I think it's interesting. I think it's-- >> So you're saying service is-- >> So what I'm saying is, what I'm saying is, that the value of infrastructure is inversely proportional to the degree to which anybody knows anything about it. But you've got a bunch of companies who are selling, effectively, infrastructure software, so it's a value-add thing, and that creates a problem. And a lot of other companies not only have the ability to sell something as a service as opposed to a product, they can put the service froward, and people are using the service and getting what they need out of it without knowing anything about the tool. >> I like that. Let me just maybe possibly restate what you just said. When a market goes toward a SaaS go-to-market delivery model for solutions, the user, the buyer's focus is shifted away from what the solution can do, I mean, how it works under the cover. >> Peter: Quote, value-add-- >> To what it can do potentially for you. >> The business, that's right. >> But you're not going to, don't get distracted by the implementation details. You have then as a user become laser-focused on, wow, there's a bunch of things that this can do for me. I don't care how it works, really. You SaaS provider, you worry about that stuff. 
I can worry now about somehow extracting the value. I'm not distracted. >> This show, or this domain, is one of the domains where SaaS has moved, just as we're thinking about moving up the stack, the SaaS business model is moving down the stack in the big data world. >> All right, so, in summary, the stack is changing. Predictions for the next few days. What are we going to see come out of Strata Data, and our BigData NYC? 'Cause remember, this show was always a big hit, but it's very clear from the data on our dashboards, we're seeing all the social data. Microsoft Ignite is going on, and Microsoft Azure, just in the past few years, has burst on the scene. Cloud is sucking the oxygen out of the big data event. Or is it? >> I doubt it was sucking it out of the event, but you know, theCUBE is in, theCUBE is not at Ignite. Where's theCUBE right now? >> John: BigData NYC. >> No, it's here, but it's also at the Splunk show. >> John: That's true. >> And isn't it interesting-- >> John: We're sucking the data out of two events. >> Did a lot of people coming in, exactly. A lot of people coming-- >> We're live streaming in a streaming data kind of-- >> John just said we suck, there's that record saying that. >> We're sucking all the data. >> So we are-- >> We're sharing data. These videos are data-driven. >> Yeah, absolutely, but the point is, ultimately, is that, is that Splunk is an example of a company that's putting forward a service about how you do this and not necessarily a product focus. And a lot of the folks that are coming on theCUBE here are also going on to theCUBE down in Washington D.C., which is where the Splunk show's at. And so I think one of the things, one of the predictions I'll make, is that we're going to hear over the next couple of days more companies talk about their SaaS trash. >> Yeah, I mean I just think, I agree with you, but I also agree with the comments about the technology coming together. And here's one thing I want to throw on the table. I've gotten the sense a few times about connecting the dots on it, we'll put it out publicly for comment right now. The role that communities will play outside of developer, is going to be astronomical. I think we're seeing signals, certainly open-source communities have been around for a long time. They continue to grow shoulders of giants before them. Even these events like O'Reilly, which are a small community that they rely on is now not the only game in town. We're seeing the notion of a community strategy in things like Blockchain, you're seeing it in business, you're seeing people rolling out their recruitment to say, data scientists. You're seeing a community model developing in business, yes or no? >> Yes, but I would say, I would put it this way, John. That it's always been there. The difference is that we're now getting enough experience with things that have occurred, for example, collaboration, communal, communal collaboration in open-source software that people are now saying, and they've developed a bunch of social networking techniques where they can actually analyze how those communities work together, but now they're saying, hmm, I've figured out how to do an assessment analysis understanding that community. I'm going to see if I can take that same concept and apply it over here to how sales works, or how B-to-B engagement works, or how marketing gets conducted, or how sales and marketing work together. And they're discovering that the same way of thinking is actually very fruitful over there. 
So I totally agree, 100%. >> So they don't rely on other people's version of a community, they can essentially construct their own. >> They are, they are-- >> John: Or enabling their own. >> That's right, they are bringing that approach to thinking about a community-driven business and they're applying it to a lot of new ways, and that's very exciting. >> As the world gets connected with mobile and internet of things as we're seeing, it's one big online community. We're seeing things, I'm writing a post right now, what you could, what B-to-B markets should learn from the fake news problem. And that is content and infrastructure are now contextually tied together. >> Peter: Totally. >> And related. The payload of the fake news is also related to the gamification of the network effect, hence the targeting, hence the weaponization. >> Hey, we wrote the three Cs, we wrote a piece on the three Cs of strategy a year and a half ago. Content, community, context. And at the end of the day, the most important thing to what you're saying about, is that there is, you know, right now people talk about social networking. Social media, you think Facebook. Facebook is a community with a single context, stay in touch with your friends. >> Connections. >> Connections. But what you're really saying is that for the first time we're now going to see an enormous amount of technology being applied to the fullness of all the communities. We're going to see a lot more communities being created with the software, each driven by what content does, creates value, against the context of how it works, where the community's defined in terms of what do we do? >> Let me focus on the fact that bringing, using community as a framework for understanding how the software world is evolving. The software world is evolving towards, I've said this many times in my work about a resurge, the data scientists or data people, data science skills are the core developers in this new era. Now, what is data science all about at its heart? Machine learning, building, and training machine learning models. And so training machine learning models is everything towards making sure that they are fit for their predicted purpose of classification. Training data, where you get all the training data from to feed all, to train all these models? Where do you get all the human resources to label, to do the labeling of the data sets, and so forth, that you need communities, crowdsourcing and whatnot, and you need sustainable communities that can supply the data and the labeling services, and so forth, to be able to sustain the AI and machine learning revolution. So content, creating data and so forth, really rules in this new era, like-- >> The interest in machine learning is at an all-time high, I guess. >> Jim: Yeah, oh yeah, very much so. >> Got it, I agree. I think the social grab, interest grab, value grab is emerging. I think communities, content, context, communities are relevant. I think a lot of things are going to change, and that the scuttlebutt that I'm hearing in this area now is it's not about the big event anymore. It's about the digital component. I think you're seeing people recognize that, but they still want to do the face-to-face. >> You know what, that's right. That's right, they still want, let's put it this way. That there are, that the whole point of community is we do things together. And there are some things that are still easier to do together if we get together. 
>> But B-to-B marketing, you just can't say, we're not going to do events when there's a whole machinery behind events. Lead-gen batch marketing, we call it. There's a lot of stuff that goes on in that funnel. You can't just say hey, we're going to do a blog post. >> People still need to connect. >> So it's good, but there's some online tools that are happening, so of course. You wanted to say something? >> Yeah, I just want to say one thing. Face to face validates the source of expertise. I don't really fully trust an expert, I can't in my heart engage with them, 'til I actually meet them and figure out in person whether they really do have the goods, or whether they're repurposing some thinking that they got from elsewhere and they gussy it up. So face, there's no substitute for face-to-face to validate the expertise. The expertise that you value enough to want to engage in your solution, or whatever it might be. >> Awesome, I agree. Online activities, the content, we're streaming the data, theCUBE, this is our annual event in New York City. We've got three days of coverage, Tuesday, Wednesday, Thursday, here, theCUBE in Manhattan, right around the corner from Strata Hadoop, the Javits Center of influencers. We're here with the VIPs, with the entrepreneurs, with the CEOs and all the top analysts from WikiBon and around the community. Be there tomorrow all day, day one wrap up is done. Thanks for watching, see you tomorrow. (rippling music)
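To ground the machine-learning thread in that conversation, the point that a classifier is only as good as the labeled training data a community supplies can be shown with a small sketch. This is a generic illustration, not any vendor's pipeline; it assumes scikit-learn is available, and the labeled examples are tiny and synthetic.

# Minimal sketch of the point above: a classifier is only as good as the
# human-labeled examples it is trained on. Assumes scikit-learn is
# installed; the labeled set below is synthetic, standing in for the
# community- or crowd-sourced labels discussed in the conversation.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "great keynote, loved the demo",
    "the session was fantastic",
    "terrible wifi, could not follow along",
    "the talk was a waste of time",
]
labels = ["positive", "positive", "negative", "negative"]

# Train a simple text classifier on the labeled set.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Prediction quality tracks the quality and coverage of the labels.
print(model.predict(["really enjoyed the demo"]))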

Published Date : Sep 27 2017


Prakash Nanduri, Paxata | BigData NYC 2017


 

>> Announcer: Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and it's ecosystem sponsors. (upbeat techno music) >> Hey, welcome back, everyone. Here live in New York City, this is theCUBE from SiliconANGLE Media Special. Exclusive coverage of the Big Data World at NYC. We call it Big Data NYC in conjunction also with Strata Hadoop, Strata Data, Hadoop World all going on kind of around the corner from our event here on 37th Street in Manhattan. I'm John Furrier, the co-host of theCUBE with Peter Burris, Head of Research at SiliconANGLE Media, and General Manager of WikiBon Research. And our next guest is one of our famous CUBE alumni, Prakash Nanduri co-founder and CEO of Paxata who launched his company here on theCUBE at our first inaugural Big Data NYC event in 2013. Great to see you. >> Great to see you, John. >> John: Great to have you back. You've been on every year since, and it's been the lucky charm. You guys have been doing great. It's not broke, don't fix it, right? And so theCUBE is working with you guys. We love having you on. It's been a pleasure, you as an entrepreneur, launching your company. Really, the entrepreneurial mojo. It's really what it's all about. Getting access to the market, you guys got in there, and you got a position. Give us the update on Paxata. What's happening? >> Awesome, John and Peter. Great to be here again. Every time I come here to New York for Strata I always look forward to our conversations. And every year we have something exciting and new to share with you. So, if you recall in 2013, it was a tiny little show, and it was a tiny little company, and we came in with big plans. And in 2013, I said, "You know, John, we're going to completely disrupt the way business consumers and business analysts turn raw data into information and they do self-service data preparation." That's what we brought to the market in 2013. Ever since, we have gone on to do something really exciting and new for our customers every year. In '14, we came in with the first Apache Spark-based platform that allowed business analysts to do data preparation at scale interactively. Every year since, last year we did enterprise grade and we talked about how Paxata is going to be delivering our self-service data preparation solution in a highly-scalable enterprise grade deployment world. This year, what's super exciting is in addition to the recent announcements we made on Paxata running natively on the Microsoft Azure HDI Spark system. We are truly now the only information platform that allows business consumers to turn data into information in a multi-cloud hybrid world for our enterprise customers. In the last few years, I came and I talked to you and I told you about work we're doing and what great things are happening. But this year, in addition to the super-exciting announcements with Microsoft and other exciting announcements that you'll be hearing. You are going to hear directly from one of our key anchor customers, Standard Chartered Bank. 150-year-old institution operating in over 46 countries. One of the most storied banks in the world with 87,500 employees. >> John: That's not a start up. >> That's not a start up. (John laughs) >> They probably have a high bar, high bar. They got a lot of data. >> They have lots of data. And they have chosen Paxata as their information fabric. We announced our strategic partnership with them recently and you know that they are going to be speaking on theCUBE this week. 
And what started as a little experiment, just like our experiment in 2013, has actually mushroomed now into Michael Gorriz, and Shameek Kundu, and the entire leadership of Standard Chartered choosing Paxata as the platform that will democratize information in the bank across their 87,500 employees. We are going in a very exciting way, a very fast way, and now delivering real value to the bank. And you can hear all about it on our website-- >> Well, he's coming on theCUBE so we'll drill down on that, but banks are changing. You talk about a transformation. What is a teller? An Internet of Things device. The watch potentially could be a terminal. So, the Internet of Things of people changes the game. Are the ATMs going to go away and become like broadcast points? >> Prakash: And you're absolutely right. And really what it is about is, it doesn't matter if you're a Standard Chartered Bank or if you're a pharma company or if you're the leading healthcare company, what it is is that everyone of our customers is really becoming an information-inspired business. And what we are driving our customers to is moving from a world where they're data-driven. I think being data-driven is fine. But what you need to be is information-inspired. And what does that mean? It means that you need to be able to consume data, regardless of format, regardless of source, regardless of where it's coming from, and turn it into information that actually allows you to get inside in decisions. And that's what Paxata does for you. So, this whole notion of being information-inspired, I don't care if you're a bank, if you're a car company, or if you're a healthcare company today, you need to have-- >> Prakash, for the folks watching that might not know our history as you launched on theCUBE in 2013 and have been successful every year since. You guys have really deploying the classic entrepreneurial success formula, be fast, walk the talk, listen to customers, add value. Take a minute quickly just to talk about what you guys do. Just for the folks that don't know you. >> Absolutely, let's just actually give it in the real example of you know, a customer like Standard Chartered. Standard Chartered operates in multiple countries. They have significant number of lines of businesses. And whether it's in risk and compliance, whether it is in their marketing department, whether it's in their corporate banking business, what they have to do is, a simple example could be I want to create a customer list to be able to go and run a marketing campaign. And the customer list in a particular region is not something easy for a bank like Standard Charter to come up with. They need to be able to pull from multiple sources. They need to be able to clean the data. They need to be able to shape the data to get that list. And if you look at what is really important, the people who understand the data are actually not the folks in IT but the folks in business. So, they need to have a tool and a platform that allows them to pull data from multiple sources to be able to massage it, to be able to clean it-- >> John: So, you sell to the business person? >> We sell to the business consumer. The business analyst is our consumer. And the person who supports them is the chief data officer and the person who runs the Paxata platform on their data lake infrastructure. >> So, IT sets the data lake and you guys just let the business guys go to town on the data. >> Prakash: Bingo. >> Okay, what's the problem that you solve? 
If you can summarize the problem that you solve for the customers, what is it? >> We take data and turn it into information that is clean, that's complete, that's consumable and that's contextual. The hardest problem in every analytical exercise is actually taking data and cleaning it up and getting it ready for analytics. That's what we do. >> It's the prep work. >> It's the prep work. >> As companies gain experience with Big Data, John, what they need to start doing increasingly is move more of the prep work or have more of the prep work flow closer to the analyst. And the reason's actually pretty simple. It's because of that context. Because the analyst knows more about what their looking for and is a better evaluator of whether or not they get what they need. Otherwise, you end up in this strange cycle time problem between people in back end that are trying to generate the data that they think they want. And so, by making the whole concept of data preparation simpler, more straight forward, you're able to have the people who actually consume the data and need it do a better job of articulating what they need, how they need it and making it presentable to the work that they're performing. >> Exactly, Peter. What does that say about how roles are starting to merge together? Cause you've got to be at the vanguard of seeing how some of these mature organizations are working. What do you think? Are we seeing roles start to become more aligned? >> Yes, I do think. So, first and foremost, I think what's happening is there is no such thing as having just one group that's doing data science and another group consuming. I think what you're going to be going into is the world of data and information isn't all-consuming and that everybody's role. Everybody has a role in that. And everybody's going to consume. So, if you look at a business analyst that was spending 80% of their time living in Excel or working with self-service BI tools like our partner's Tableau and Power BI from Microsoft, others. What you find is these people today are living in a world where either they have to live in coding scripting world hell or they have to rely on IT to get them the real data. So, the role of a business analyst or a subject matter expert, first and foremost, the fact that they work with data and they need information that's a given. There is no business role today where you can't deal with data. >> But it also makes them real valuable, because there aren't a lot of people who are good at dealing with data. And they're very, very reliant on these people to turn that data into something that is regarded as consumable elsewhere. So, you're trying to make them much more productive. >> Exactly. So, four years years ago, when we launched on theCUBE, the whole premise was that in order to be able to really drive towards a world where you can make information and data-driven decisions, you need to ensure that the business analyst community, or what I like to call the business consumer needs to have the power of being able to, A, get access to data, B, make sense of the data, and then turn that data into something that's valuable for her or for him. >> Peter: And others. >> And others, and others. Absolutely. And that's what Paxata is doing. In a collaborative, in a 21st Century world where I don't work in a silo, I work collaboratively. And then the tool, and the platform that helps me do that is actually a 21st Century platform. 
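To make the prep work concrete, the kind of steps described here, pulling customer records from more than one source, cleaning and deduplicating them, and shaping a campaign-ready list, can be sketched in a few lines. This is a generic pandas illustration rather than Paxata's actual product, which is a visual, Spark-based platform; the sources, columns, and values are hypothetical.

# Minimal sketch of the data preparation steps described above, using
# pandas. Not Paxata's product; the sources and columns are made up.
import pandas as pd

crm = pd.DataFrame({
    "email": ["a.lee@example.com", "B.Khan@Example.com ", "c.wu@example.com"],
    "region": ["SG", "HK", "SG"],
})
web = pd.DataFrame({
    "email": ["b.khan@example.com", "c.wu@example.com", None],
    "last_visit": ["2017-09-01", "2017-09-20", "2017-09-22"],
})

# Clean: normalize the join key, drop unusable rows, remove duplicates.
for df in (crm, web):
    df["email"] = df["email"].str.strip().str.lower()
crm = crm.drop_duplicates("email")
web = web.dropna(subset=["email"]).drop_duplicates("email")

# Shape: join the sources and filter to the region the campaign targets.
campaign = crm.merge(web, on="email", how="left")
print(campaign[campaign["region"] == "SG"])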
>> So, John, at the beginning of the session you and Jim were talking about what is going to be one of the themes here at the show. And we observed that it used to be that people were talking about setting up the hardware, setting up the clutters, getting Hadoop to work, and Jim talked about going up the stack. Well, this is one of the indicators that, in fact, people were starting to go up the stack because they're starting to worry more about the data, what it can do, the value of how it's going to be used, and how we distribute more of that work so that we get more people using data that's actually good and useful to the business. >> John: And drives value. >> And drives value. >> Absolutely. And if I may, just put a chronological aspect to this. When we launched the company we said the business analyst needs to be in charge of the data and turning the data into something useful. Then right at that time, the world of create data lakes came in thanks to our partners like Cloudera and Hortonworks, and others, and MapR and others. In the recent past, the world of moving from on premise data lakes to hybrid, multicloud data lakes is becoming reality. Our partners at Microsoft, at AWS, and others are having customers come in and build cloud-based data lakes. So, today what you're seeing is on one hand this complete democratization within the business, like at Standard Chartered, where all these business analysts are getting access to data. And on the other hand, from the data infrastructure moving into a hybrid multicloud world. And what you need is a 21st Century information management platform that serves the need of the business and to make that data relevant and information and ready for their consumption. While at the same time we should not forget that enterprises need governance. They need lineage. They need scale. They need to be able to move things around depending on what their business needs are. And that's what Paxata is driving. That's why we're so excited about our partnership with Microsoft, with AWS, with our customer partnerships such as Standard Chartered Bank, rolling this out in an enterprise-- >> This is a democratization that you were referring to with your customers. We see this-- >> Everywhere. >> When you free the data up, good things happen but you don't want to have IT be the constraint, you want to let them enable-- >> Peter: And IT doesn't want to be the constraint. >> They don't. >> This is one of the biggest problems that they have on a daily basis. >> They're happy to let it go free as long as it's in they're mind DevOps-like related, this is cool for them. >> Well, they're happy to let it go with policy and security in place. >> Our customers, our most strategic customers, the folks who are running the data lakes, the folks who are managing the data lakes, they are the first ones that say that we want business to be able to access this data, and to be able to go and make use out of this data in the right way for the bank. And not have us be the impediment, not have us be the roadblock. While at the same time we still need governance. We still need security. We still need all those things that are important for a bank or a large enterprise. That's what Paxata is delivering to the customers. >> John: So, what's next? >> Peter: Oh, I'm sorry. >> So, really quickly. An interesting observation. People talk about data being the new fuel of business. 
That really doesn't work because, as Bill Schmarzo says, it's not the new fuel of business, it's new sunlight of business. And the reason why is because fuel can only be used once. >> Prakash: That's right. >> The whole point of data is that it can be used a lot, in a lot of different ways, and a lot of different contexts. And so, in many respects what we're really trying to facilitate or if someone who runs a data lake when someone in the business asks them, "Well, how do you create value for the business?" The more people, the more users, the more context that they're serving out of that common data, the more valuable the resource that they're administering. So, they want to see more utilization, more contexts, more data being moved out. But again, governance, security have to be in place. >> You bet, you bet. And using that analogy of data, and I've heard this term about data being the new oil, etc. Well, if data is the oil, information is really the refined fuel or sunlight as we like to call it. >> Peter: Yeah. >> John: Well, you're riffing on semantics, but the point is it's not a one trick pony. Data is part of the development, I wrote a blog post in 1997, I mean 2007 that said data's the new development kit. And it was kind of riffing on this notion of the old days >> Prakash: You bet. >> Here's your development kit, SDK, or whatever was how people did things back then Enter the cloud, >> Prakash: That's right. >> And boom, there it is. The data now is in the process of the refinery the developers wanted. The developers want the data libraries. Whatever that means. That's where I see it. And that is the democratization where data is available to be integrated in to apps, into feeds, into ... >> Exactly, and so it brings me to our point about what was the exciting, new product innovation announcement we made today about Intelligent Ingest. You want to be able to access data in the enterprise regardless of where it is, regardless of the cloud where it's sitting, regardless of whether it's on-premise, in the cloud. You don't need to as a business worry about whether that is a JSON file or whether that's an XML file or that's a relational file. That's irrelevant. What you want is, do I have the access to the right data? Can I take that data, can I turn it into something valuable and then can I make a decision out of it? I need to do that fast. At the same time, I need to have the governance and security, all of that. That's at the end of the day the objective that our customers are driving towards. >> Prakash, thanks so much for coming on and being a great member of our community. >> Fantastic. >> You're part of our smart network of great people out there and entrepreneurial journey continues. >> Yes. >> Final question. Just observation. As you pinch yourself and you go down the journey, you guys are walking the talk, adding new products. We're global landscape. You're seeing a lot of new stuff happening. Customers are trying to stay focused. A lot of distractions whether security or data or app development. What's your state of the industry? How do you view the current market, from your perspective and also how the customer might see it from their impact? >> Well, the first thing is that I think in the last four years we have seen significant maturity both on the providers off software technology and solutions, and also amongst the customers. I do think that going forward what is really going to make a difference is one really driving towards business outcomes by leveraging data. 
We've talked about a lot of this over the last few years. What real business outcomes are you delivering? What we are super excited is when we see our customers each one of them actually subscribes to Paxata, we're a SAS company, they subscribe to Paxata not because they're doing the science experiment but because they're trying to deliver real business value. What is that? Whether that is a risk in compliance solution which is going to drive towards real cost savings. Or whether that's a top line benefit because they know what they're customer 360 is and how they can go and serve their customers better or how they can improve supply chains or how they can optimize their entire efficiency in the company. I think if you take it from that lens, what is going to be important right now is there's lots of new technologies coming in, and what's important is how is it going to drive towards those top three business drivers that I have today for the next 18 months? >> John: So, that's foundational. >> That's foundational. Those are the building blocks-- >> That's what is happening. Don't jump... If you're a customer, it's great to look at new technologies, etc. There's always innovation projects-- >> RND, GPOCs, whatever. Kick the tires. >> But now, if you are really going to talk the talk about saying I'm going to be, call your word, data-driven, information-driven, whatever it is. If you're going to talk the talk, then you better walk the walk by delivering the real kind of tools and capabilities that you're business consumers can adopt. And they better adopt that fast. If they're not up and running in 24 hours, something is wrong. >> Peter: Let me ask one question before you close, John. So, you're argument, which I agree with, suggests that one of the big changes in the next 18 months, three years as this whole thing matures and gets more consistent in it's application of the value that it generates, we're going to see an explosion in the number users of these types of tools. >> Prakash: Yes, yes. >> Correct? >> Prakash: Absolutely. >> 2X, 3X, 5X? What do you think? >> I think we're just at the cusp. I think is going to grow up at least 10X and beyond. >> Peter: In the next two years? >> In the next, I would give that next three to five years. >> Peter: Three to five years? >> Yes. And we're on the journey. We're just at the tip of the high curve taking off. That's what I feel. >> Yeah, and there's going to be a lot more consolidation. You're going to start to see people who are winning. It's becoming clear as the fog lifts. It's a cloud game, a scale game. It's democratization, community-driven. It's open source software. Just solve problems, outcomes. I think outcome is going to be much faster. I think outcomes as a service will be a model that we'll probably be talking about in the future. You know, real time outcomes. Not eight month projects or year projects. >> Certainly, we started writing research about outcome-based management. >> Right. >> Wikibon Research... Prakash, one more thing? >> I also just want to say that in addition to this business outcome thing, I think in the last five years I've seen a lot of shift in our customer's world where the initial excitement about analytics, predictive, AI, machine-learning to get to outcomes. They've all come into a reality that none of that is possible if you're not able to handle, first get a grip on your data, and then be able to turn that data into something meaningful that can be analyzed. So, that is also a major shift. 
That's why you're seeing the growth we're seeing-- >> John: Cause it's really hard. >> Prakash: It's really hard. >> I mean, it's a cultural mindset. You have the personnel. It's an operational model. I mean this is not like, throw some pixie dust on it and it magically happens. >> That's why I say, before you go into any kind of BI, analytics, AI initiative, stop, think about your information management strategy. Think about how you're going to democratize information. Think about how you're going to get governance. Think about how you're going to enable your business to turn data into information. >> Remember, you can't do AI with IA? You can't do AI without information architecture. >> There you go. That's a great point. >> And I think this all points to why Wikibon's research have all the analysts got it right with true private cloud because people got to take care of their business here to have a foundation for the future. And you can't just jump to the future. There's too much just to come and use a scale, too many cracks in the foundation. You got to do your, take your medicine now. And do the homework and lay down a solid foundation. >> You bet. >> All right, Prakash. Great to have you on theCUBE. Again, congratulations. And again, it's great for us. I totally have a great vibe when I see you. Thinking about how you launched on theCUBE in 2013, and how far you continue to climb. Congratulations. >> Thank you so much, John. Thanks, Peter. That was fantastic. >> All right, live coverage continuing day one of three days. It's going to be a great week here in New York City. Weather's perfect and all the players are in town for Big Data NYC. I'm John Furrier with Peter Burris. Be back with more after this short break. (upbeat techno music).

Published Date : Sep 27 2017


Itamar Ankorian, Attunity | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE, covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsor. >> Okay, welcome back, everyone, to our live special CUBE coverage in New York City in Manhattan, we're here in Hell's Kitchen for theCUBE's exclusive coverage of our Big Data NYC event and Strata Data, which used to be called Strata Hadoop, used to be Hadoop World, but our event, Big Data NYC, is our fifth year where we gather every year to see what's going on in big data world and also produce all of our great research. I'm John Furrier, the co-host of theCUBE, with Peter Burris, head of research. Our next guest, Itamar Ankorion, who's the Chief Marketing Officer at Attunity. Welcome back to theCUBE, good to see you. >> Thank you very much. It's good to be back. >> We've been covering Attunity for many, many years. We've had many conversations, you guys have had great success in big data, so congratulations on that. But the world is changing, and we're seeing data integration, we've been calling this for multiple years, that's not going away, people need to integrate more. But with cloud, there's been a real focus on accelerating the scale component with an emphasis on ease of use, data sovereignty, data governance, so all these things are coming together, the cloud has amplified. What's going on in the big data world, and it's like, listen, get movin' or you're out of business has pretty much been the mandate we've been seeing. A lot of people have been reacting. What's your response at Attunity these days because you have successful piece parts with your product offering? What's the big update for you guys with respect to this big growth area? >> Thank you. First of all, the cloud data lakes have been a major force, changing the data landscape and data management landscape for enterprises. For the past few years, I've been working closely with some of the world's leading organizations across different industries as they deploy the first and then the second and third iteration of the data lake and big data architectures. And one of the things, of course, we're all seeing is the move to cloud, whether we're seeing enterprises move completely to the cloud, kind of move the data lakes, that's where they build them, or actually have a hybrid environment where part of the data lake and data works analytics environment is on prem and part of it is in the cloud. The other thing we're seeing is that the enterprises are starting to mix more of the traditional data lake, the cloud is the platform, and streaming technologies is the way to enable all the modern data analytics that they need, and that's what we have been focusing on on enabling them to use data across all these different technologies where and when they need it. >> So, the sum of the parts is worth more if it's integrated together seems to be the positioning, which is great, it's what customers want, make it easier. What is the hard news that you guys have, 'cause you have some big news? Let's get to the news real quick. >> Thank you very much. We did, today, we have announced, we're very excited about it, we have announced a new big release of our data integration platform. Our modern platform brings together Attunity Replicate, Attunity Compose for Hive, and Attunity Enterprise Manager, or AEM. 
These are products that we've evolved significantly, invested a lot over the last few years to enable organizations to use data, make data available, and available in the real time across all these different platforms, and then, turn this data to be ready for analytics, especially in Hive and Hadoop environments on prem and now also in the cloud. Today, we've announced a major release with a lot of enhancements across the entire product line. >> Some people might know you guys for the Replicate piece. I know that this announcement was 6.0, but as you guys have the other piece part to this, really it's about modernization of kind of old-school techniques. That's really been the driver of your success. What specifically in this announcement makes it, you know, really work well for people who move in real time, they want to have good data access. What's the big aha for the customers out there with Attunity on this announcement? >> That's a great question, thank you. First of all is that we're bringing it all together. As you mentioned, over the past few years, Attunity Replicate has emerged as the choice of many Fortune 100 and other companies who are building modern architectures and moving data across different platforms, to the cloud, to their lakes, and they're doing it in a very efficient way. One of the things we've seen is that they needed the flexibility to adapt as they go through their journey, to adapt different platforms, and what we give them with Replicate was the flexibility to do so. We give them the flexibility, we give them the performance to get the data and efficiency to move only the change of the data as they happen and to do that in a real-time fashion. Now, that's all great, but once the data gets to the data lake, how do you then turn it into valuable information? That's when we introduced Compose for Hive, which we talked about in our last session a few month ago, which basically takes the next stage in the pipeline picking up incremental, continuous data that is fed into the data lake and turning those into operational data store, historical data stores, data store that's basically ready for analytics. What we've done with this release that we're really excited about is putting all of these together in a more integrated fashion, putting Attunity Enterprise Manager on top of it to help manage larger scale environments so customers can move faster in deploying these solutions. >> As you think about the role that Attunity's going to play over time, though, it's going to end up being part of a broader solution for how you handle your data. Imagine for a second the patterns that your customers are deploying. What is Attunity typically being deployed with? >> That's a great question. First of all, we're definitely part of a large ecosystem for building the new data architecture, new data management with data integration being more than ever a key part of that bigger ecosystem because as all they actually have today is more islands with more places where the data needs to go, and to your point, more patterns in which the data moves. One of those patterns that we've seen significantly increase in demand and deployment is streaming. Where data used to be batch, now we're all talking about streaming. Kafka has emerged as a very common platform, but not only Kafka. If you're on Amazon Web Services, you're using Kinesis. If you're in Azure, you're using Azure Event Hubs. You have different streaming technologies. That's part of how this has evolved. >> How is that challenge? 
'Cause you just bring up a good point. I mean, with the big trend that customers want is they want either the same code basis on prem and that they have the hybrid, which means the gateway, if you will, to the public cloud. They want to have the same code base, or move workloads between different clouds, multi-cloud, it seems to be the Holy Grail, we've identified it. We are taking the position that we think multi-cloud will be the preferred architecture going forward. Not necessarily this year, but it's going to get there. But as a customer, I don't want to have to rebuild employees and get skill development and retraining on Amazon, Azure, Google. I mean, each one has its own different path, you mentioned it. How do you talk to customers about that because they might be like, whoa, I want it, but how do I work in that environment? You guys have a solution for that? >> We do, and in fact, one of the things we've seen, to your point, we've seen the adoption of multiple clouds, and even if that adoption is staged, what we're seeing is more and more customers that are actually referring to the term lock-in in respect to the cloud. Do we put all the eggs in one cloud, or do we allow ourselves the flexibility to move around and use different clouds, and also mitigate our risk in that respect? What we've done from that perspective is first of all, when you use the Attunity platform, we take away all the development complexity. In the Attunity platform, it is very easy to set up. Your data flow is your data pipelines, and it's all common and consistent. Whether you're working on prem, whether you work on Amazon Web Services, on Azure, or on Google or other platforms, it all looks and feels the same. First of all, and you solve the issue of the diversity, but also the complexity, because what we've done is, this is one of the big things that Attunity is focused on was reducing the complexity, allowing to configure these data pipelines without development efforts and resources. >> One of the challenges, or one of the things you typically do to take complexity out is you do a better job of design up front. And I know that Attunity's got a tool set that starts to address some of of these things. Take us a little bit through how your customers are starting to think in terms of designing flows as opposed to just cobbling together things in a bespoke way. How is that starting to change as customers gain experience with large data sets, the ability, the need to aggregate them, the ability to present them to developers in different ways? >> That's a great point, and again, one of the things we've focused on is to make the process of developing or configuring these different data flows easy and modular. First, while in Attunity you can set up different flows in different patterns, and you can then make them available to others for consumption. Some create the data ingestion, or some create the data ingestion and then create a data transformation with Compose for Hive, and with Attunity Enterprise Manager, we've now also introduced APIs that allow you to create your own microservices, consuming and using the services enabled by the platform, so we provide more flexibility to put all these different solutions together. >> What's the biggest thing that you see from a customer standpoint, from a problem that you solve? If you had to kind of lay it out, you know the classic, hey, what problem do you solve? 
'Cause there are many, so take us through the key problem, and then, if there's any secondary issues that you guys can address customers, that seems the way conversation starts. What are key problems that you solve? >> I think one of the major problems that we solve is scale. Our customers that are deploying data lakes are trying to deploy and use data that is coming, not from five or 10 or even 50 data sources, we work at hundreds going on thousands of data sources now. That in itself represents a major challenge to our customers, and we're addressing it by dramatically simplifying and making the process of setting those up very repeatable, very easy, and then providing the management facility because when you have hundreds or thousands, management becomes a bigger issue to operationalize it. We invested a lot in a management facility for those, from a monitoring, control, security, how do you secure it? The data lake is used by many different groups, so how do we allow each group to see and work only on what belongs to that group? That's part it, too. So again, the scale is the major thing there. The other one is real timeliness. We talked about the move to streaming, and a lot of it is in order to enable streaming analytics, real-time analytics. That's only as good as your data, so you need to capture data in real time. And that of course has been our claim to fame for a long time, being the leading independent provider of CDC, change data capture technology. What we've done now, and also expanded significantly with the new release, version six, is creating universal database streaming. >> What is that? >> We take databases, we take databases, all the enterprise databases, and we turn them into live streams. When you think, by the way, by the most common way that people have used, customers have used to bring data into the lake from a database, it was Scoop. And Scoop is a great, easy software to use from an open source perspective, but it's scripting and batch. So, you're building your new modern architecture with the two are effectively scripting and batch. What we do with CDC is we enable to take a database, and instead of the database being something you come to periodically to read it, we actually turn it into a live feed, so as the data changes in the database, we stream it, we make it available across all these different platforms. >> Changes the definition of what live streaming is. We're live streaming theCUBE, we're data. We're data streaming, and you get great data. So, here's the question for you. This is a good topic, I love this topic. Pete and I talk about this all the time, and it's been addressed in the big data world, but it's kind of, you can see the pattern going mainstream in society globally, geopolitically and also in society. Batch processing and data in motion are real time. Streaming brings up this use case to the end customer, which is this is the way they've done it before, certainly store things in data lakes, that's not going to go away, you're going to store stuff, but the real gain is in motion. >> Itamar: Correct. >> How do you describe that to a customer when you go out and say, hey, you know, you've been living in a batch world, but wake up to the real world called real time. How do you get to them to align with it? Some people get it right away, I see that, some people don't. How do you talk about that because that seems to be a real cultural thing going on right now, or operational readiness from the customer standpoint? 
Can you just talk through your feeling on that? >> First of all, this often gets lost in translation, and we see quite a few companies and even IT departments that when you talk, when they refer to real time, or their business tells them we need real time, what they understand from it is when you ask for the data, the response will be immediate. You get real time access to the data, but the data is from last week. So, we get real time access, but for last week's data. And that's what we try to do is to basically say, wait a second, when you mean real time, what does real time mean? And we start to understand what is the meaning of using last week's data versus, or yesterday's data, over the real time data, and that makes a big difference. We actually see that today the access, the availability, the availability to act on the real time data, that's the frontier of competitive differentiation. That's what makes a customer experience better, that's what makes the business more operationally efficient than the competition. >> It's the data, not so much the process of what they used to do. They're version of real time is I responded to you pretty quickly. >> Exactly, the other thing that's interesting is because we see it with, again, change of the capture becoming a critical component of the modern data architecture. Traditionally, we used to talk about different type of tools and technology, now CDC itself is becoming a critical part of it, and the reason is that it serves and it answers a lot of fundamental needs that are now becoming critical. One is the need for real-time data. The other one is efficiency. If you're moving to the cloud, and we talked about this earlier, if you're data lake is going to be in the cloud, there's no way you're going to reload all your data because the bandwidth is going to get in the way. So, you have to move only the delta. You need the ability to capture and move only the delta, so CDC becomes fundamental both in enabling the real time as well the efficient, the low-impact data integration. >> You guys have a lot of partners, technology partners, global SIs, resellers, a bunch of different partnership levels. The question I have for you, love to get your reaction and share your insight into is, okay, as the relationship to the customer who has the problem, what's in it for me? I want to move my business forward, I want to do digital business, I need to get up my real-time data as it's happening. Whether it's near real time or real time, that's evolution, but ultimately, they have to move their developers down a certain path. They'll usually hire a partner. The relationship between partners and you, the supplier to the customer, has changed recently. >> That's correct. >> How is that evolving? >> First of all, it's evolving in several ways. We've invested on our part to make sure that we're building Attunity as a leading vendor in the ecosystem of they system integration consulting companies. We work with pretty much all the major global system integrators as well as regional ones, boutique ones, that focus on the emerging technologies as well as get the modern analytic-type platforms. We work a lot with plenty of them on major corporate data center-level migrations to the cloud. So again, the motivations are different, but we invest-- >> More specialized, are you seeing more specialty, what's the trend? >> We've been a technology partner of choice to both Amazon and Microsoft for enabling, facilitating the data migration to the cloud. 
They of course, their select or preferred group of partners they work with, so we all come together to create these solutions. >> Itamar, what's the goals for Attunity as we wrap up here? I give you the last word, as you guys have this big announcement, you're bringing it all together. Integrating is key, it's always been your ethos in the company. Where is this next level, what's the next milestone for you guys? What do you guys see going forward? >> First of all, we're going to continue to modernize. We're really excited about the new announcement we did today, Replicate six, AEM six, a new version of Compose for Hive that now also supports small data lakes, Aldermore, Scaldera, EMR, and a key point for us was expanding AEM to also enable analytics on the data we generate as data flows through it. The whole point is modernizing data integration, providing more intelligence in the process, reducing the complexity, and facilitating the automation end-to-end. We're going to continue to solve, >> Automation big, big time. >> Automation is a big thing for us, and the point is, you need to scale. In order to scale, we want to generate things for you so you don't to develop for every piece. We automate the automation, okay. The whole point is to deliver the solution faster, and the way we're going to do it is to continue to enhance each one of the products in its own space, if it's replication across systems, Compose for Hive for transformations in pipeline automation, and AEM for management, but also to create integration between them. Again, for us it's to create a platform that for our customers they get more than the sum of the parts, they get the unique capabilities that we bring together in this platform. >> Itamar, thanks for coming onto theCUBE, appreciate it, congratulations to Attunity. And you guys bringing it all together, congratulations. >> Thank you very much. >> This theCUBE live coverage, bringing it down here to New York City, Manhattan. I'm John Furrier, Peter Burris. Be right back with more after this short break. (upbeat electronic music)

Published Date : Sep 27 2017

SUMMARY :

Brought to you by SiliconANGLE Media I'm John Furrier, the co-host of theCUBE, Thank you very much. What's the big update for you guys the move to cloud, whether we're seeing enterprises What is the hard news that you guys have, and available in the real time That's really been the driver of your success. the flexibility to adapt as they go through their journey, Imagine for a second the patterns and to your point, more patterns in which the data moves. We are taking the position that we think multi-cloud We do, and in fact, one of the things we've seen, the ability to present them to developers in different ways? one of the things we've focused on is What's the biggest thing that you see We talked about the move to streaming, and instead of the database being something and it's been addressed in the big data world, or operational readiness from the customer standpoint? the availability to act on the real time data, I responded to you pretty quickly. because the bandwidth is going to get in the way. the supplier to the customer, has changed boutique ones, that focus on the emerging technologies facilitating the data migration to the cloud. What do you guys see going forward? on the data we generate as data flows through it. and the point is, you need to scale. And you guys bringing it all together, congratulations. it down here to New York City, Manhattan.

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
MicrosoftORGANIZATION

0.99+

Itamar AnkorionPERSON

0.99+

Peter BurrisPERSON

0.99+

AmazonORGANIZATION

0.99+

hundredsQUANTITY

0.99+

John FurrierPERSON

0.99+

fiveQUANTITY

0.99+

last weekDATE

0.99+

New York CityLOCATION

0.99+

ItamarPERSON

0.99+

secondQUANTITY

0.99+

CDCORGANIZATION

0.99+

GoogleORGANIZATION

0.99+

firstQUANTITY

0.99+

TodayDATE

0.99+

PetePERSON

0.99+

50 data sourcesQUANTITY

0.99+

10QUANTITY

0.99+

Itamar AnkorianPERSON

0.99+

twoQUANTITY

0.99+

SiliconANGLE MediaORGANIZATION

0.99+

Amazon Web ServicesORGANIZATION

0.99+

each groupQUANTITY

0.99+

yesterdayDATE

0.99+

fifth yearQUANTITY

0.99+

OneQUANTITY

0.99+

todayDATE

0.99+

FirstQUANTITY

0.99+

Attunity ReplicateORGANIZATION

0.99+

ManhattanLOCATION

0.99+

oneQUANTITY

0.98+

Midtown ManhattanLOCATION

0.98+

NYCLOCATION

0.98+

AttunityORGANIZATION

0.97+

AldermoreORGANIZATION

0.97+

bothQUANTITY

0.97+

one cloudQUANTITY

0.97+

this yearDATE

0.97+

EMRORGANIZATION

0.96+

Big DataEVENT

0.96+

KafkaTITLE

0.95+

each oneQUANTITY

0.95+

ScalderaORGANIZATION

0.95+

thousandsQUANTITY

0.94+

AzureORGANIZATION

0.94+

Strata HadoopEVENT

0.94+

New York City, ManhattanLOCATION

0.94+

6.0QUANTITY

0.93+

theCUBEORGANIZATION

0.93+

Azure Event HubsTITLE

0.91+

2017EVENT

0.91+

a secondQUANTITY

0.91+

HiveTITLE

0.9+

rtune 100ORGANIZATION

0.9+

CUBEORGANIZATION

0.9+

few month agoDATE

0.88+

Attunity Enterprise ManagerTITLE

0.83+

thousands of data sourcesQUANTITY

0.83+

2017DATE

0.82+

AEMTITLE

0.8+

third iterationQUANTITY

0.79+

version sixQUANTITY

0.78+