Jack Norris | Strata-Hadoop World 2012


 

>>Okay, we're back here, live in New York City for Big Data Week. This is SiliconANGLE.tv's exclusive coverage of Strata plus Hadoop World, the big event of Big Data Week. We just wrote a blog post on siliconangle.com calling this the South by Southwest for data geeks, and it's my prediction that this is going to turn into quite the geek fest. Obviously the crowd here is enormous, it's packed, and it's an amazing event, and we're excited. This is siliconangle.com. I'm the founder, John Furrier. I'm joined by my co-host.

>>Dave Vellante of Wikibon.org, where people go for free research and peers collaborate to solve problems. And we're here with Jack Norris, who's the vice president of marketing at MapR, a company that we've been tracking for quite some time. Jack, welcome back to theCUBE.

>>Thank you, Dave.

>>I'm going to hand it to you. You know, we met quite a while ago now. It was well over a year ago, and we were pushing at you guys, saying, "Well, you know, open source," and you said, "Look, we're solving problems for customers. We've got the right model, we think. This is our strategy. We're sticking to it. Watch what happens." And like I said, I have to hand it to you. You guys really have some great traction in the market, and you're doing what you said. So congratulations on that. I know you've got a lot more work to do, but...

>>Yeah, and actually the topic of openness is pretty interesting. If you look at the different options out there, all of them are combining open source with some proprietary. In the case of some distributions, it's very small, like an ODBC driver with a proprietary driver. But I think it shows that any solution combining the two to make it more open is important. So what we've done is make innovations, but where we've made those innovations we've opened up and provided APIs like NFS for standard access, like REST, like ODBC drivers, et cetera.

>>So it's a spectrum. I mean, we were at Oracle OpenWorld a few weeks ago, and you listen to Larry Ellison talk about the Oracle public cloud; he makes actually a very strong case that it's open. You can move data, it's all Java, so it's all about standards. But it was really all about the business value. That's what the bottom line is. So we had your CEO, John Schroeder, on yesterday. John and I both were very impressed with essentially what he described as your philosophy: we have customers when we announce a product. And, you know, that's impressive.

>>He was also giving some good feedback to startup entrepreneurs out there; obviously there's a lot of action going on with the startup community. And he basically said the same thing: get customers. And that's it, that's all. Use your tech, but don't be so locked into the tech. Get the customers, understand the needs, and then deliver that. So you guys have done great. I want to talk about the show here, because you guys have a big booth and a big presence here at the show. What are you guys learning? How's the positioning, how's the news hitting? Give us a quick update.

>>So, a lot of news. It first started on Tuesday, when we announced the M7 edition. And I brought a demo here for you all, because the big thing about M7 is what we don't have. So we're not demoing region servers, we're not demoing compactions, we're not demoing a lot of manual administrative tasks. So what that really means is that we took this stack. And if you look at HBase, about half of Hadoop users today are adopting HBase.
So there's a lot of momentum in the market, and it's used for everything from real-time analytics to kind of lightweight OLTP processing. But it's an infrastructure that sits on top of a JVM, that stores its data in the Hadoop Distributed File System, that sits on a JVM, that stores its data in a Linux file system, that writes to disk.

>>And so a lot of the complexity is that stack. As an administrator, you have to worry about how data gets, you know, basically written across that. You've got region servers to keep up. When you're doing writes, you have things called compactions, which increase response time. So it's a complex environment, and we've spent quite a bit of time collapsing that infrastructure. With the M7 edition, you've got files and tables together in the same layer, writing directly to disk. So there are no region servers, there are no compactions to deal with, there's no pre-splitting of tables and trying to do manual merges. It just makes it much, much simpler.

>>Let's talk about some of your customers in terms of the profile of these guys. I'm assuming, and correct me if I'm wrong, that you're not selling to the tire kickers. You're selling to the guys who actually have some experience with Hadoop and have run into some of the limitations, and you come in and say, hey, we can solve some of those problems. Is that right? Can you talk about that a little bit?

>>Good characterization. I think part of it is, when you're in the evaluation process and you first hear about Hadoop, it's kind of like the Gartner hype curve, right? This stuff, it does everything. Of course you've got data protection, because you've got things replicated across the cluster. And of course you've got scalability, because you can just add nodes and so forth.
Well, once you start using it, you realize that yes, I've got data replicated across the cluster, but if I accidentally delete something, or if I've got some corruption, that's replicated across the cluster too. So things like snapshots are really important, so you can return to what it was five minutes before. Performance, where you can get the most out of your hardware. Ease of administration, where I can cut this up into logical volumes and have policies at that volume level instead of at an individual file.

>>So there's a bunch of features that really resonate with users after they've had some experience, and those tend to be our key customers. There's another phase too. When you're testing Hadoop, you're looking at what's possible with this platform: what type of analytics can I do? When you go into production, now all of a sudden you're looking at, how does this fit in with my SLAs? How does this fit in with my data protection policies? How do I integrate with my different data sources? And can I leverage existing code? We had one customer, a large systems integrator for the federal government. They have a million lines of code that they were told to rewrite to run with other distributions, and that they could use just out of the box with MapR.

>>So let's talk about some of those customers. Can you name some names?

>>Sure. Actually, we had a keynote today, and we had this beautiful customer video. They had to cut it because of time. It's running in our booth and it's streaming on our website, and I think we've actually got some of it in the bumper here. But I want to shout out to those customers, because they ended up on the cutting room floor.

>>We're running it here. Yeah.
>>So one was Rubicon Project, and they're an interesting company. They're a real-time advertising platform and auction network. They recently passed Google in terms of number one ad reach, as measured by comScore, and there was a lot of press on that. I particularly liked the headline that mentioned those three companies, because it was measured by comScore, and comScore's a MapR customer too, and Google's a key partner.

>>And yesterday we announced a world record for the Hadoop TeraSort, running on Google. So, M7 for Rubicon: it allows them to address and replace different point solutions that were running alongside Hadoop. It potentially simplifies their architecture, because now they have more things done with a single platform; it increases performance and simplifies administration. Another customer is Ancestry.com; maybe you've seen their ads or heard some of their radio spots. They do a tremendous amount of data processing to help with family services and genealogy and to figure out family backgrounds. One of the things they do is DNA testing. For an internet service to do that kind of advanced technology is pretty impressive. It's $99, I believe, and they'll send you a DNA kit. You spit in the tube, you send it back, and then they process that and match and give you insights into your family background. So for them, simplifying HBase meant additional performance, so they could do matches faster, and really simplified administration. In Melinda Graham's words, it's simpler because they're just not there, those components.

>>Jack, I want to ask you about enterprise-grade Hadoop, and then Ted Dunning, because he was mentioned by Tim Estes in his keynote speech. So you have some rock stars in the company.
Also in your management team: we had your CEO on, we've interviewed M.C. Srivas at Google I/O, and we were on a panel together. So I know your team; it's a solid team. So let's talk about Ted in a minute, but I want to ask you about the enterprise-grade Hadoop conversation. What does that mean now? Obviously you guys were very successful at first. Again, we were skeptics at first, but now your traction and your performance have proven there is a market for that kind of platform. What does that mean now, at this event today, as this is evolving, as the Hadoop ecosystem is not just Hadoop anymore? It's other things.

>>Yeah, there are three dimensions to enterprise grade. The first is ease of use, and ease of use from an administrator standpoint. How easily does it integrate into an existing environment? How easily does it fit into my IT policies? Do you run in a lights-out data center? Does the Hadoop distribution fit into that? So that's one whole dimension. A key to that is complete NFS support, so it functions like standard storage. A second dimension is dependability, reliability. So it's not just, do you have a checkbox HA feature? It's, do you have automated stateful failover? Do you have self-healing? Can you handle multiple failures and automated recovery? In a lights-out data center, can you actually go there once a week and then just replace drives? A great example of that is one of our customers who had a test cluster with MapR. It was a POC; they went on and did other things. They had a power failure. They came back a week later, and the cluster was up and running, and they hadn't done any manual tasks there.
And they were just blown away. The recovery process for the other distributions is a long laundry list of...

>>So I've got to ask you this: the third one. What's the third one?

>>The third one is performance, and performance is, you know, kind of raw speed. It's also, how do you leverage the infrastructure? Can you take advantage of the network infrastructure, multiple NICs? Can you take advantage of heterogeneous hardware? Can you mix and match for different workloads? And it's really about sharing a cluster for different use cases and different users. There are a lot of features there. It's not just raw speed.

>>The existing IT infrastructure, the policies, that whole thing: what happens when something goes wrong, can you automate that? And then...

>>And it's easy, dependable, fast. And the same thing with HBase: making HBase easy, dependable, fast.

>>So the talk of the show right now, he had the keynote this morning, is that MapR marketing has dropped the big data term and is going with datacosm. Is that true? Joe Hellerstein just had a tweet. Joe, the famous Cal Berkeley computer science professor, is now CEO of a startup, Trifacta, and he had a couple of epic tweets this week. So shout out to Joe Hellerstein. But Joe Hellerstein's tweet says MapR marketing has decided to drop the term big data and go with datacosm, with a shout out to George Gilder. So it's kind of, like, middle-intellectual humor. So what's your response to that? Is it true? What's happening? You're the VP of marketing.

>>Well, if you look at the big data term, I think there's a lot of big data washing going on, where architectures that have been out there for 30 years are suddenly, you know, all about big data. So I think there's the need for a more descriptive term.
The purpose of datacosm was not to try to coin something or to change the big data label. It was just to get people to take a step back and think, and to realize that we are in a massive paradigm shift. And with a shout out to George Gilder, acknowledging, you know, he recognized what the impact of making compute available meant; he recognized with Telecosm what bandwidth would mean. And if you look at the combination, we've got all this compute efficiency and bandwidth; now datacosm is basically taking those resources and unleashing them and changing the way we do things.

>>And I think one of the ways to look at that is the new things that will be possible. There's been a lot of focus on SQL interfaces on top of Hadoop, which are important. But I think some of the more interesting use cases are taking this machine-generated data that's being produced very, very rapidly, and having automated operational analytics that can respond very fast to change how you do business: either how you're communicating with customers, how you're responding to different risk factors in the environment, for fraud, et cetera, or just improving your response time to kind of cost events.

>>We heard it earlier called actionable insight. And he said, assigning intent, you'd be able to respond. It's interesting that you talk about George Gilder, because we like to kind of riff and get into the abstract concepts, but he also was very big in supply-side economics. And so if you look at the business value conversation, one of the things we pointed out yesterday, and this morning in our opening review, was the top conversations: insight and analytics as a killer app. Right now, the app market has not developed.
And that's why we like companies like Continuuity. What you guys are doing under the hood is being worked on at many levels, performance being one of those three things. Analytics is a no-brainer, insight, but the other one's business value. So when you look at that kind of datacosm, I can see where you're going with that.

>>And that's kind of what people want. It's not so much, like, "I'm Republican because he's Republican." George Gilder, he bought The American Spectator; everyone knows that. So obviously he's a Republican, but politics aside, the business side of what big data is implementing is massive. Now, I guess that's a Republican concept, but not really. I mean, business is all parties. So relative to datacosm: no one talks about e-business anymore. We were talking to IBM at the IBM conference, and they were saying, hey, that was a great marketing campaign, but no one says, "Hey, are you an e-business today?" So we think that big data is going to have the same effect, which is, "Hey, do you have big data?" No, it's just assumed. So that's what you're basically trying to establish, that it's not just about big.

>>Yeah. Let me give you one small example from a business value standpoint. Ted Dunning, you mentioned Ted earlier, chief application architect, and one of the coauthors of the book on Mahout, which deals with machine learning: he dealt with one of our large financial services companies. One of the techniques on Hadoop is clustering, you know, K-nearest neighbors, different algorithms. And they looked at a particular process, and they sped up that process by 30,000 times. There's a blog post on our website where you can find additional information on that.
And I...

>>There's one...

>>One point on this, but I think, to your point about business value and what datacosm really means: that's an incredible speedup in terms of performance, and it changes how companies can react in real time. It changes how they can do pattern recognition. And Google did a really interesting paper called "The Unreasonable Effectiveness of Data." In there they say simple algorithms on big data, on massive amounts of data, beat a complex model every time. And so I think what we'll see is a movement away from data sampling and trying to do an 80/20, to looking at all your data and identifying where the exceptions are that we want to increase, because they're revenue exceptions, or that we want to address, because it's a cost or a fraud.

>>Well, I would give a shout out to the guys at Digital Reasoning. Tim Estes plugged Ted; he idolized him in terms of his work. Obviously his work is awesome. But, too, he brought up this concept of an understanding gap, and he showed an interesting chart in his keynote, which was the data explosion: it's up, straight up, right? It's a massive amount of data, 64% unstructured by his calculation. Then he showed a flat line called attention. So as data's been exploding over time, going up, attention, meaning user attention, is flat, with some uptick maybe. So users, humans, can't expand their minds fast enough. So machine learning technologies have to bridge that gap. That's analytics, that's insight.

>>Yeah. There's a big conversation now going on about more data versus better models, people trying to squint through some of the comments that Google made and say, all right, does that mean we just throw out

>>the models, and data trumps algorithms?

>>Data trumps algorithms. But the question I have is, do you think, and are your customers talking about, okay, well, now they have more data:
Can I actually develop better algorithms that are simpler? And is it a virtuous cycle?

>>Yeah, I think, I mean, there's a lot of debate here, a lot of information. But I think one of the interesting things is, given the compute efficiency that we have and given the bandwidth, you can take a model and then iterate very quickly on it and arrive at insight. In the past, with that amount of data and that amount of time to process, it could take you 40 days to get to the point that you can reach now in hours.

>>Right. So, I mean, the great example is fraud detection, right? It used to be a sample, and six months later, "Hey, your credit card might have been hacked." And now you get a phone call, or you can't use your credit card, or whatever it is. But there are still a lot of use cases, weather is an example, where modeling and better modeling would be very helpful. Excellent. So, datacosm: are you planning other marketing initiatives around that? Or is this sort of tongue in cheek, fun, throw it out there, a little red meat, a little chum in the waters?

>>You know, what really motivated us was, theCUBE's here talking for the whole day; what could we possibly do to help give them a topic of conversation?

>>Okay. Datacosm. Now of course, we found that on our proprietary HBase tools. Jack Norris, thanks for coming in. We appreciate your support. You guys have been great. We've been following you and will continue to follow you. You've been a great supporter of theCUBE. I want to thank you personally while we're here. MapR has been a generous underwriter, supportive of our great independent editorial. We want to recognize you guys; thanks for your support. And we continue to look forward to watching you guys grow and kick ass. So thanks for all your support.
And we'll be right back with our next guest after this short break.

>>Thank you.

>>10 years ago, the video news business believed the internet was a fad. The science is settled. We all know the internet is here to stay. Bubbles and busts come and go, but the industry deserves a news team that goes the distance. Coming up on SiliconANGLE are some interesting new metrics for measuring the worth of a customer on the web. Every morning, we're on the air to bring you the most up-to-date information on the tech industry, with scrutiny on releases of the day and news of industry-wide trends. We're here daily with breaking analysis from the best minds in the business. Join me, Kristin Feledy, daily at the NewsDesk on SiliconANGLE TV, your reference point for tech innovation.

Published Date : Oct 25 2012



Dr. Amr Awadallah - Interview 1 - Hadoop World 2011 - theCUBE


 

okay we're back live in new york city for hadoop world 2011 john furrier its founder SiliconANGLE calm and we have a special walk-in guest tomorrow and allah the vp of engineering co founder of Cloudera who's going to be on at two thirty eastern time on the cube to go more in depth but since we saw her in the hallway we had a quick spot wanted to grab him in here this is the cube our flagship telecast where we go out to the event atop the smartest people and i'm here with my co-host i'm dave vellante Wikibon door welcome back you're a longtime cube alum so appreciate you coming back on and doing a quick drive by here thanks for the nice welcome so you know we go talk to the smart people in the room you're one of the smartest guys that I know and we've been friends for years and it was your my tweet heard around the world by you to find space and we've been sharing the office space at Cloudera a year didn't have you I meant to have you we're going to be trying to find space because you're expanding so fast we have to get in a new home sorry about that but I wanted to really thank you personally appear on live you've enabled SiliconANGLE Wikibon to we figured it out early because of you I mean we had our nose sniffing around the big data area before it's called big data but when we met talked we've been tracking the social web and really it's exploded in an amazing way and I'm just really thankful because I've been had a front-row seat in the trenches with you guys and and it's been amazing so I want to thank you're welcome and that's great to have you on board and so so you you've been evangelizing in the trenches at Yahoo you were a ir a textile partners announcing the hundred million dollar fund which is all great news today but you've been the real spark get cloudy air is one of the 10 others one of them but I know one of the main sparks a co-founder a lots of ginger cuz I'm Rebecca and my co-founder from facebook I mean we both we said this before like we saw 
the future like an hour companies we saw the future where everybody is gonna go next and now Jeff's gonna be on as well he's now taking this whole date of science thing art yep building out a team you gotta drilled that down with him what do you what do you think about all this I mean like right now how do you feel personally emotionally and looking at the marketplace share with us your yeah I'm very emotional today actually yeah lots of the good news is you heard about the funding news yes million dollars for startups but no but the 14 oh yeah yeah it is more most actually the news was supposed to come out today came out a bit earlier sir day but yeah I'm very very emotional because of that it's a very Testament from very big name investor's of how well we were doing and recognition of how big this wave really is also the hundred million fun from Excel that's also a huge testament and lots of hopefully lots of new innovations or startups will come out of that so I'm very emotional about that but also overwhelmed by the by the the size of this event and how many people are really gravitating towards the technology which shows how much work we still have to do going forward it was very very August of a great a bit scared a bit scared Michaels is a great CEO on stage they're great guy we love Mike just really he's geeky and he's pragmatic Jerry strategist and you got Kirk who's the operator yeah but he showed a slide up at his keynote that showed the evolution of Hadoop yes the core Hadoop and then he showed ya year-by-year and now we got that columns extending and you got new new components coming out take us through that that progression just go back a few years in and walk us through why is this going on so fast and what are the what's the what's the community doing and just yeah and what happened in 2008 it doesn't need was one mr. 
yeah when we when we started so I mean first 2008 when we started and what he was believing us back then that hey this thing is going to be big like we had the belief because we saw it happen firsthand but many folks were dismissive and no no no this this big data thing is a fat and nobody will care about it and look and behold today it's obviously proving not to be the case in terms of the maturity of the of the platform you're absolutely right i mean the slide that Mike showed should but only thirty percent of the contributions happening today are in the Hadoop core layer and and and and the overall kind of vision there is very system very similar to the operating system right except what this really is it's a data operating system right it's how to operate large amounts of data in a big data center so sorry it's like an operating system for many machines as opposed to Linux which does not bring system for a single machine right so Hadoop when it came out Hadoop is only the colonel it's only that inner layers which if you look at any opening system like windows or linux and so on the core functionality is two things storing files and running applications on top of these files that's what windows does that's what linux does that was loop does at the heart but then to really get an opening system to work you need many ancillary components around it that really make it functional you need libraries in it applications in eat integration IO devices etc etc and that's really what's happening in the hadoop world so started with the core OS layer which is Hadoop HDFS for storage MapReduce for computation but then now all of these other things are showing around that core kernel to really make it a fully functional extensible data opening system I which made a little replay button but let's just put the paws on that because this is kind of an important point in folks out there there's a lot of different and a lot of people and metaphors are used in this business so it's 
the Linux I want to be it's just like Red Hat right yeah we kind of use that term the business model is talk a little bit about that we just mentioned you know not like Linux just unpack that a little bit deeper for us what's the difference you mentioned Linux is can you replay what you just said that was really so I was actually talking about the similarity the similarity and then i can and then i can talk about the difference the similarity is the heart of Hadoop is a system for storing files which is sdfs and a system for running applications on top of these files which is MapReduce the heart of Linux is the same thing assistant for storing files which is a txt for and a system for scheduling applications on top of these files that's the same heart of Windows and so on the difference though so that's the similarity I got a difference is Linux is made to run on a single note right and when this is made to run on a single note Hadoop is really made to run on many many notes so hadoo bicester cares about taking a data center of servers a rack of servers or a data center of servers and having them look like one big massive mainframe built out of commodity hardware that can store arbitrary amounts of data and run any type of hence the new components like the hives of the world so now so now these new components coming up like high for example I've makes it easier to write queries for Hadoop it's it's a sequel language for writing queries on top of Hadoop so you don't have to go and write it in MapReduce which we call that assembly language of Hadoop so if you write it and MapReduce you will get the most flexibility you will get the most performance but only if you know what you're doing very similar when you do machine code if you do machine cool assembly you will able do anything but you can also shoot yourself in the foot sunbelt is that right the same thing with MapReduce right when you use hive hive abstracts that out for you so your rights equal and then hive 
takes care of doing all of the plumbing work to get that compiled into MapReduce for you. So that's Hive. HBase, for example, is a very nice system that augments Hadoop: it makes it low latency and makes it support update, insert, and delete transactions, which HDFS does not support out of the box. >>So it's more like a database? >>It's more like MySQL, yeah. The analogy of MySQL to Linux is very similar to HBase to HDFS. >>And what's your take, you know, with your founder's hat on now, on the business model similarities and differences with Red Hat? >>Yes, so actually they are different. I mean, the similarity stops at open source. We are both open source, right, in the sense that the core system is open source and is available out there; you can look at the source code, and so on. The difference is with Red Hat. Red Hat actually has a license on their bits. So there's the source code, and then there's the bits. When Red Hat compiles the source code into bits, you cannot deploy these bits without having a Red Hat license. With us it's very different. We have the source code, which is Apache; it's all in Apache. We compile the source code into a bunch of bits, which is our distribution, called CDH. These bits are one hundred percent open source, one hundred percent free. You can deploy them and use them; you don't have to pay us anything. The only reason why you would come back and pay us is for Cloudera Enterprise, which is really for when you go operational, when you become operational and mission-critical. Cloudera Enterprise gives you two things. First, it gives you a proprietary management suite that we built, and it's very unique to us. Nobody in the market has anything close to what we have right now. It makes it easier for you to deploy, configure, monitor, provision, and do capacity planning, security management, et cetera, for Hadoop. Nobody else has anything close to what we have right now for that management suite. >>That is unique to Cloudera and not part of Apache open source? >>Yes, it's not part of
the open source; you only get that as a subscriber to Cloudera. We do have a free version of that that's available for download, and it can run up to 50 nodes, just for you to get up and running quickly. And it's really very simple; it has a very simple installer. You should be able to go fire off that software and say, install Hadoop, these are my servers, and it will take care of everything else for you. It's like having these installers, you know, when Windows came out in the beginning and it had this nice progress bar and you could install applications very easily. Imagine that now for a cluster of servers, right? That's really what this is. The other reason why people subscribe to Cloudera Enterprise, in addition to getting this management suite, is getting our support services. And support is necessary for any software, even if it's free, even for hardware. Think if I give you a free airplane right now, just give it to you: here you go, here is an airplane. You can run this airplane and make money from passengers, but you still need somebody to maintain the airplane for you, right? You can still go hire your own mechanics to maintain that airplane, but we tell you, if you subscribe with us as the mechanics for your airplane, the support you will get with us will be way better than anything else, and the economics of it will also be way better than having your own staff doing the maintenance for that airplane. >>Okay, final question, and we've got one minute because we slid you in real quick. For folks watching, Amr is going to come back at two-thirty, so come back, that's Eastern time, and we'll have a more in-depth conversation. But just share with the folks watching your view of what's going on in Apache. You know, there's all this kind of weird FUD being thrown around, that Cloudera is not this and that, and you guys are clearly the leader. We talked with Kirk about that; we don't need to
go into that. But just share with us what's going on. What's the real deal happening with Apache, the code, and you have a unique offering? >>I mean, the real deal, and I advise people to go look at this blog post that our CEO, Mike Olson, wrote, called "The Community Effect". The real deal is that there is a very big, healthy community developing the source code for Hadoop: the core system, which is HDFS and MapReduce, and all the components around that core system. We at Cloudera employ a very large engineering organization; in fact, our engineering organization is bigger than many of these other companies in the space. That's just our engineering; if you look at the whole company itself, it's much, much bigger than any of these other players. So we do a lot of contributions to the core system and to the projects around it. However, we are part of the community, and we're definitely doing this with the community. It's not just a Cloudera thing for the core platform. So that's the real deal. >>All right. So here we are with Amr, the co-founder. Congratulations, great funding from Accel Partners, who invested in you guys. Congratulations. You're part of the community, we all know that; just kind of clarifying that for the record. And you have a unique differentiator: the management suite and the enterprise stuff and, say, the experience. >>Experience, yeah. I think a huge differentiation we have is that we have been doing this for three years, ahead of everybody else. We have the experience across all the industries that matter, so when you come to us, we know how to do this in the finance industry, in the retail industry, in the health industry, and in the government. So that's something also. >>So just for the audience out there, Amr is coming back at two-thirty and we're going to go deeper. He's the highly decorated general here; he's in the uniform with the Cloudera logo. >>Yes, sir. >>Expecting some of
those for us someday. Great, so we'll see you again. Amr, our great, great friend.
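
The map, shuffle, and reduce steps that the interview calls the "assembly language of Hadoop" can be sketched in a few lines of plain Python. This is a minimal single-process illustration, assuming a word-count job; it is not Hadoop's API, and every function name here is hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["Hadoop stores files", "Hadoop runs applications on files"]
    print(word_count(sample))
```

In a real cluster the mappers and reducers would run on many nodes and the shuffle would move data over the network; here everything runs in one process so the three phases are easy to see.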

Published Date : May 1 2012

Dr. Amr Awadallah - Interview 2 - Hadoop World 2011 - theCUBE


 

Yeah, Amr Awadallah, the co-founder, back to back. This is theCUBE, SiliconANGLE.com and SiliconANGLE.tv's production of theCUBE, our flagship telecast. We go out to the events. That was a great conversation; it was really just cool. We could have probably hit on a few more things. Obviously well read. Awesome. Co-founder of Cloudera. You did a good job teaming up with that co-founder, huh? He's not bad on theCUBE, is he? >>He reads the internet. >>That's what I'm saying. >>Anything that's going on. >>He's a CUBE star, you know. >>And technology, Jeff knows it. Yeah. >>I tell you, I'm smarter just by being in Cloudera all those years. And I actually was following what he was saying, and it didn't bust my brain. So, okay, you're back. We were talking earlier with Mike Olson about the relational database thing, so I'll kind of pick that up where we left off with you. You know, he was really excited. It's like, hey, we saw that relational database movement happen. He was part of that generation. And things are happening in a similar way, but it's still early. So I was trying to really peg with him how early we are. You know, on the curve, this is 1,400 people; it's not the Javits Center yet. Maybe Hadoop World next year might be at the Javits Center, 35,000. Just don't go to Vegas. So I'm trying to figure out where we are on that curve. Are we on the upward slope, or down here, not even hitting that? >>I think we're moving up quicker than previous waves. And actually, if you look, for example, at Oracle, I think it took them 15, 20 years until they really became a mature company. VMware, which started about, what, 12, 13 years ago, it took them maybe eight years to be a big, mature company, and I'm hoping we're going to do it in five.
So a couple more years. >>Highly accelerated. >>Yes. I mean, I've been surprised by the growth, I have been, right? I've been warned about enterprise software, and that it takes long for production to take place. >>But the consumerization trend is really changing that. I mean, it seems that the enterprise is always last. Why the shorter cycle? >>I think the shorter cycle is coming from having the right solution for the right problem at the right time. I think that's a big part of it. So luck definitely is a big part of this. Now, in terms of why the adoption is changing compared to a couple of decades ago, I think that's coming just because of how quickly the technology itself, the underlying hardware, is evolving. Right now, the fact that you can buy a single server and it has eight to 16 cores and 12 hard drives, two terabytes each, is something that's just pushing the limits of what you can do with the existing systems, and hence making it more likely for new systems to disrupt them. >>Yeah, we talk about that a lot. It's very easy for people to actually start a big data project. >>Yes. >>For example. And the hardest part is, okay, what problem do I really need to solve? How am I going to monetize it? Right? Those are the hard parts. It's not the underlying technology. >>Yes, that's true. That's true. I mean... >>You're saying... >>Because I'm seeing both. I'm seeing cases where you're right: there are some companies that are like, oh, this Hadoop thing is so cool, what problem can I solve with it? And I see other companies that say, I have this huge problem, and they don't know that Hadoop exists. And once they know, they just jump on it right away.
It's like, you know, when you have a headache and you're searching for the medicine, and it's aspirin. Wow, it works. >>I was talking to Jeff Hammerbacher before he came on stage, and I didn't even get to it because we were on such a nice riff there, right? A bunch of musicians playing the guitar together. But we talked about the IT dynamics, and he said something that I thought was right on the money, and SAP is saying the same thing: they're going to the lines of business. Yes. Because IT is the gatekeeper. It's like selling minicomputers to a mainframe team, or selling client-server to a minicomputer team. Yeah. >>We're seeing both as well. More likely the former one, meaning that, yes, lines of business and departments adopt the technology, and then IT comes in and they see there are already these five different departments having it, and they think, okay, now we need to formalize this across the organization. >>So what happens then? What are you seeing out there? When that happens, does that mean people get their hands on it: hey, we've got a problem to solve? Is that what it comes down to? Well, Hadoop exists; go get Hadoop. They plop it in there, and what does it do? >>So they pop it into their own installation or on the cloud, and they show that this actually is working and solving the problem for them. And when that happens, it's a very easy adoption from there on, because they just go tell IT: we need this right now, because it's solving this problem and it's going to make us much more money. >>Moving it right in. Yes. No problems. >>Is that another reason why the cycle's compressed? I mean, you know, you think client-server, there was a lot of resistance from IT, and now it's much more... same thing with mobile. I mean, mobile is flipped, right? So okay, bring it in. We've got to deal with it. Yep. I would think the same thing.
We have a data problem. Let's turn it into an opportunity. >>Yeah. And it goes back to what I said earlier: the right solution for the right problem at the right time. When you have larger amounts of unstructured data, there isn't anything else out there that can even touch what Hadoop can do. >>So Amr, I need to just change gears here a minute. The gaming stuff. We're featured on justin.tv right now on the front page. >>Oh wow. >>But the numbers aren't coming in because there's a competing stream of the recently released Modern Warfare 3. >>Yes. Yes. >>So we have to compete with Modern Warfare 3. So can we talk about Modern Warfare 3 for a minute, and share with the folks what you think of the current version, if you've played it? >>Yeah. So unfortunately I'm waiting to get back home. I don't have my Xbox with me here. >>I'm talking about my lines of business. >>Boom. Modern Warfare's like a Christmas tree here. >>Sorry. You know, I'm a big gamer, a big video gamer. At Cloudera, every Thursday at 5:30 in the office, we play Call of Duty 4, which is Modern Warfare 1, actually. And I challenge people out there to come challenge our team. Just ping me on Twitter and we'll do a Cloudera versus... >>Let's reframe that. Let's get the team out there. Amr Awadallah's company, these are the geeks that invent the future. Jeff Hammerbacher, at Facebook, now at Cloudera. Amr leading the charge. These guys are gamers. So all the young gamers out there, Amr is saying he's going to challenge you. At which version? >>Modern Warfare 1. >>Modern Warfare 1. Yes. How do they fire in? Can you set up an external... >>We'll figure it out. We'll figure it out. Okay. >>Yeah. Just ping me on Twitter and we'll... >>We can carry it live, actually; we can stream that. Yeah. >>That'd be great. >>Great. >>Yeah.
So I'll tell you, some of our best Hadoop committers and Hadoop developers... >>Paint the picture. Modern Warfare 3. >>Modern Warfare 3, very excited about the game. I saw the trailers for it; the graphics look just amazing. I love the series since the first one that came out, and I'm looking forward to getting back home to play the game. >>I can't play; my son won't let me play. I'm such a fumbler. I'm a keyboard guy; I can't work the Xbox controller. I have a coordination problem at my age, and I'm just a klutz. It's like, "Dad, sorry, charity's over. Can I play with my friends?" But I'm a big gamer. >>But something I wanted to bring up is how to link up gaming with big data and analysis and so on. I'm a big gamer. I love playing games, but at the same time, whenever I play games, I feel a little bit guilty because it's kind of like wasted time. I mean, yeah, it's fun and I'm getting lots of enjoyment out of it; it makes my life much more cheerful. But still, how can we harness all of these hours that gamers spend playing a game like Modern Warfare 3? How can we collect and instrument all of the data that's coming from that and come up, for example, with something useful, something predictive? >>This is exactly the kind of application that's mainstream: gaming. Yeah. Danny at Riot Games was telling me, we saw him at Oracle OpenWorld; he was up there for JavaOne. He said that they don't really have a big data platform, and their business is about understanding user behavior: tons of data about user playing time, who they're playing with, how they want to get into currency trading, you know. >>I can't mention the names, but some of the biggest gaming companies out there are using Hadoop right now.
And depending on CDH for doing exactly that kind of thing. >>Creating a good user experience. >>Today, they're doing it for the purpose of enhancing the user experience and improving retention. So they do track everything: every single bullet you fire, everything; in baseball, every hit you get, every home run you do. >>And in a Modern Warfare 3 type of game, every consecutive headshot you get. >>Everything, yeah, every headshot you get, and so on. But as you said, they are using that information today to sell more products and retain their users. Now what I'm suggesting is, how can you harness that energy for good as well? I mean, making money is good and everything, but how can you harness that for doing something useful, so that all of this entertainment time is also actually productive time as well? I think that'd be a holy grail in this environment if we can achieve that. >>Yeah. It used to be that porn was the telegraph of the future of applications, but gaming really is now. If you look at gaming, you know, you get the headset on. It's a collaborative environment. Oh yeah. You've got unified communications. >>Yeah. And you see our teenage kids, how many hours they spend on these things. >>You've got play environments, very social, collaborative. You know, what I'm saying is that that's the future work environment, with Skype evolving. Our multiplayer game's called our job, right? So I'm big on gaming. So all the gamers out there, Amr has challenged you. We've got a big data example. What else are we seeing? So let's talk about the software. One of the things you were talking about that I really liked, you were going down the list. On Mike's slide he had all the new features around the core. Can you just go down the list and rattle off your version of what it means and what it is?
So you start off with, say, HBase; we talked about that already. What are the other ones that are out there? >>So the projects that we have right there? >>The projects that are around, those tools that are being built. >>Yeah, so the foundational ones, as we mentioned before, are HDFS for storage and MapReduce for processing. And then the immediate layer above that is how to make MapReduce easier for the masses. Not everybody knows how to write MapReduce in Java, but everybody knows SQL, right? So one of the most successful projects right now, the one with the highest attach rate, meaning that when people install Hadoop they usually install it as well, is Hive. Jeff Hammerbacher, my co-founder, when he was at Facebook, his team built the Hive system. Essentially Hive takes SQL, so you don't have to learn a new language; you already know SQL. And then it converts that into MapReduce for you. That not only expands the developer base for how many people can use Hadoop, but also makes it easier to integrate Hadoop, through ODBC and JDBC, with BI tools like MicroStrategy and Tableau and Informatica, et cetera, et cetera. >>You mentioned R too. You mentioned the R program. >>R as well. Yeah, R is one of our best partnerships. We're very, very happy with them. So that's one of the very key projects, Hive. A sister project to Hive is called Pig. Pig Latin is a language that Yahoo invented. You have to learn the language, but it's very easy to learn compared to MapReduce. And once you learn it, you can specify very deep data pipelines, right? SQL is good for queries. It's not good for data pipelines, because it becomes very convoluted; it becomes very hard for the human brain to understand. So Pig is much more natural to the human. It's more like Perl, very similar to scripting kinds of languages.
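
To make the Hive point above concrete, here is a small sketch that computes the same per-user count twice: once as a single declarative SQL query, and once with the map, shuffle, and reduce steps written by hand. It uses Python's built-in `sqlite3` module purely as a stand-in for a SQL engine; it does not use Hive, and the table, column, and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical event log: (user_name, event_type)
ROWS = [("alice", "click"), ("bob", "click"), ("alice", "view"), ("alice", "click")]


def count_by_user_sql(rows):
    """Declarative version: one GROUP BY query, the style of query Hive accepts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_name TEXT, event_type TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT user_name, COUNT(*) FROM events "
        "GROUP BY user_name ORDER BY user_name"
    ).fetchall()
    conn.close()
    return result


def count_by_user_mapreduce(rows):
    """Hand-written version: the same aggregation as explicit phases."""
    groups = defaultdict(list)
    for user_name, _event_type in rows:   # map: emit (user_name, 1)
        groups[user_name].append(1)       # shuffle: group values by key
    # reduce: sum the values for each key, sorted to match the SQL ORDER BY
    return sorted((name, sum(ones)) for name, ones in groups.items())
```

Both functions return the same rows; the difference is only in who writes the plumbing, which is the trade-off the interview describes between SQL and hand-written MapReduce.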
So with Pig you can write very, very long data pipelines. Again, a very successful project, doing very, very well. Another key project is HBase, like you said. HBase allows you to do low-latency operations, so you can do very, very quick lookups, and it also allows you to do transactions, so you can do updates, inserts, and deletes. One of the talks here at Hadoop World that I recommend people watch when the videos come out is the talk by Jonathan Gray from Facebook. He talked about how they use HBase. >>Jonathan's coming on here on theCUBE later. Yeah. >>So drill him on that. They use HBase now for many, many things within Facebook. They have a big team now committed to building and improving HBase with us and with the community at large. And they're using it for their online messaging system: the live mail system in Facebook is powered by HBase right now. Again, eBay, the Cassini project, they gave a keynote earlier today at the conference as well, is using HBase as well. So HBase is definitely one of the projects that's growing very, very quickly right now within the Hadoop ecosystem. Another key project, which Jeff alluded to earlier when he was on here, is Flume. Flume is very instrumental because you have this nice system, Hadoop, but Hadoop is useless unless you have data inside it. So how do you get the data inside Hadoop? Flume essentially is this very nice framework for having these agents all over your infrastructure, inside your web servers, inside your application servers, inside your mobile devices and your network equipment, that collect all of that data and then reliably materialize it inside Hadoop. So Flume does that. Another good project is Oozie. There are so many of them, I don't know how long you want me to keep going here. But Oozie is great. Oozie is a workflow processing system. Oozie allows you to define a series of jobs, some of them in Pig, some of them in Hive, some of them in MapReduce.
You can define a series of them and then link them to each other and say, only start this job when these other two jobs finish, because I'm waiting for the input from them before I can kick off, and so on. So Oozie is a very nice framework that will do that. It will manage the whole graph of jobs for you and retry things when they fail, et cetera, et cetera. Another good project is Whirr, W-H-I-R-R. Whirr allows you to very easily start a Hadoop cluster on top of Amazon EC2, on top of Rackspace, on a virtualized environment. It's for kicking off Hadoop instances or HBase instances on any virtual infrastructure. >>Okay. VMware, vCloud? >>It supports all of the major virtualized infrastructure systems out there, Eucalyptus as well, and so on. So that's Whirr, W-H-I-R-R. Avro is another key project. It's Doug Cutting's main project right now. Doug Cutting came on stage with you guys. Avro is a project about how we encode, within our files, the schema of these files, right? Because when you open up a text file and you don't know what the columns mean and how to parse it, it becomes very hard to work with it. Avro allows you to do that much more easily. It's also useful for doing RPC, remote procedure calls, for having different services talk to each other. Avro is very useful for that as well. And the list keeps going on and on. >>Mahout? >>Yeah, thanks for reminding me of Mahout. We just added Mahout very recently, actually. >>What is that? I'm not familiar with it. >>So Mahout is a data mining library. Mahout takes some of the most popular data mining algorithms for doing clustering and regression and statistical modeling and implements them using the MapReduce model. >>They have machine learning in it too? >>Yes, yes. So that's the machine learning. So, yes.
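
The Oozie description above (start a job only after the jobs it depends on have finished, and retry failures) can be sketched with Python's standard `graphlib` module. This is an illustrative toy under stated assumptions, not Oozie's workflow format or API: `run_workflow`, its parameters, and the job names in the usage note are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(jobs, depends_on, max_attempts=3):
    """Run callables in dependency order, retrying each failed job.

    jobs:       {name: zero-argument callable}
    depends_on: {name: set of job names that must finish first}
    Returns the list of job names in the order they completed.
    """
    # TopologicalSorter expects {node: predecessors}
    graph = {name: depends_on.get(name, set()) for name in jobs}
    completed = []
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                jobs[name]()
                completed.append(name)
                break
            except Exception as error:
                if attempt == max_attempts:
                    raise RuntimeError(
                        f"job {name!r} failed after {attempt} attempts"
                    ) from error
    return completed
```

For example, with jobs named `pig_clean`, `hive_summary`, and `report`, and `depends_on={"report": {"pig_clean", "hive_summary"}}`, the `report` job always runs last. A real workflow engine would also run independent jobs in parallel and persist state between runs; this sketch runs everything sequentially in one process.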
Support vector machines and so on. >>What about Sqoop? >>So Sqoop... you know all of them. Thanks for feeding me all the names. >>The ones I don't understand. >>But there's so many of them, right? I can't even remember all of them. So Sqoop actually is a very interesting project. It's short for SQL to Hadoop, hence the name Sqoop, right? "Sq" from SQL and "oop" from Hadoop, and it also means scoop, as in scooping up stuff, like when you scoop up ice cream. The idea for Sqoop is to make it easy to move data between relational systems, like Oracle, Teradata, Netezza, Vertica, and so on, and Hadoop. So you can very simply say Sqoop, the name of the table inside the relational system, the name of the file inside Hadoop, and the table will be copied over to the file. And vice versa: you can say Sqoop, the name of the file in Hadoop, the name of the table over there, and it'll move it over there. So it's a connectivity tool between the relational world and the Hadoop world. >>Great, great tutorial. >>And all of these are Apache projects. They're all projects built... >>It's not part of your unique proprietary... >>Yes. >>But these are things that you've been contributing to. >>We're contributing to the whole ecosystem. Yes. >>And you understand them very well. >>Yes. >>And they contribute to your knowledge of the marketplace. >>Absolutely. We collaborate with the community on creating these projects. We employ committers and founders for many of these projects. Like Doug Cutting, the founder of Hadoop, he works at Cloudera. The founder of the Oozie project works at Cloudera. For ZooKeeper, works at Cloudera. So we have a number of them on staff. >>So we had Arun from Hortonworks on. Yes. And it was really good, because I tell you, I walked away from that conversation, and I've got to say for the folks out there, there really isn't a war going on in Apache. There isn't. >>In Apache, there isn't. I mean, there isn't, to be honest.
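
The Sqoop idea described above, copying a relational table out to a file and a file back into a table, can be sketched with Python's built-in `sqlite3` and `csv` modules. This is only an analogy: it does not use Sqoop or HDFS, the local CSV file stands in for a file in Hadoop, and `export_table` and `import_table` are hypothetical names.

```python
import csv
import sqlite3


def export_table(conn, table, path):
    """Copy every row of `table` into a CSV file, header row first."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column[0] for column in cursor.description])
        writer.writerows(cursor)


def import_table(conn, table, path):
    """Load a CSV file written by export_table into a new, untyped table.

    CSV carries no type information, so every value comes back as text.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}"' for name in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', reader)
```

The loss of column types on the way back in is one reason a self-describing format matters, which is the problem the interview attributes to Avro.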
Like, in the developer community, we are friends; we're working together. We want to achieve the... >>There's no war. It's all Kumbaya. Everyone understands the rising tide floats all boats, and all are playing nice in the same sandbox. Yes. It's just a competitive landscape with Hortonworks. >>In the business. >>Business, competitive business, PR. >>PR. We're trying to be friendly, as friendly as we can. >>Yeah, no, I mean, they're hyping it up. But he was cool. Like, hey, you know, we know each other. Yes. We all know each other, and we're just going to offer it free and charge for support. And so are they. And that's okay. And they've got other things going on. But he brought up a point. He said they're launching a management console. So I said, Cloudera's got a significant lead. He kind of didn't really answer the question. So the question is, that's your core bread and butter? >>Yes and no. Yes and no. I mean, if you look at Cloudera Enterprise, and I mentioned this earlier when we talked in the morning, it has two main things in it. Cloudera Enterprise has the management suite, but it also has the support and maintenance that we provide to our customers and all the experience that we have in our team. >>Part of that subscription. >>Yes, part of the subscription. And I want to stress the point that the fact that I built a sports car doesn't mean that I'm good at running that sports car. The driver of the car usually is much better at driving the car than the guy who built the car, right? So yes, we have many people on staff that are helping build Hadoop, but we have many more people on staff that help run Hadoop at large scale, in the financial industry, retail industry, telecom industry, media industry, health industry, et cetera, et cetera. So that's very, very important for our customers: all that experience that we bring in on how to run the system technically. Yeah.
Within these verticals. >>But their strategy's clear: we're going to create an open source project within Apache for a management console, and we sell support too. Yes. So there'll be a free alternative for management. >>So we have to see. But I mean, look at the product, I mean, our product... >>It's got to come down to product differentiation. >>Our product has been in the market for two years, and they just started building their product. It's alpha. >>It's just alpha. >>The product is in alpha right now. Yeah. >>Okay. Well, the Apache project, it is Apache, right? >>Yeah. The Apache project is out. So we'll see how it compares to ours. But I think ours is way, way ahead of anything else out there. Yeah. I encourage people to try that for themselves and see. >>Essentially, John, when I asked Arun why the world needs Hortonworks, you know, eventually the answer we got was, well, it's free; it needs to be more open. Hadoop needs to be more open. >>No, there's... >>That's not really the reason why Hortonworks... >>No, they want to go make money. >>Exactly. He wasn't going to say that. >>When I kept pushing and pushing, that's ultimately the closest we could get, because you... >>He wasn't going to... >>12 open source projects. Yes. >>I mean, yeah. You can't get much more open. >>Look at the management console, but they're not shipping all of those. I mean, not only... >>No, no, we absolutely are. >>No, you are contributing. But those aren't all your projects. There are other people involved. >>Yeah, we didn't start all of these projects. Yeah, that's true. >>You're contributing heavily to all of them. >>Yes, we are. >>And that's clear. Todd Lipcon said that, you know, he contributed his first patch to HBase in 2008. Yes. So I mean, you go back through the ranks of your people... >>And Todd now is a committer on HBase and a committer on Hadoop itself.
So on a number... >>You're clearly the lead, and, you know... >>But there is a concern. We've heard it, and I want to just ask you. So there's a concern that if I build processes around a proprietary management console, I'm going to end up being locked into that proprietary management console, CA all over again. Now, this is so far from CA. >>Yes. Right. >>But that's a concern that some people have expressed, and I think it's one of the reasons why Hortonworks is getting so much attention. So... >>Talk about that. >>It's a very good observation to make, actually. There are two separate things here. There's the platform, where all the data sits, and then there's this management console beside the platform. Now, why did Cloudera make the management console? Because it makes our job of supporting the customers much more achievable. When a customer calls in and says, we have a problem, help us fix this problem, when they go to our management console, there is a button they click that gives us a dump of the state of the cluster. And that's what allows us to very quickly debug what's going on, and within minutes tell them, you need to do this and you need to do that. Without that, we just can't offer the support services. >>There's real value there. >>Yes. But you have to keep in mind that the underlying platform is completely open source and free. CDH is completely a hundred percent open source, a hundred percent free, a hundred percent Apache. So a year from now, when it comes time to renew with us, if the customer is not happy with our management suite, is not happy with our support, they can go to Hortonworks. >>People are afraid of... >>They can go to IBM. >>The data, you can take the data... >>You don't even need to take the data. You're not going to move the data. It's the same system, the same software.
Everything in CDH is Apache. Right? We're not putting anything in CDH which is not Apache. So a year from now, if you're not happy with our service to you and the value that we're providing, you can switch. There is no lock-in. There is no lock-in. And >>Your argument would be the switching costs to >>The only lock-in is happiness. The only lock-in is which >>Happiness inspection customer delay. Which, by the way, we just wrote a piece about those wars and we said the risk of lock-in is low. We made that statement. We've got some heat for it. Yes. And >>This is sort of at scale though. What the people who are throwing the tomatoes are saying is, again, in theory, at scale, the customers are so comfortable with the console that they don't switch. Now my argument was >>Yes, but that means they're happy with it. That means they're satisfied and happy >>With it. >>And it's more economical for them than going and hiring people full-time on staff. Yeah. >>So you're always in check, as long as the customer doesn't feel like Oracle. >>Yeah. See that's different. Oracle is very, Oracle >>Is like different, right? Yeah. Here it's like Cisco routers, they get nested into the environment, provide value. That's just good competitive product strategy. Yes. If they're happy. Yeah. It's >>Called open washing with >>Oracle, >>I mean our number one core attribute in the company, the number one value for us is customer satisfaction. Keeping our people, yeah, our customers happy with the service that we provide. >>So differentiate in the product. Yes. Keep the commanding lead. That's the strategy. That's what's happening. That's your goal. Yes. >>That's what's happening. >>Absolutely. Okay. Co-founder of Cloudera, always a pleasure to have you on the Cube. We really appreciate all the hospitality over the year and a half. 
And I wanna personally thank you for letting us sit in your office, and we'll miss you. >>And we'll miss you too. We'll >>See you at the Cube events. Swing by. Thanks for coming on the Cube, great to see you, and congratulations on all your success. >>Thank >>You. And thanks for the review on Modern Warfare three. Yeah, yeah. >>Love me again. If there's any gaming stuff, you know, I.

Published Date : May 1 2012

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff | PERSON | 0.99+
Jeff Hiba | PERSON | 0.99+
Todd Lipkin | PERSON | 0.99+
2008 | DATE | 0.99+
Cisco | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Mike | PERSON | 0.99+
Modern Warfare three | TITLE | 0.99+
Apache | ORGANIZATION | 0.99+
Danny | PERSON | 0.99+
Jonathan Gray | PERSON | 0.99+
Jeff Haer Baer | PERSON | 0.99+
15 | QUANTITY | 0.99+
two years | QUANTITY | 0.99+
Calera | ORGANIZATION | 0.99+
Modern Warfare | TITLE | 0.99+
16 cores | QUANTITY | 0.99+
Jeff Harm Becker | PERSON | 0.99+
Todd | PERSON | 0.99+
eight cores | QUANTITY | 0.99+
Jonathan | PERSON | 0.99+
both | QUANTITY | 0.99+
Facebook | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Java | TITLE | 0.99+
next year | DATE | 0.99+
Skype | ORGANIZATION | 0.99+
two jobs | QUANTITY | 0.99+
Vegas | LOCATION | 0.99+
Michaels | PERSON | 0.99+
Cloudera | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
Hadoop | TITLE | 0.99+
hundred percent | QUANTITY | 0.99+
35,000 | QUANTITY | 0.99+
Horton Works | ORGANIZATION | 0.99+
Today | DATE | 0.99+
Peggy | PERSON | 0.99+
eBay | ORGANIZATION | 0.99+
Horton | LOCATION | 0.99+
12 hards | QUANTITY | 0.99+
Each | QUANTITY | 0.99+
vCloud | TITLE | 0.99+
HPAC | ORGANIZATION | 0.99+
Aala | PERSON | 0.99+
Adam | PERSON | 0.99+
Tyler | PERSON | 0.98+
UIE | ORGANIZATION | 0.98+
Hadoop World | TITLE | 0.98+
first one | QUANTITY | 0.98+
12 open source projects | QUANTITY | 0.98+
Edge Base | TITLE | 0.98+
W H I R R | TITLE | 0.98+
five | QUANTITY | 0.98+
Hammerer | PERSON | 0.98+
Xbox | COMMERCIAL_ITEM | 0.98+
Port Works | ORGANIZATION | 0.98+
Hive | TITLE | 0.98+
Amar | PERSON | 0.98+
five different departments | QUANTITY | 0.98+
today | DATE | 0.98+
Christmas | EVENT | 0.98+
SQL | TITLE | 0.97+
Silicon angle dot TV | ORGANIZATION | 0.97+
Tableau | TITLE | 0.97+
two | QUANTITY | 0.97+
W H I R R | TITLE | 0.97+

Ed Albanese - Hadoop World 2011 - theCUBE


 

>>Ed, welcome to the Cube. All right, thanks guys. Good >>To see you. Thanks. Good to see you as well, >>John. Okay. Ed runs biz dev for Cloudera. Industry veteran, worked at VMware. Ed, I've gotten to know you the past year. You guys have been doing great. What a difference one year makes, right? I mean, absolutely. Tell us, let's just start it off with what's happened in a year. I mean, you know, here at Hadoop World, Cloudera, the ecosystem. Just give us your perspective on what a difference one year makes. >>I think more than double is probably the fastest answer I could give you. I mean, even looking around at the conference, it itself is literally double what it was last year. But in terms of the number of partners that have entered the market and really decided to work with Cloudera, but also in general, just the scope and size of the ecosystem itself, investors from every angle. You've got really well-branded marquee companies like Oracle coming into the mix and saying, Hey, Hadoop is the real deal and we need to invest here. Marquee companies like IBM and EMC also doing the same. And of course, as a result, lots and lots of customer interest in the technology. And Cloudera's been fortunate to have been in the market early and really made the right investments with the right team. And so we're able to serve a lot of those customer needs. So it's been a fantastic year for the company. >>So we had a great day yesterday with Cloudera. We had Kirk on, we had Amr on twice, who by the way went viral with his Modern Warfare review, and we had Jeff Hammerbacher on, so we had pretty much the brain trust, Mike Olson. Yep. The brain trust of Cloudera. So we talked about the risk factors for Cloudera. Obviously you guys are number one, you've kind of had an untouchable lead and then all of a sudden, boom, competition. So Mike talked about that. 
So the strategy and the product side, they addressed. You're on the biz dev side, so you know, when you're number one, everyone wants to stand next to you and your phone rings off the hook, from tier one partners all the way down to anyone who's just getting in the business, who wants a big data strategy. On the execution, now, what are you guys doing right now to continue your lead on the sales, marketing, biz dev? I mean, I know you've got the partner program, but what's your strategy for how to continue >>In that lead? The beautiful thing is, honestly, our strategy hasn't changed at all. And I know that might sound counterintuitive, but we started off with a really crisp vision. What we wanna do is create a very attractive platform for partners. And, you know, one of the core corporate strategy edicts for Cloudera is a recognition that at the end of the day, the platform itself, Hadoop, is an input into a solution. And Cloudera is not likely to deliver the complete solution to market. Instead, it's going to be companies like Dell, for example, or it's going to be companies on the ISV side like Informatica, which are gonna deliver not only a base platform, but also the BI or analytics or data integration technologies on top. And as a result, what we've done is we've really focused in on creating a very attractive platform for vendors to build on. >>And one of the, I think one of the biggest misconceptions that I'm excited about that, you know, we are now having an opportunity to correct, and that's a result, frankly, of the additional competitive dynamic. And I think the Wikibon team pointed that out rather pointedly in their most recent articles. But it is the sort of lack of understanding around what CDH is and also some of the other investments that we're making to create a truly attractive platform for vendors to build on. 
And you know, I mean, I think you may have familiarity with exactly what CDH is, but for the sake of the audience here, what I'd like to do is say, first and foremost, this is a hundred percent free and Apache-licensed open source. But more importantly, it is everything that we build on the platform, meaning it's completely full featured. >>We put all of that out in the open. There's no turbo version of Hadoop that we've got hiding in the closet for our for-pay customers. We're absolutely making investment. But I think, you know, when you think about it from the vendor perspective, and that's my bias, so I always think about, I treat all of the potential partners as really my customer. And when you think about it from that perspective, the things that matter most to vendors: number one, transparency. They need to understand exactly what our business model is, where we plan to make money and where we don't plan to make money. They need to know what we're really good at developing and what we're not so good at developing, and sort of where we draw the boundaries around that investment. I think, you know, a testament to that, for example, is tomorrow we're hosting a partner summit. >>So after this event, there are gonna be over 60 individuals, but they max two per vendor. So we're gonna have over 35 vendors attending this event. And who they're gonna hear from is our entire management team, as deeply as we can and as openly as we can. And you know, it's funny, I think I saw this article in Forbes the other day about Cloudera. The title of the article was something like Spies Like Us. And what it highlighted was that some competitor of Cloudera had actually hired a competitive intelligence agency to go on and try to engage with, you know, and try to learn more about Cloudera. And so they went on to Quora, where we have a lot of active engineers. 
And they, you know, they went out and they asked a bunch of product-related questions to someone on Quora. And our engineers immediately responded and they started being very transparent, completely open about what they're building and why they're building it. And the article basically summarized to say, Hey, you know what, clearly some people aren't all that sophisticated in figuring out who they're talking to, and it's really important to do that. And they got the absolute wrong conclusion. Our engineers are actually encouraged and in fact rewarded for being extremely transparent in the market, because we believe that transparency will ultimately allow us to be that platform vendor. >>And that's what attracts me. Jeff Hammerbacher, who's active on Quora as well, he's recruiting there too. So you guys are out engaging the community. Yeah. So let me just review, 'cause this is cool that you're addressing this, because Hortonworks and others, and I'll say the name, Hortonworks has been pumping up the PR and creating a lot of noise around open and kind of de-positioning Cloudera. So you guys are completely open, a hundred percent Hadoop, open source, everything you build, in every way. You have engineers building core, you've got tools, and all the other stuff is being built in Cloudera then contributed into the community. >>Actually it's the other way around. We build it in the community at apache.org. So all of our technology is built at apache.org. It's developed there. It's initially shared there. And then we have another team inside our company that pulls down bits from apache.org and then assembles them and integrates them. So it's a really key thing. And we have no bits that we don't develop at apache.org that are part of CDH. So, I mean, there can be no mistake that everything that is in CDH is everything we've got. >>So CDH is free. 
>>It is free >>And everything's open source. It's open. You >>Charge for enterprise edition. That's the only thing that's different, you guys charge >>Yeah. Which is your management console, right. >>Management >>Suite and all kinds of >>The tools. And that's not free and that's not open source. That's correct. Just to be clear. Yep. But so Amr took us yesterday through, I don't know, half a dozen probably open source projects and then the one is the management console. And that's what you charge for, that's where you're gonna make money? >>Yeah. Essentially we manufacture two products, but we sell one. So we manufacture the Cloudera Distribution Including Apache Hadoop, that's free. It's free. It's all in open source and built at Apache and really heavily tested and well documented and well integrated. And then we also manufacture Cloudera Enterprise, which includes support and indemnities and warranties for that full-featured CDH product and also includes the Cloudera Management Suite. And >>That's a subscription. >>And that's a subscription. And so customers can run CDH, they can then buy and license Cloudera Enterprise, and then someday if they decide they don't need Cloudera Enterprise for whatever reason, if their team are scripting wizards and they've decided that they don't need the extra opportunity for being able to track all of the things that Cloudera Enterprise allows them to, they can step off of Cloudera Enterprise and continue to use full-featured Hadoop as they see >>Fit. So take an example of one of your partners that you announced this week, NetApp. NetApp's gonna package your CDH and the subscription, correct, to their customers. And then they're gonna let their channel either, you know, they'll pre-bundle it or do a reference architecture. You'll get paid for that subscription that's bundled. That's correct. NetApp will make money off of its filers. Yes. 
And the customer gets a packaged solution. >>Exactly. Right. And in fact, that's another important thing that is probably worth discussing, which is our go-to-market model. I don't know if you guys had a chance to talk with anyone yesterday on that, but I'm responsible for our channel strategy, and one of the key things that we've agreed to as a company is that we really are gonna go to market through channel partners. Yeah. >>We covered SGI, that was a great announcement. >>Yep, a >>Hundred percent >>As close as we can get. Okay. I mean that is our, he's >>Still doing the direct deals. You still have that belly-to-belly sales force because it's still early, right? So there's a mix of direct and indirect, not a pure >>Indirect, but that's only until we're able to ramp up our partners fully, in which case we really want the current team that is working belly to belly to really support our partners. >>So all so VMware like, but I wanted to ask >>You VMware-like, NetApp-like, very similar. >>Yes. Very, very NetApp-like. NetApp probably 75%, you know. Exactly. What are the similarities and differences with VMware in the ecosystem? You know it well, >>I do know it well. Yeah. I spent several years working at VMware and, you know, I think the first and most obvious difference is that when I think about platform software in general, there are a few different flavors of platform. One of the things that makes Hadoop very unique relative to other platforms is that not only is it Apache licensed, but it's dependent upon other external innovators to create the entire full value of the solution, right? So unlike, for example, let's take a platform everyone's familiar with, like Apple iTunes, right? 
What happens is Apple creates the platform and they put it kind of in the middle. On top of and behind the scenes is the innovator, the app builder. He builds it, he publishes it on Apple, and then Apple controls all access to the >>Customer. Yep. >>That's not Hadoop, right? Right. Let's take VMware or Red Hat, for example. So in that case, they publish a platform, they own and control the absolute structure and boundaries of what that platform is. And then on top of that, application vendors build and then they deliver to the customer. But at the end of the day, you know, the relationship really is from that external innovator straight down, and there's no way for them to really modify the platform. And you take Hadoop, which is a hundred percent Apache-licensed open source, and you really open up the opportunity for vendors to take Hadoop as an input into their system and then deliver it straight to their customers, or for customers themselves to say, I want straight-up vanilla Hadoop, I'm gonna go this way and I'm gonna add on my own bevy of applications. So we're seeing all sorts of variants right now in the market. We're seeing software as a service being delivered that's based on Hadoop. There was a great announcement a few weeks ago from a company named Tidemark, previously known as Proferi, and they're taking all of CDH, but the customer doesn't know that, and what they're doing is delivering software as a service based on Hadoop. >>Yeah. So I mean, you know, we are psyched that you're clearing this up because obviously we saw all that stuff, but I really think that indirect strategy is a home run. I said it when we talked about the SGI thing, and it accelerates you guys, you enable, but you know, channels is an interesting business. 
I mean, you have to have pure transparency as you mentioned, but people need confidence, and they worry about competition. So channel conflict is always the big issue, right? Right. Is Cloudera gonna compete with us? So talk us through that strategy. So obviously the market's growing, new solutions are coming around the corner. These guys wanna make money. I mean, channel, it's all about, you know, what have you done for me today? >>Right. That is exactly right. And you know what, that's why we decided on the channel strategy specifically around our product, because we recognize that each and every single potential channel partner of ours can actually innovate themselves on top of it and create differentiation. And we're not an obstacle to that process. So we provide our platform as an input and we're capable of managing that platform, but ultimately creating differentiation is all in the hands of our partners, and we're there to help, but it gives them wide latitude. So take, for example, the differences between the Dell and NetApp solutions: they are very different reference architectures leveraging the exact same platform. >>Yeah. And they have to make money. I mean, the money-making side of it is, you know, people don't really talk about that, but channel partners' loyalty is all about who can help them make cash. Right. Right. Exactly. What are you hearing there in terms of the ecosystem? Has the channels Bess and the partnerships or the more as size, what's the profile of your partners? I mean, can you give us the breakdown of, sure, we have what it looks like from Dell. We know Dell and NetApp, but they're gear guys. But, >>So a big part of our strategy is to work with IHVs and then IHV resellers. So you're talking about companies like Dell, like SGI, like NetApp, for example, independent hardware manufacturers. 
Another part of our strategy though, and a key requirement from our customers, is to work with a whole variety of ISVs, particularly in the data management space. So you've got really marquee companies in the database space like IBM's Netezza or Teradata. You've got companies like Informatica and Talend, you've got companies on the BI side, like MicroStrategy and Tableau. These kinds of technologies are currently in play at our customers that have made substantial investments. And ultimately they want to be able to continue to leverage them with the data platform, whichever data platform they end up choosing. So we invest considerably there. A big part of that has been our Cloudera Connect partner program. >>It's an opportunity for us to help the customer understand which technologies work and work well with our platform. It's also an opportunity for us to engage directly and assist the vendor. So one of the things that we created as part of that program is, first off, immediate and absolute discounted access to any part of our training. Second, lots of free information, access to our world-class knowledge base, direct access to our support team. The vendors also get access to a developer portal that we created specifically for them. So if you think about it this way, Hadoop gets built at apache.org, but solutions don't get built at apache.org. Right? So what we're really trying to help our vendors do is be able to develop their solutions by having real clear visibility to the API-level points of Hadoop. They're not necessarily interested in trying to figure out how MR2 works or contributing code to that. >>But they absolutely are interested in figuring out how to run and execute their software on top of Hadoop. So when I think about the things that matter to create an attractive platform, and at the end of the day, that's what we're really trying to do, first and foremost is transparency, right? 
Second, ultimately, is really clear visibility to the APIs and the documentation of that platform, so that there's no ambiguity, so that the vendor, who is the user in this case, building a solution, can absolutely absorb all of that content really cleanly. And then ultimately, you know, I think it's customers, right? Users of the technology. And I think our download numbers are something we're proud of. >>We are hearing good feedback. I mean, the feedback we hear from folks is, yeah, I love how they take away the complexity of handling versions and whatnot. So, you know, I think totally is a great way, CDH is a great bundle. You know, the question that we have for you is what are you hearing about the other products, the ones you're actually selling? Does that create the lock-in? So that's something that we asked Amr directly, you know, is that the lock-in, and what happens when the deployments get so big? You know, >>I mean, the way, I >>Don't really see an issue there, but that's what people are afraid of. I mean, that's kind of the, it's more of fear. I mean, some people can use that fear and >>Play against. I think what we've seen in other markets is that management tools are ultimately interchangeable. And the only way that we're gonna retain a customer is by out-innovating the competition on the management side. The lock-in component, as you will, is not really part of our business model. It's very difficult to achieve with an Apache-licensed platform and a management suite that sits outside of that licensed artifact. So ultimately, if we don't out-innovate, we're gonna lose. So we're working on the innovation and that's, >>How's the hiring go? Oh, go ahead. >>I had a, I wanted to come back to that. You mentioned download numbers. Can you share the numbers >>With the others? 
I can't share them publicly, but what I can say is that they've been on an incredible trajectory. Okay. And what we've seen is month-to-month growth rates; every single month we continue to see really significant growth rates. >>And then I had a follow-up question. You talked about the partner program. How do you manage all those partners? How do you prioritize them? I mean, the hardware vendors, it's pretty easy. There's a few big whales, but the ISVs, I mean, your phone, like John said, must be ringing off the hook. How do you juggle that, and can you do it better than VMware, for example? >>Well, we handle the influx of partner interest in two ways. One, we've been relatively structured with the Cloudera Connect partner program, and we make real investments there. So we have dedicated folks that are there to help. We have our engineering team that is actually feeding inputs, and we're leveraging some of the same resources that we provide to our customers and feeding those directly to our partners as well. So that's one way that we handle it. But the other way, frankly, is, I mean, customers help here. Having access to a real customer population, they help you set priorities pretty quickly. And so we're able to understand, what we track inside of our systems, which technologies our customers use. So we know, for example, what percentage of our customer base has SAS installed and would like to use that with Hadoop. We know which percentage of our customer base is currently running on Red Hat and which is not. So having core visibility, that helps us to prioritize. >>How about incentives? I mean, obviously channel business is, like I said, very fickle. You know the channel business. I spent almost a decade in HP's channel organization and you have to provide soft dollars. There's a lot of kind of blocking and tackling. 
You guys are clearly building out that tier one with the SGIs of the world and other vendors, and then you've got the partner connect program for kinda everyone else who's gonna grow up into a tier one. Yeah. Training, soft dollars, incentives. You guys have that going yet, or is that on the >>Roadmap? We do. And in fact, in addition to the sort of more widely publicized relationships you see with companies like Dell and Cloudera, we're actually building a very successful network of independent VARs. And the VARs in general, what we do is we prioritize and select VARs based on the top-level relationships that we have, because that really helps them to hone in. They've got validation. For example, someone that resells SGI is an organization that now has heard really loud and clear from SGI the specific platform configurations that they're gonna represent to their customers, and they ultimately wanna represent them directly. And how we make investments is, I mean, the investments we're making ultimately in our sales org, I'm gonna lose the word direct from that conversation because our sales org is being built to help our partners succeed. And I think that's where you're, >>The end game is to go completely indirect and have all your support go into managing that channel. What's the mix of revenue generation from your partners? Obviously, you know, with SGI they have pre-built channels that you're funneling in, you got NetApp and they're wrapping their products and services around it. How much is services and how much is a solution specifically? Do you have any visibility or a feel for that at this >>Point? I mean, services relative to, you mean for Cloudera particularly, or for our >>Partner? No, for the partner. I mean, if I'm a partner, I'm like, Hey, okay, I'm gonna use CDH. I'm on bundles. I don't mind paying you a wholesale if I'm gonna be able to throw off more cash on, you know, deployment and cloud and services, et cetera. 
And or if I'm a product manufacturer, a product, a solution I fund you in. I need to have that step >>Up. Absolutely, great question. So depending upon the partner we're dealing with, they like to either monetize or generate their revenue in different ways. So for example, NetApp, NetApp is a company that has very limited services, and their, their focus as a business is really on delivering hardware and software configured together. And they, they rely heavily on a services channel to fulfill. You take that in contrast to a company like, for example, Dell, which has a very successful services business and really is excited about having service offerings around Hadoop. So it depends upon the company. But when we talk about our VAR channel in particular, one of the things, it's an internal acronym, but I'll share it publicly here. We, we call ours super VARs, and what makes them super and why, why we've selected the, the, the organizations that we are selecting right now to be our VARs is that they not only can fulfill orders for hardware and software, particularly data management or infrastructure software, but they also have a services team on hand because we recognize that there is a services opportunity with every Hadoop deployment. And we want our partners to have that. So as an organization, we're structuring our, our services staff to facilitate and enable our partners, not to be sold >>Directly. Okay. So that's the follow up that I had tomorrow when the partners ask, Okay, what do you want to be when you're really growing up? Is it services, is it software? >>Is it, Cloudera is a software company, through and through. >>Oh, er we kind of got ett, well, he didn't say it, but we said it's a operating system. Yeah. >>So given that, so given that, I mean, you can make money on services, right? People need services. Okay, great. >>And partners will make that money for >>Us.
And, and you know, early on you, you had to do some of that and you're, you've been very clear about where it's going. It's hard to make money in software when you're giving all the software away for free. Well, >>We're not giving all >>The software. I know you've got that piece now, but, but here's my question. As Hadoop goes into the enterprise, which it's clearly doing, is that that whole bundling, like what you're doing with NetApp, is that really ultimately how you're gonna start to, to monetize and, and successfully monetize your software, >>Is by pushing it through >>Yeah. Packaging and that bundling that solution, in other words, your enterprise customer is gonna be more receptive to that solution package than say the, the fringe that has been using Hadoop for the last >>Two or three years. I think there's no question about it. If you, if you look at what Cloudera Enterprise does, I don't know if, if you've had a chance to attend any of the sessions, maybe where Cloudera Enterprise is, is currently being demonstrated. >>We just had Alex Williams on the air. Did a review, >>Okay. >>Been going good and impressed with it? >>Yeah, there's no question about it. And I, I don't, and Alex probably hasn't seen the new version that, you know, our team is working on and it's, you know, quietly working on in the background. Incredible, incredible developments in, And that's really a function of when you have direct access to so many customers and you're getting so much input and feedback and they're the kinds of access to the kinds of customers we ultimately wanna serve. So real enterprises, what you get is really fast innovation from a really talented team that knows Hadoop well. I mean, we are years ahead on the management side. Absolutely. Years ahead.
And you know, I, so I was a guy who worked at VMware for several years, and I can tell you that while the hypervisor itself was, was a core component to VMware's success, the monetization strategy was very squarely around vCenter. Yeah. Yes. Out. And we're not ignorant to that. Yeah. >>You can learn a lot from your VMware experience 'cause, absolutely. The, the market changed significantly. And, you know, >>There were free hypervisors available all of a sudden. VMware itself had a free hypervisor. We had, we had VMware Server and we had also our VMware Player products, right? And those were all free. And they were very good technology. They were the best available in the market for free. And they were better, in my opinion, they were better than anything else. Open or not. No, our time >>Too, since still >>Are, they were, they, they were, they were superior products in every way. But yet how VMware was successful was recognizing that in the interest of running a production environment with an SLA, you need management software. And they've also built the best management software. And there's no question that we understand that strategy and >>A phenomenal ecosystem. I mean, there's the >>Similarities, right? They did. And you, and the, and the ecosystem was in, in large part predicated on transparency, very clear access to the APIs and a willingness to help partners be successful with those APIs. And ultimately drawing a very tight box about what the company wanted to do and didn't want to do. >>I mean, look, you're not, you're not gonna lose friends when you make people money. That's my philosophy, right? I agree. So when you're in that business where you can come in and enable a channel and have options on your growth strategy, which you do, I mean, you can say, Okay, bundling, I can go, you know, I can have this sold direct, or at least as long as you've got the options, you can grow with that market.
So, you know, again, the, it's a money making opportunity for the partnerships, but there's >>More than that, right? Because you mentioned Apple, iTunes, Oracle's another example. And the way you make money with Apple and the way you make money with Oracle is different than the way you make money with VMware and presumably Cloudera. >>Yeah, I mean, our strategy is, if you make this base platform easier to install, more reliable, and you make it ultimately, you know, really rock solid from an integration standpoint, more people are gonna use it. So what happens when more people use it? First thing that happens is it's out there, so more solutions get built. When more solutions get built, then you see more clusters get developed. When more clusters are out there, they start to move into production. And then they, they need an SLA. When they need an SLA, Cloudera Enterprise gets purchased. But along that path, when those solutions got built, guess what else happened? More cloud units got sold, more servers got sold, more networking gear got sold, more services got created. You get, you get ultimately more operating systems got sold, more databases got data into them, more BI clients got created. The ecosystem is deep and rich, and a lot of people stand to make money. Hop >>In, people. The water's great. >>What about, what about support? Okay, so, you know, the other guys are saying, We're just gonna make money on support. I mean, support. You guys still are doing support, right? I mean, you're selling >>Support. There's no question. Cloudera Enterprise contains two things, right? The management suite and support. This is, this is not uncomplicated technology and having a world class support team is of value and customers do want to pay for that value. But we, we believe that support in and of itself is not enough. And that ultimately, when you wanna deliver an SLA, being able to call when you have a problem is the wrong approach.
You want to be proactive and understand the problem well in advance of it actually occurring. That's really important. When, for example, if you're a customer, a lot of our customers have a data pipeline that >>They, they're building out basically. I mean they're, it's, it's new and emerging. So they're building out, It's not just support. They need other tools. >>Yeah. And building out I think is an understatement for where some of our customers are. I mean, when you have a thousand node cluster that you're operating, Yeah, Yeah, that's mission critical to your business. I don't think that's building out anymore. I think that's an investment in a technology that's mission critical. And what you wanna see when you have a mission critical technology is you wanna know early and often when a problem may emerge. Not, Oh, oh my gosh, we have a problem, now I need to go, you know, phone a friend. Phone a friend is, is kind of a last resort. We offer that. But what we really do is, and that's the, that's the beauty. That's why we don't decouple our support from our management suite. It's not about phone a friend. It's about understanding the operation of your cluster the entire way through 24. >>And the other thing that people don't talk about in the support is that with open source, a lot of support gets handled in the community as well. So like That's right. So in a way, you're already pre cannibalized with the community >>By us and by others. Absolutely. But you, you'll never see to that Forbes article I referenced earlier. You will never, you will not see our, our engineers are not trained to withhold information under any circumstances to anyone, free or paying. Yeah. This is about getting, You >>Don't wanna hold back your business. I mean, you have nothing to hide. It's open rights. >>Open source. It's open. And we're here to help. We're here to help. Whether you're paying us or not, >>This is value to that anticipatory >>Remediation. Yeah.
That's what you're packaging and clearing up the air. Great. Great cube guest, you're awesome on the cube. Gonna have you more on because great to get the info out there. Really impressed with the channel strategy. Love the, love the growth strategy, Cloudera. You guys are really impressive. I'm really, really impressed to see that you guys got everything pumping on all cylinders, Kirk, and you are cranking out on the business execution. We're in the team playing this chess match open. Perfect. So great. Congratulations. Great. Thanks. You guys just in the financing. >>Oh, thank you as >>Well. Hey, Ed from Cloudera, clearing it up here inside the cube. We're gonna take a quick break and we'll be right back with more video. >>Thanks guys. All right.

Published Date : Apr 30 2012

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
EMC | ORGANIZATION | 0.99+
Mike | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
Ed | PERSON | 0.99+
John | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Apple | ORGANIZATION | 0.99+
Phil | PERSON | 0.99+
Alex Williams | PERSON | 0.99+
Cloudera | ORGANIZATION | 0.99+
Jeff Hummer Bucker | PERSON | 0.99+
last year | DATE | 0.99+
Alex | PERSON | 0.99+
yesterday | DATE | 0.99+
two products | QUANTITY | 0.99+
SGI | ORGANIZATION | 0.99+
half a dozen | QUANTITY | 0.99+
HP | ORGANIZATION | 0.99+
Second | QUANTITY | 0.99+
Ed Albanese | PERSON | 0.99+
Jeff Harmar Baer | PERSON | 0.99+
75% | QUANTITY | 0.99+
Cora | ORGANIZATION | 0.99+
Spies Like Us | TITLE | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Tidemark | ORGANIZATION | 0.99+
two things | QUANTITY | 0.99+
Informatica | ORGANIZATION | 0.99+
community@apache.org | OTHER | 0.99+
NetApp | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
twice | QUANTITY | 0.99+
VMware | ORGANIZATION | 0.99+
Hundred percent | QUANTITY | 0.99+
tomorrow | DATE | 0.99+
this week | DATE | 0.99+
Terradata | ORGANIZATION | 0.98+
past year | DATE | 0.98+
Cloudier Enterprise | TITLE | 0.98+
Two | QUANTITY | 0.98+
two ways | QUANTITY | 0.98+
built@apache.org | OTHER | 0.98+
over 60 individuals | QUANTITY | 0.98+
Michaelson | PERSON | 0.98+
Cloudera | TITLE | 0.98+
one year | QUANTITY | 0.98+
Netezza | ORGANIZATION | 0.98+
Hadoop | TITLE | 0.98+
One | QUANTITY | 0.98+
one | QUANTITY | 0.98+
Talent | ORGANIZATION | 0.98+
three years | QUANTITY | 0.98+
one way | QUANTITY | 0.98+

Justin Emerson, Pure Storage | SuperComputing 22


 

(soft music) >> Hello, fellow hardware nerds and welcome back to Dallas, Texas where we're reporting live from Supercomputing 2022. My name is Savannah Peterson, joined by John Furrier on my left. >> Looking good today. >> Thank you, John, so are you. It's been a great show so far. >> We've had more hosts, more guests coming than ever before. >> I know. >> Amazing, super- >> We've got a whole thing going on. >> It's been a super computing performance. >> It, wow. And, we'll see how many times we can say super on this segment. Speaking of super things, I am in a very unique position right now. I am flanked on both sides by people who have been doing content on theCUBE for 12 years. Yes, you heard me right, our next guest was on theCUBE 12 years ago, the third event, was that right, John? >> Man: First ever VMworld. >> Yeah, the first ever VMworld, third event theCUBE ever did. We are about to have a lot of fun. Please join me in welcoming Justin Emerson of Pure Storage. Justin, welcome back. >> It's a pleasure to be here. It's been too long, you never call, you don't write. (Savannah laughs) >> Great to see you. >> Yeah, likewise. >> How fun is this? Has the set evolved? Is everything looking good? >> I mean, I can barely remember what happened last week, so. (everyone laughs) >> Well, I remember a lot's changed since that VMworld. You know, Paul Maritz was the CEO if you remember at that time. His actual vision actually happened but not the way, for VMware, but the industry, the cloud, he called the software mainframe. We were kind of riffing- >> It was quite the decade. >> Unbelievable where we are now, how we got here, but not where we're going to be. And you're with Pure Storage now which we've been, as you know, covering as well. Where's the connection into the supercomputing? Obviously storage performance, big part of this show. >> Right, right. >> What's the take? >> Well, I think, first of all it's great to be back at events in person.
We were talking before we went on, and it's been so great to be back at live events now. It's been such a drought over the last several years, but yeah, yeah. So I'm very glad that we're doing in person events again. For Pure, this is an incredibly important show. You know, the product that I work with, with FlashBlade is you know, one of our key areas is specifically in this high performance computing, AI machine learning kind of space. And so we're really glad to be here. We've met a lot of customers, met a lot of other folks, had a lot of really great conversations. So it's been a really great show for me. And also just seeing all the really amazing stuff that's around here, I mean, if you want to find, you know, see what all the most cutting edge data center stuff that's going to be coming down the pipe, this is the place to do it. >> So one of the big themes of the show for us and probably, well, big theme of your life, is balancing power efficiency. You have a product in this category, DirectFlash. Can you tell us a little bit more about that? >> Yeah, so Pure as a storage company, right, what do we do differently from everybody else? And if I had to pick one thing, right, I would talk about, it's, you know, as the name implies, we're an all, we're purely flash, we're an all flash company. We've always been, don't plan to be anything else. And part of that innovation with DirectFlash is the idea of rather than treating a solid state disk as like a hard drive, right? Treat it as it actually is, treat it like what it really is and that's a very different kind of thing. And so DirectFlash is all about bringing native flash interfaces to our product portfolio. And what's really exciting for me as a FlashBlade person, is now that's also part of our FlashBlade S portfolio, which just launched in June. And so the benefits of that are myriad.
But, you know, talking about efficiency, the biggest difference is that, you know, we can use like 90% less DRAM in our drives, which you know, everything uses, everything that you put in a drive uses power, it adds cost and all those things and so that really gives us an efficiency edge over everybody else and at a show like this, where, I mean, you walk the aisles and there's people doing liquid cooling and so much immersion stuff, and the reason they're doing that is because power is just increasing everywhere, right? So if you can figure out how do we use less power in some areas means you can shift that budget to other places. So if you can talk to a customer and say, well, if I could shrink your power budget for storage by two thirds or even, save you two-thirds of power, how many more accelerators, how many more CPUs, how much more work could you actually get done? So really exciting. >> I mean, less power consumption, more power and compute. >> Right. >> Kind of power center. So talk about the AI implications, where the use cases are. What are you seeing here? A lot of simulations, a lot of students, again, dorm room to the boardroom we've been saying here on theCUBE this is a great broad area, where's the action in the ML and the AI for you guys? >> So I think, not necessarily storage related but I think that right now there's this enormous explosion of custom silicon around AI machine learning which I as a, you said welcome hardware nerds at the beginning and I was like, ah, my people. >> We're all here, we're all here in Dallas. >> So wonderful. You know, as a hardware nerd we're talking about conferences, right? Who has ever attended Hot Chips and there's so much really amazing engineering work going on in the silicon space. It's probably the most exciting time for, CPU and accelerator, just innovation in, since the days before x86 was the de facto standard, right? And you could go out and buy a different workstation with 16 different ISAs.
That's really the most exciting thing, I walked past so many different places where you know, our booth is right next to Habana Labs with their Gaudi accelerator, and they're doing this cute thing with one of the AI image generators in their booth, which is really cute. >> Woman: We're going to have to go check that out. >> Yeah, but that to me is like one of the more exciting things around like innovation at a, especially at a show like this where it's all about how do we move forward, the state of the art. >> What's different now than just a few years ago in terms of what's opening up the creativity for people to look at things that they could do with some of the scale that's different now. >> Yeah well, I mean, every time the state of the art moves forward what it means is, is that the entry level gets better, right? So if the high end is going faster, that means that the mid-range is going faster, and that means the entry level is going faster. So every time it pushes the boundary forward, it's a rising tide that floats all boats. And so now, the kind of stuff that's possible to do, if you're a student in a dorm room or if you're an enterprise, the world of the possible just keeps expanding dramatically and expanding almost, you know, geometrically like the amount of data that we are, that we have, as a storage guy, I was coming back to data but the amount of data that we have and the amount of compute that we have, and it's not just about the raw compute, but also the advances in all sorts of other things in terms of algorithms and transfer learning and all these other things. There's so much amazing work going on in this area and it's just kind of this Cambrian explosion of innovation in the area. >> I love that you touched on the user experience for the community, no matter the level that you're at. >> Yeah. >> And I, it's been something that's come up a lot here.
Everyone wants to do more faster, always, but it's not just that, it's about making the experience and the point of entry into this industry more approachable and digestible for folks who may not be familiar, I mean we have every end of the ecosystem here, on the show floor, where does Pure Storage sit in the whole game? >> Right, so as a storage company, right? What AI is all about is deriving insights from data, right? And so everyone remembers that magazine cover, data's the new oil, right? And it's kind of like, okay, so what do you do with it? Well, how do you derive value from all of that data? And AI machine learning and all of this supercomputing stuff is about how do we take all this data? How do we innovate with it? And so if you want data to innovate with, you need storage. And so, you know, our philosophy is that how do we make the best storage platforms that we can using the best technology for our customers that enable them to do really amazing things with AI machine learning and we've got different products, but, you know at the show here, what we're specifically showing off is our new FlashBlade S product, which, you know, I know we've had Pure folks on theCUBE before talking about FlashBlade, but for viewers out there, FlashBlade is our scale out unstructured data platform and AI and machine learning and supercomputing is all about unstructured data. It's about sensor data, it's about imaging, it's about, you know, photogrammetry, all this other kinds of amazing stuff. But, you got to land all that somewhere. You got to process that all somewhere. And so really high performance, high throughput, highly scalable storage solutions are really essential. It's an enabler for all of the amazing other kinds of engineering work that goes on at a place like Supercomputing. >> It's interesting you mentioned data's oil.
Remember in 2010, that year, our first year of theCUBE, Hadoop World, Hadoop just started to come on the scene, which became, you know kind of went away and, but now you got, Spark and Databricks and Snowflake- >> Justin: And it didn't go away, it just changed, right? >> It just got refactored and right size, I think for what the people wanted it to be easy to use but there's more data coming. How is data driving innovation as you bring, as people see clearly the more data's coming? How is data driving innovation as you guys look at your products, your roadmap and your customer base? How is data driving innovation for your customers? >> Well, I think every customer who has been, you know collecting all of this data, right? Is trying to figure out, now what do I do with it? And a lot of times people collect data and then it will end up on, you know, lower slower tiers and then suddenly they want to do something with it. And it's like, well now what do I do, right? And so there's all these people that are reevaluating you know, we, when we developed FlashBlade we sort of made this bet that unstructured data was going to become the new tier one data. It used to be that we thought unstructured data, it was emails and home directories and all that stuff the kind of stuff that you didn't really need a really good DR plan on. It's like, ah, we could, now of course, as soon as email goes down, you realize how important email is. But, the perspectives that people had on- >> Yeah, exactly. (all laughing) >> The perspectives that people had on unstructured data and it's value to the business was very different and so now- >> Good bet, by the way. >> Yeah, thank you. So now unstructured data is considered, you know, where companies are going to derive their value from. So it's whether they use the data that they have to build better products whether it's they use the data they have to develop you know, improvements in processes. All those kinds of things are data driven. 
And so all of the new big advancements in industry and in business are all about how do I derive insights from data? And so machine learning and AI has something to do with that, but also, you know, it all comes back to having data that's available. And so, we're working very hard on building platforms that customers can use to enable all of this really- >> Yeah, it's interesting, Savannah, you know, the top three areas we're covering for reinventing all the hyperscale events is data. How does it drive innovation and then specialized solutions to make customers lives easier? >> Yeah. >> It's become a big category. How do you compose stuff and then obviously compute, more and more compute and services to make the performance goes. So those seem to be the three hot areas. So, okay, data's the new oil refineries. You've got good solutions. What specialized solutions do you see coming out because once people have all this data, they might have either large scale, maybe some edge use cases. Do you see specialized solutions emerging? I mean, obviously it's got DPU emerging which is great, but like, do you see anything else coming out at that people are- >> Like from a hardware standpoint. >> Or from a customer standpoint, making the customer's lives easier? So, I got a lot of data flowing in. >> Yeah. >> It's never stopping, it keeps powering in. >> Yeah. >> Are there things coming out that makes their life easier? Have you seen anything coming out? >> Yeah, I think where we are as an industry right now with all of this new technology is, we're really in this phase of the standards aren't quite there yet. Everybody is sort of like figuring out what works and what doesn't. You know, there was this big revolution in sort of software development, right? Where moving towards agile development and all that kind of stuff, right? The way people build software change fundamentally this is kind of like another wave like that. 
I like to tell people that AI and machine learning is just a different way of writing software. What is the output of a training scenario, right? It's a model and a model is just code. And so I think that as all of these different, parts of the business figure out how do we leverage these technologies, what it is, is it's a different way of writing software and it's not necessarily going to replace traditional software development, but it's going to augment it, it's going to let you do other interesting things and so, where are things going? I think we're going to continue to start coalescing around what are the right ways to do things. Right now we talk about, you know, ML Ops and how development and the frameworks and all of this innovation. There's so much innovation, which means that the industry is moving so quickly that it's hard to settle on things like standards and, or at least best practices you know, at the very least. And that the best practices are changing every three months. Are they really best practices right? So I think, right, I think that as we progress and coalesce around kind of what are the right ways to do things that's really going to make customers' lives easier. Because, you know, today, if you're a software developer you know, we build a lot of software at Pure Storage right? And if you have people and developers who are familiar with how the process, how the factory functions, then their skills become portable and it becomes easier to onboard people and AI is still nothing like that right now. It's just so, so fast moving and it's so- >> Wild West kind of. >> It's not standardized. It's not industrialized, right? And so the next big frontier in all of this amazing stuff is how do we industrialize this and really make it easy to implement for organizations? >> Oil refineries, industrial Revolution. I mean, it's on that same trajectory. >> Yeah. >> Yeah, absolutely. >> Or industrial revolution. 
(John laughs) >> Well, we've talked a lot about the chaos and sort of we are very much at this early stage. Stepping way back, and this can be your personal, not Pure Storage, opinion if you want. >> Okay. >> What in HPC or AI/ML, I guess it all falls under the same umbrella, has you most excited? >> Ooh. >> So I feel like you're someone who sees a lot of different things. You've got a lot of customers, you're out talking to people. >> I think that there is a lot of advancement in the area of natural language processing and I think that, you know, we're starting to take things just like natural language processing and then turning them into vision processing and all these other, you know, I think the most exciting thing for me about AI is that there are a lot of people who are, you know, looking to use these kinds of technologies to make technology more inclusive. And so- >> I love it. >> You know the ability for us to do things like automate captioning or the ability to automate descriptive, audio descriptions of video streams or things like that. I think that those are really, I think they're really great in terms of bringing the benefits of technology to more people in an automated way because the challenge has always been bandwidth of how much a human can do. And because they were so difficult to automate and what AI's really allowing us to do is build systems whether that's text to speech or whether that's translation, or whether that's captioning or all these other things. I think the way that AI interfaces with humans is really the most interesting part. And I think the benefits that it can bring there because there's a lot of talk about all of the things that it does that people don't like or that they, that people are concerned about. But I think it's important to think about all the really great things that maybe don't necessarily personally impact you, but to the person who's not sighted or to the person who you know is hearing impaired. 
You know, that's an enormously valuable thing. And the fact that those are becoming easier to do they're becoming better, the quality is getting better. I think those are really important for everybody. >> I love that you brought that up. I think it's a really important note to close on and you know, there's always the kind of terminator, dark side that we obsess over but that's actually not the truth. I mean, when we think about even just captioning it's a tool we use on theCUBE. It's, you know, we see it on our Instagram stories and everything else that opens the door for so many more people to be able to learn. >> Right? >> And the more we all learn, like you said the water level rises together and everything is magical. Justin, it has been a pleasure to have you on board. Last question, any more bourbon tasting today? >> Not that I'm aware of, but if you want to come by I'm sure we can find something somewhere. (all laughing) >> That's the spirit, that is the spirit of an innovator right there. Justin, thank you so much for joining us from Pure Storage. John Furrier, always a pleasure to interview with you. >> I'm glad I can contribute. >> Hey, hey, that's the understatement of the century. >> It's good to be back. >> Yeah. >> Hopefully I'll see you guys in, I'll see you guys in 2034. >> No. (all laughing) No, you've got the Pure Accelerate conference. We'll be there. >> That's right. >> We'll be there. >> Yeah, we have our Pure Accelerate conference next year and- >> Great. >> Yeah. >> I love that, I mean, feel free to, you know, hype that. That's awesome. >> Great company, great runs, stayed true to the mission from day one, all Flash, continue to innovate congratulations. >> Yep, thank you so much, it's pleasure being here. >> It's a fun ride, you are a joy to talk to and it's clear you're just as excited as we are about hardware, so thanks a lot Justin. >> My pleasure. 
>> And thank all of you for tuning in to this wonderfully nerdy hardware edition of theCUBE live from Dallas, Texas, where we're at, Supercomputing, my name's Savannah Peterson and I hope you have a wonderful night. (soft music)

Published Date : Nov 16 2022

ENTITIES

Entity | Category | Confidence
Paul Moritz | PERSON | 0.99+
Justin | PERSON | 0.99+
Justin Emerson | PERSON | 0.99+
John | PERSON | 0.99+
Savannah Peterson | PERSON | 0.99+
Savannah | PERSON | 0.99+
Dallas | LOCATION | 0.99+
June | DATE | 0.99+
John Furrier | PERSON | 0.99+
12 years | QUANTITY | 0.99+
2010 | DATE | 0.99+
Kay Green | PERSON | 0.99+
Dallas, Texas | LOCATION | 0.99+
third event | QUANTITY | 0.99+
Dallas Texas | LOCATION | 0.99+
last week | DATE | 0.99+
12 years ago | DATE | 0.99+
two-thirds | QUANTITY | 0.99+
First | QUANTITY | 0.98+
VM World | EVENT | 0.98+
first | QUANTITY | 0.98+
two thirds | QUANTITY | 0.98+
Havana Labs | ORGANIZATION | 0.98+
Pure Accelerate | EVENT | 0.98+
next year | DATE | 0.98+
today | DATE | 0.98+
both sides | QUANTITY | 0.98+
Pure Storage | ORGANIZATION | 0.97+
first year | QUANTITY | 0.97+
16 different ISAs | QUANTITY | 0.96+
FlashBlade | TITLE | 0.96+
three hot areas | QUANTITY | 0.94+
three | QUANTITY | 0.94+
Snowflake | ORGANIZATION | 0.93+
one | QUANTITY | 0.93+
2034 | DATE | 0.93+
one thing | QUANTITY | 0.93+
Supercomputing | ORGANIZATION | 0.9+
90% less | QUANTITY | 0.89+
theCUBE | ORGANIZATION | 0.86+
agile | TITLE | 0.84+
VM world | EVENT | 0.84+
few years ago | DATE | 0.81+
day one | QUANTITY | 0.81+
Hadoop World | ORGANIZATION | 0.8+
VMware | ORGANIZATION | 0.79+
Instagram | ORGANIZATION | 0.78+
Spark and | ORGANIZATION | 0.77+
Hadoop | ORGANIZATION | 0.74+
years | DATE | 0.73+
last | DATE | 0.73+
three months | QUANTITY | 0.69+
FlashBlade | ORGANIZATION | 0.68+
Direct Flash | TITLE | 0.67+
year | DATE | 0.65+
tier one | QUANTITY | 0.58+
Supercomputing | TITLE | 0.58+
Direct | TITLE | 0.56+
Flash | ORGANIZATION | 0.55+
86 | TITLE | 0.55+
aces | QUANTITY | 0.55+
Pure | ORGANIZATION | 0.51+
Databricks | ORGANIZATION | 0.5+
2022 | ORGANIZATION | 0.5+
X | EVENT | 0.45+

Ali Ghodsi, Databricks | Supercloud22


 

(light hearted music) >> Okay, welcome back to Supercloud '22. I'm John Furrier, host of theCUBE. We got Ali Ghodsi here, co-founder and CEO of Databricks. Ali, Great to see you. Thanks for spending your valuable time to come on and talk about Supercloud and the future of all the structural change that's happening in cloud computing. >> My pleasure, thanks for having me. >> Well, first of all, congratulations. We've been talking for many, many years, and I still go back to the video that we have in archive, you talking about cloud. And really, at the beginning of the big reboot, I called the post Hadoop, a revitalization of data. Congratulations, you've been cloud-first, now on multiple clouds. Congratulations to you and your team for achieving what looks like a billion dollars in annualized revenue as reported by the Wall Street Journal, so first, congratulations. >> Thank you so much, appreciate it. >> So I was talking to some young developers and I asked a random poll, what do you think about Databricks? Oh, we love those guys, they're AI and ML-native, and that's their advantage over the competition. So I pressed why. I don't think they knew why, but that's an interesting perspective. This idea of cloud native, AI/ML-native, ML Ops, this has been a big trend and it's continuing. This is a big part of how this change and this structural change is happening. How do you react to that? And how do you see Databricks evolving into this new Supercloud-like multi-cloud environment? >> Yeah, look, I think it's a continuum. It starts with having data, but they want to clean it, you know, and they want to get insights out of it. But then, eventually, you'd like to start asking questions, doing reports, maybe ask questions about what was my revenue yesterday, last week, but soon you want to start using the crystal ball, predictive technology. Okay, but what will my revenue be next week? Next quarter? Who's going to churn? 
And if you can finally automate that completely so that you can act on the predictions, right? So this credit card that got swiped, the AI thinks it's fraud, we're going to deny it. That's when you get real value. So we're trying to help all these organizations move through this data AI maturity curve, all the way to that, the prescriptive, automated AI machine learning. That's when you get real competitive advantage. And you know, we saw that with the FAANGs, right? I mean, Google wouldn't be here today if it wasn't for AI. You know, we'd be using AltaVista or something. We want to help all organizations to be able to leverage data and AI the way that the FAANGs did. >> One of the things we're looking at with supercloud and why we call it supercloud versus other things like multi-cloud is that today a lot of the successful companies have started in the cloud, have been successful, but have realized, and even enterprises who have gotten by accident, and maybe have done nothing with cloud, have just some cloud projects on multiple clouds. So, people have multiple cloud operational things going on but it hasn't necessarily been a strategy per se. It's been more of kind of a default reaction to things, but the ones that are innovating have been successful in one native cloud because the use cases that drove that got scale, got value, and then they're making that super by bringing it on premise, putting in a modern data stack for modern application development, and kind of dealing with the things that you guys are in the middle of with Databricks. That is where the action is, and they don't want to lose the trajectory and all the economies of scale. So we're seeing another structural change where the evolutionary nature of the cloud has solved a bunch of use cases, but now other use cases are emerging on premises and at the edge that have been driven by applications because of the developer boom that's happening. You guys are in the middle of it. 
What is happening with this structural change? Are people looking for the modern data stack? Are they looking for more AI? What's the, what's your perspective on this supercloud kind of position? >> Look, it started with not AR on multiple clouds, right? So multi-cloud has been a thing. It became a thing 70, 80% of our customers when you ask them, they're more than one cloud. But then soon to start realizing that, hey, you know, if I'm on multiple clouds, this data stuff is hard enough as it is. Do I want to redo it again and again with different proprietary technologies, on each of the clouds. And that's when I started thinking about let's standardize this, let's figure out a way which just works across them. That's where I think open source comes in, becomes really important. Hey, can we leverage open standards because then we can make it work in these different environments, as we said so that we can actually go super, as you said, that's one. The second thing is, can we simplify it? You know, and I think today, the data landscape is complicated. Conceptually it's simple. You have data which is essentially customer data that you have, maybe employee data. And you want to get some kind of insights from that. But how you do that is very complicated. You have to buy data warehouse, hire data analysts. You have to buy, store stuff in the Delta Lake you know, get your data engineers. If you want streaming real time thing that's another complete different set of technologies you have to buy. And then you have to stitch all these together, and you have to do again and again on every cloud. So they just want simplification. So that's why we're big believers in this Delta Lakehouse concept. Which is an open standard to simplifying this data stack and help people to just get value out of their data in any environment. So they can do that in this sort of supercloud as you call it. 
>> You know, we've been talking about that in previous interviews, do the heavy lifting, let them get the value. I have to ask you about how you see that going forward, because if I'm a customer, I have a lot of operational challenges. 'Cause the developers are kicking butt right now. We see that clearly. Open source is growing and continues to be great. But ops and security teams, they really care about this stuff. And most companies don't want to spin up multiple ops teams to deal with different stacks. This is one big problem that I think that's leading into the multi-cloud viability. How do you guys deal with that? How do you talk to customers when they say, I want to have less complications on operations? >> Yeah, you're absolutely right. You know, it's easy for a developer to adopt all these technologies and new things are coming out all the time. The ops teams are the ones that have to make sure this works. Doing that in multiple different environments is super hard, especially when there's a proprietary stack in each environment that's different. So they just want standardization. They want open source, that's super important. We hear that all the time from them. They want open source technologies. They believe in the communities around it. You know, they know that source code is open. So you can also see if there's issues with it. If there's security breaches, those kind of things that they can have a community around it. So they can actually leverage that. So they're the ones that are really pushing this, and we're seeing it across the board. You know, it starts first with the digital natives you know, the companies that are, but slowly it's also now percolating to the other organizations, we're hearing across the board. >> Where are we, Ali, on the innovation strategies for customers? Where are they on the trajectory around how they're building out their teams? How are they looking at the open source? 
How are they extending the value proposition of Databricks, and data at scale, as they start to build out their teams and operations, because some are like kind of starting, crawl, walk, run, kind of vibe. Some are big companies, they're dealing with data all the time. Where are they in their journey? What's the core issues that they're solving? What are some of the use cases that you see that are most pressing for customers? >> Yeah, what I've seen, that's really exciting about this Delta Lakehouse concept is that we're now seeing a lot of use cases around real time. So real time fraud detection, real time stock ticker pricing, anyone that's doing trading, they want that to work real time. Lots of use cases around that. Lots of use cases around how do we in real time drive more engagement on our web assets if we're a media company, right? We have all these assets, how do we get people to get engaged? Stay on our sites. Continue engaging with the material we have. Those are real time use cases. And the interesting thing is, A, they're real time. So, you know, it's really important that you don't want to recommend to someone, hey, you should go check out this restaurant if they just came from that restaurant half an hour ago. So you want it to be real time, but B, that it's also all based on machine learning. A lot of this is trying to predict what you want to see, what you want to do, is it fraudulent? And that's also interesting because basically more and more machine learning is coming in. So that's super exciting to see, the combination of real time and machine learning on the Lakehouse. And finally, I would say the Lakehouse is really important for this because that's where the data is flowing in. If they have to take that data that's flowing into the lake and actually copy it into a separate warehouse, that delays the real time use cases. And then it can't hit those real time deadlines. So that's another catalyst for this Lakehouse pattern. 
>> Would that be an example of how the metrics are changing? Cause I've been looking at some people saying, well you can tell if someone's doing well there's a lot of data being transferred. And then I was saying, well, wait a minute. Data transfer costs money, right? And time. So this is interesting dynamic, in a way you don't want to have a lot of movement, right? >> Yeah, movement actually decreases for a lot of these real time use cases. 'Cause what we saw in the past was that they would run a batch processing to process all the data. So once they process all the data. But actually if you look at the things that have changed since the data that we have yesterday it's actually not that much. So if you can actually incrementally process it in real time, you can actually reduce the cost of transfers and storage and processing. So that's actually a great point. That's also one of the main things that we're seeing with the use cases, the bill shrinks and the cost goes down, and they can process less. >> Yeah, and it'd be interesting to see how those KPIs evolve into industry metrics down the road around the supercloud of evolution. I got to ask you about the open source concept of data platforms. You guys have been a pioneer in there doing great work, kind of picking the baton off where the Hadoop World left off as Dave Vellante always points out. But if working across clouds is super important. How are you guys looking at the ability to work across the different clouds with data bricks? Are you going to build that abstraction yourself? Does data sharing and model sharing kind of come into play there? How do you see this data bricks capability across the clouds? >> Yeah, I mean, let me start by saying, we just we're big fans of open source. We think that open source is a force in software. That's going to continue for, decades, hundreds of years, and it's going to slowly replace all proprietary code in its way. 
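[Editor's illustration, not part of the interview] The point above about incremental processing can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the names `RevenueTotals`, `full_batch`, and `incremental`, and the sample rows, are hypothetical and are not drawn from Databricks or any other product. It only shows why folding in new rows touches less data than rerunning the whole batch.

```python
from dataclasses import dataclass, field


@dataclass
class RevenueTotals:
    """Hypothetical running per-day totals, plus a count of rows touched."""
    totals: dict = field(default_factory=dict)
    rows_processed: int = 0

    def apply(self, rows):
        # Fold each (day, amount) row into the running totals.
        for day, amount in rows:
            self.totals[day] = self.totals.get(day, 0) + amount
            self.rows_processed += 1
        return self


def full_batch(all_rows):
    """Batch style: recompute every total from scratch."""
    return RevenueTotals().apply(all_rows)


def incremental(state, new_rows):
    """Incremental style: fold only newly arrived rows into existing state."""
    return state.apply(new_rows)


history = [("mon", 100), ("mon", 50), ("tue", 70)]
arrivals = [("tue", 30), ("wed", 20)]

# Batch: every run reprocesses the full history plus the new rows.
batch = full_batch(history + arrivals)

# Incremental: reuse the state already built for `history`.
state = full_batch(history)
rows_before = state.rows_processed
state = incremental(state, arrivals)
rows_touched_incrementally = state.rows_processed - rows_before
```

Both approaches end with the same totals, but the incremental path only touches the rows that arrived since the last run, which is the cost argument made in the answer above.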
We saw that, it could do that with the most advanced technology. Windows, you know, proprietary operating system, very complicated, got replaced with Linux. So open source can pretty much do anything. And what we're seeing with the Delta Lakehouse is that slowly the open source community is building a replacement for the proprietary data warehouse, Delta Lake, machine learning, real time stack in open source. And we're excited to be part of it. For us, Delta Lake is a very important project that really helps you standardize how you lay out your data in the cloud. And with it comes a really important protocol called Delta Sharing, that enables you in an open way, actually for the first time ever, to share large data sets between organizations, but it uses an open protocol. So the great thing about that is you don't need to be a Databricks customer. You don't need to even like Databricks, you just need to use this open source project and you can now securely share data sets between organizations across clouds. And it actually does so really efficiently, just one copy of the data. So you don't have to copy it if you're within the same cloud. >> So you're playing the long game on open source. >> Absolutely. I mean, this is a force, it's going to be there. If you deny it, before you know it there's going to be something like Linux that is going to be a threat to your proprietary software. >> I totally agree by the way. I was just talking to somebody the other day and they're like hey, the software industry, someone made the comment, the software industry, the software industry is open source. There's no more software industry, it's called open source. It's integrations that become interesting. And I was looking at integrations, now is really where the action is. And we had a panel with the Clouderati we called it, the people that have been around for a long time. And it was called the innovator's dilemma. And one of the comments was it's the integrator's dilemma, not the innovator's dilemma. 
And this is a big part of this piece of supercloud. Can you share your thoughts on how cloud and integration need to be tightened up to really make it super? >> Actually that's a great point. I think the beauty of this is, look the ecosystem of data today is vast, there's this picture that someone puts together every year of all the different vendors and how they relate, and it gets bigger and bigger and messy and messier. So, we see customers use all kinds of different aspects of what's existing in the ecosystem and they want it to be integrated in whatever you're selling them. And that's where I think the power of open source comes in. Open source, you get integrations that people will do without you having to push it. So us, Databricks as a vendor, we don't have to go tell people please integrate with Databricks. The open source technology that we contribute to, automatically, people are integrating with it. Delta Lake has integrations with lots of different software out there and Databricks as a company doesn't have to push that. So I think open source is also another thing that really helps with the ecosystem integrations. Many of these companies in this data space actually have employees that are full-time dedicated to make sure make sure our software works well with Spark. Make sure our software works well with Delta and they contribute back to that community. And that's the way you get this sort of ecosystem to further sort of flourish. >> Well, I really appreciate your time. And I, my final question for you is, as we're kind of unpack and and kind of shape and frame supercloud for the future, how would you see a roadmap or architecture or outcome for companies that are going to clearly be in the cloud where it's open source is going to be dominating. Integrations has got to be seamless and frictionless. Abstraction layer make things super easy and take away the complexity. What is supercloud to them? What does the outcome look like? 
How would you define a supercloud environment for an enterprise? >> Yeah, for me, it's the simplification that you get where you standardize on open source. You get your data in one place, in one format, in one standardized way, and then you can get your insights from it, without having to buy lots of different idiosyncratic proprietary software from different vendors that's different in each environment. So it's this slow standardization that's happening. And I think it's going to happen faster than we think. And I think in a couple years it's going to be a requirement that, does your software work on all these different departments? Is it based on open source? Is it using this Delta Lakehouse pattern? And if it's not, I think they're going to demand it. >> Yeah, I feel like we're close to some sort of de facto standard coming and you guys are a big part of it. Once that clicks in, it's going to highly accelerate in the open, and I think it's going to be super valuable. Ali, thank you so much for your time, and congratulations to you and your team. Like we've been following you guys since the beginning. Remember the early days and look how far it's come. And again, you guys are really making a big difference in making a super cool environment out there. Thanks for coming on and sharing. >> Thank you so much John. >> Okay, this is Supercloud 22. I'm John Furrier, stay with us for more coverage and more commentary after this break. (light hearted music)

Published Date : Aug 7 2022

ENTITIES

Entity | Category | Confidence
Ali Ghodsi | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Databricks | ORGANIZATION | 0.99+
John | PERSON | 0.99+
last week | DATE | 0.99+
next week | DATE | 0.99+
Ali | PERSON | 0.99+
Next quarter | DATE | 0.99+
yesterday | DATE | 0.99+
John Furrier | PERSON | 0.99+
Delta | ORGANIZATION | 0.99+
one format | QUANTITY | 0.99+
first | QUANTITY | 0.99+
today | DATE | 0.98+
second thing | QUANTITY | 0.98+
one | QUANTITY | 0.98+
Linux | TITLE | 0.98+
one copy | QUANTITY | 0.98+
Delta Lakehouse | ORGANIZATION | 0.98+
supercloud 22 | ORGANIZATION | 0.98+
more than one cloud | QUANTITY | 0.98+
each environment | QUANTITY | 0.98+
Clouderati | ORGANIZATION | 0.98+
Supercloud22 | ORGANIZATION | 0.98+
hundreds of years | QUANTITY | 0.97+
Delta Lake | LOCATION | 0.97+
one big problem | QUANTITY | 0.97+
70, 80% | QUANTITY | 0.97+
Windows | TITLE | 0.96+
one place | QUANTITY | 0.96+
first time | QUANTITY | 0.96+
billion dollars | QUANTITY | 0.95+
decades | QUANTITY | 0.95+
Delta Lake | ORGANIZATION | 0.95+
One | QUANTITY | 0.94+
supercloud | ORGANIZATION | 0.94+
Supercloud | ORGANIZATION | 0.94+
half an hour ago | DATE | 0.93+
Delta Lake | TITLE | 0.92+
Lakehouse | ORGANIZATION | 0.92+
Spark | TITLE | 0.91+
each | QUANTITY | 0.91+
a minute | QUANTITY | 0.85+
one of | QUANTITY | 0.73+
one native | QUANTITY | 0.72+
supercloud | TITLE | 0.7+
couple years | QUANTITY | 0.66+
AltaVista | ORGANIZATION | 0.65+
Wall Street Journal | ORGANIZATION | 0.63+
theCUBE | ORGANIZATION | 0.63+
Lakehouse | TITLE | 0.51+
Lake | LOCATION | 0.46+
Hadoop World | TITLE | 0.41+
'22 | EVENT | 0.24+

Breaking Analysis: Snowflake Summit 2022...All About Apps & Monetization


 

>> From theCUBE studios in Palo Alto and Boston, bringing you data driven insights from theCUBE and ETR. This is "Breaking Analysis" with Dave Vellante. >> Snowflake Summit 2022 underscored that the ecosystem excitement which was once forming around Hadoop is being reborn, escalated and coalescing around Snowflake's data cloud. What was once seen as a simpler cloud data warehouse and good marketing with the data cloud is evolving rapidly with new workloads, a vertical industry focus, data applications, monetization, and more. The question is, will the promise of data be fulfilled this time around, or is it same wine, new bottle? Hello, and welcome to this week's Wikibon CUBE Insights powered by ETR. In this "Breaking Analysis," we'll talk about the event, the announcements that Snowflake made that are of greatest interest, the major themes of the show, what was hype and what was real, the competition, and some concerns that remain in many parts of the ecosystem and pockets of customers. First let's look at the overall event. It was held at Caesars Forum. Not my favorite venue, but I'll tell you it was packed. Fire Marshal Full, as we sometimes say. Nearly 10,000 people attended the event. Here's Snowflake's CMO Denise Persson on theCUBE describing how this event has evolved. >> Yeah, two, three years ago, we were about 1800 people at a Hilton in San Francisco. We had about 40 partners attending. This week we're close to 10,000 attendees here. Almost 10,000 people online as well, and over 200 partners here on the show floor. >> Now, those numbers from 2019 remind me of the early days of Hadoop World, which was put on by Cloudera but then Cloudera handed off the event to O'Reilly as this article that we've inserted, if you bring back that slide, would say. The headline almost got it right. Hadoop World was a failure, but it didn't have to be. 
Snowflake has filled the void created by O'Reilly when it first killed Hadoop World, and killed the name and then killed Strata. Now, ironically, the momentum and excitement from Hadoop's early days, it probably could have stayed with Cloudera but the beginning of the end was when they gave the conference over to O'Reilly. We can't imagine Frank Slootman handing the keys to the kingdom to a third party. Serious business was done at this event. I'm talking substantive deals. Salespeople from a host sponsor and the ecosystems that support these events, they love physical. They really don't like virtual because physical belly to belly means relationship building, pipeline, and deals. And that was blatantly obvious at this show. And in fairness, at all theCUBE events that we've done this year, but this one was more vibrant because of its attendance and the action in the ecosystem. Ecosystem is a hallmark of a cloud company, and that's what Snowflake is. We asked Frank Slootman on theCUBE, was this ecosystem evolution by design or did Snowflake just kind of stumble into it? Here's what he said. >> Well, when you are a data cloud and you have data, people want to do things with that data. They don't want to just run data operations, populate dashboards, run reports. Pretty soon they want to build applications and after they build applications, they want to build businesses on it. So it goes on and on and on. So it drives your development to enable more and more functionality on that data cloud. Didn't start out that way, you know, we were very, very much focused on data operations. Then it becomes application development and then it becomes, hey, we're developing whole businesses on this platform. So similar to what happened to Facebook in many ways. >> So it sounds like it was maybe a little bit of both. 
The Facebook analogy is interesting because Facebook is a walled garden, as is Snowflake, but when you come into that garden, you have assurances that things are going to work in a very specific way because a set of standards and protocols is being enforced by a steward, i.e. Snowflake. This means things run better inside of Snowflake than if you try to do all the integration yourself. Now, maybe over time, an open source version of that will come out but if you wait for that, you're going to be left behind. That said, Snowflake has made moves to make its platform more accommodating to open source tooling in many of its announcements this week. Now, I'm not going to do a deep dive on the announcements. Matt Sulkins from Monte Carlo wrote a decent summary of the keynotes and a number of analysts like Sanjeev Mohan, Tony Baer and others are posting some deeper analysis on these innovations, and so we'll point to those. I'll say a few things though. Unistore extends the type of data that can live in the Snowflake data cloud. It's enabled by a new feature called hybrid tables, a new table type in Snowflake. One of the big knocks against Snowflake was it couldn't handle transaction data. Several database companies are creating this notion of a hybrid where both analytic and transactional workloads can live in the same data store. Oracle's doing this for example, with MySQL HeatWave and there are many others. We saw Mongo earlier this month add an analytics capability to its transaction system. Mongo also added SQL, which was kind of interesting. Here's what Constellation Research analyst Doug Henschen said about Snowflake's moves into transaction data. Play the clip. >> Well with Unistore, they're reaching out and trying to bring transactional data in. 
Hey, don't limit this to analytical information and there's other ways to do that like CDC and streaming but they're very closely tying that again to that marketplace, with the idea of bring your data over here and you can monetize it. Don't just leave it in that transactional database. So another reach to a broader play across a big community that they're building. >> And you're also seeing Snowflake expand its workload types in its unique way and through Snowpark and its Streamlit acquisition, enabling Python so that native apps can be built in the data cloud and benefit from all that structure and the features that Snowflake has built in. Hence that Facebook analogy, or maybe the App Store, the Apple App Store as I propose as well. Python support also widens the aperture for machine intelligence workloads. We asked Snowflake senior VP of product, Christian Kleinerman which announcements he thought were the most impactful. And despite the who's your favorite child nature of the question, he did answer. Here's what he said. >> I think the native applications is the one that looks like, eh, I don't know about it on the surface but it has the biggest potential to change everything. That's to create an entire ecosystem of solutions for within a company or across companies that I don't know that we know what's possible. >> Snowflake also announced support for Apache Iceberg, which is a new open table format standard that's emerging. So you're seeing Snowflake respond to these concerns about its lack of openness, and they're building optionality into their cloud. 
They also showed some cost optimization tools both from Snowflake itself and from the ecosystem, notably Capital One which launched a software business on top of Snowflake focused on optimizing cost and eventually the rollout of data management capabilities, and all kinds of features that Snowflake announced at the show around governance, cross cloud, what we call super cloud, a new security workload, and they reemphasized their ability to read non-native on-prem data into Snowflake through partnerships with Dell and Pure and a lot more. Let's hear from some of the analysts that came on theCUBE this week at Snowflake Summit to see what they said about the announcements and their takeaways from the event. This is Dave Menninger, Sanjeev Mohan, and Tony Baer, roll the clip. >> Our research shows that the majority of organizations, the majority of people do not have access to analytics. And so a couple of the things they've announced I think address those or help to address those issues very directly. So Snowpark and support for Python and other languages is a way for organizations to embed analytics into different business processes. And so I think that'll be really beneficial to try and get analytics into more people's hands. And I also think that the native applications as part of the marketplace is another way to get applications into people's hands rather than just analytical tools. Because most people in the organization are not analysts. They're doing some line of business function. They're HR managers, they're marketing people, they're sales people, they're finance people, right? They're not sitting there mucking around in the data, they're doing a job and they need analytics in that job. >> Primarily, I think it is to counteract this whole notion that once you move data into Snowflake, it's a proprietary format. 
So I think that's how it started but it's hugely beneficial to the customers, to the users because now if you have large amount of data in Parquet files you can leave it on S3, but then, using the Apache Iceberg table format in Snowflake, you get all the benefits of Snowflake's optimizer. So for example, you get the micro partitioning, you get the metadata. And in a single query, you can join, you can do select from a Snowflake table union and select from an Iceberg table and you can do stored procedures, user-defined functions. So I think what they've done is extremely interesting. Iceberg by itself still does not have multi-table transactional capabilities. So if I'm running a workload, I might be touching 10 different tables. So if I use Apache Iceberg in a raw format, they don't have it, but Snowflake does. So the way I see it is Snowflake is adding more and more capabilities right into the database. So for example, they've gone ahead and added security and privacy. So you can now create policies and do even cell level masking, dynamic masking, but most organizations have more than Snowflake. So what we are starting to see all around here is that there's a whole series of data catalog companies, a bunch of companies that are doing dynamic data masking, security and governance, data observability which is not a space Snowflake has gone into. So there's a whole ecosystem of companies that is mushrooming. Although, you know, so they're using the native capabilities of Snowflake but they are at a level higher. So if you have a data lake and a cloud data warehouse and you have other like relational databases, you can run these cross platform capabilities in that layer. So that way, you know, Snowflake's done a great job of enabling that ecosystem. >> I think it's like the last mile, essentially. 
In other words, it's like, okay, you have folks that are basically that are very comfortable with Tableau but you do have developers who don't want to have to shell out to a separate tool. And so this is where Snowflake is essentially working to address that constituency. To Sanjeev's point, and I think part of it, this kind of plays into it is what makes this different from the Hadoop era is the fact that all these capabilities, you know, a lot of vendors are taking it very seriously to put this native. Now, obviously Snowflake acquired Streamlit. So we can expect that the Streamlit capabilities are going to be native. >> I want to share a little bit about the higher level thinking at Snowflake, here's a chart from Frank Slootman's keynote. It's his version of the modern data stack, if you will. Now, Snowflake of course, was built on the public cloud. If there were no AWS, there would be no Snowflake. Now, they're all about bringing data and live data and expanding the types of data, including structured, we just heard about that, unstructured, geospatial, and the list is going to continue on and on. Eventually I think it's going to bleed into the edge if we can figure out what to do with that edge data. Executing on new workloads is a big deal. They started with data sharing and they recently added security and they've essentially created a PaaS layer. We call it a SuperPaaS layer, if you will, to attract application developers. Snowflake has a developer-focused event coming up in November and they've extended the marketplace with 1300 native apps listings. And at the top, that's the holy grail, monetization. We always talk about building data products and we saw a lot of that at this event, very, very impressive and unique. Now here's the thing. There's a lot of talk in the press, in the Wall Street and the broader community about consumption-based pricing and concerns over Snowflake's visibility and its forecast and how analytics may be discretionary. 
But if you're a company building apps in Snowflake and monetizing like Capital One intends to do, and you're now selling in the marketplace, that is not discretionary, unless of course your costs are greater than your revenue for that service, in which case it's going to fail anyway. But the point is we're entering a new era where data apps and data products are beginning to be built and Snowflake is attempting to make the data cloud the de facto place where you're going to build them. In our view they're well ahead in that journey. Okay, let's talk about some of the bigger themes that we heard at the event. Bringing apps to the data instead of moving the data to the apps, this was a constant refrain and one that certainly makes sense from a physics point of view. But having a single source of data that is discoverable, sharable and governed with increasingly robust ecosystem options, it doesn't have to be moved. Sometimes it may have to be moved if you're going across regions, but that's unique and a differentiator for Snowflake in our view. I mean, I'm yet to see a data ecosystem that is as rich and growing as fast as the Snowflake ecosystem. Monetization, we talked about that, industry clouds, financial services, healthcare, retail, and media, all front and center at the event. My understanding is that Frank Slootman was a major force behind this shift, this development and go to market focus on verticals. It's really an attempt, and he talked about this in his keynote, to align with the customer mission, ultimately align with their objectives which not surprisingly, are increasingly monetizing with data as a differentiating ingredient. We heard a ton about data mesh, there were numerous presentations about the topic. 
And I'll say this, if you map the seven pillars Snowflake talks about, Benoit Dageville talked about this in his keynote, but if you map those into Zhamak Dehghani's data mesh framework and the four principles, they align better than most of the data mesh washing that I've seen. The seven pillars, all data, all workloads, global architecture, self-managed, programmable, marketplace and governance. Those are the seven pillars that he talked about in his keynote. All data, well, maybe with hybrid tables that becomes more of a reality. Global architecture means the data is globally distributed. It's not necessarily physically in one place. Self-managed is key. Self-service infrastructure is one of Zhamak's four principles. And then inherent governance. Zhamak talks about computational, what I'll call automated governance, built in. And with all the talk about monetization, that aligns with the second principle which is data as product. So while it's not a pure hit and to its credit, by the way, Snowflake doesn't use data mesh in its messaging anymore. But by the way, its customers do, several customers talked about it. Geico, JPMC, and a number of other customers and partners are using the term and using it pretty closely to the concepts put forth by Zhamak Dehghani. But back to the point, they essentially, Snowflake that is, is building a proprietary system that substantially addresses some, if not many of the goals of data mesh. Okay, back to the list, supercloud, that's our term. We saw lots of examples of clouds on top of clouds that are architected to span multiple clouds, not just run on individual clouds as separate services. And this includes Snowflake's data cloud itself but a number of ecosystem partners that are headed in a very similar direction. Snowflake still talks about data sharing but now it uses the term collaboration in its high level messaging, which is I think smart. Data sharing is kind of a geeky term. 
And also this is an attempt by Snowflake to differentiate from everyone else that's saying, hey, we do data sharing too. And finally Snowflake doesn't say data marketplace anymore. It's now marketplace, accounting for its application market. Okay, let's take a quick look at the competitive landscape via this ETR X-Y graph. The vertical axis, remember, is net score or spending momentum and the x-axis is penetration, pervasiveness in the data center. That's what ETR calls overlap. Snowflake continues to lead on the vertical axis. They guided conservatively last quarter, remember, so I wouldn't be surprised if that lofty height, even though it's well down from its earlier levels but I wouldn't be surprised if it ticks down again a bit in the July survey, which will be in the field shortly. Databricks is a key competitor obviously, with strong spending momentum, as you can see. We didn't draw it here but we usually draw that 40% line or red line at 40%, anything above that is considered elevated. So you can see Databricks is quite elevated. But it doesn't have the market presence of Snowflake. It didn't get to IPO during the bubble and it doesn't have nearly as deep and capable go-to market machinery. Now, they're getting better and they're getting some attention in the market, nonetheless. But as a private company, you just naturally, more people are aware of Snowflake. Some analysts, Tony Baer in particular, believe Mongo and Snowflake are on a bit of a collision course long term. I actually can see his point. You know, I mean, they're both platforms, they're both about data. It's long ways off, but you can see them sort of in a similar path. They talk about kind of similar aspirations and visions even though they're in quite different markets today but they're definitely participating in a similar TAM. The cloud players are probably the biggest or definitely the biggest partners and probably the biggest competitors to Snowflake. And then there's always Oracle. 
Doesn't have the spending velocity of the others but it's got strong market presence. It owns a cloud and it knows a thing about data and it definitely is a go-to market machine. Okay, we're going to end on some of the things that we heard in the ecosystem. 'Cause look, we've heard before how particular technologies, enterprise data warehouse, data hubs, MDM, data lakes, Hadoop, et cetera, were going to solve all of our data problems and of course they didn't. And in fact, sometimes they create more problems that allow vendors to push more incremental technology to solve the problems that they created. Like tools and platforms to clean up the no-schema-on-write nature of data lakes or data swamps. But here are some of the things that I heard firsthand from some customers and partners. First thing is, they said to me that they're having a hard time keeping up sometimes with the pace of Snowflake. It reminds me of AWS in the 2014, 2015 timeframe. You remember that fire hose of announcements which causes increased complexity for customers and partners. I talked to several customers that said, well, yeah this is all well and good but I still need skilled people to understand all these tools that I'm integrating in the ecosystem, the catalogs, the machine learning observability. A number of customers said, I just can't use one governance tool, I need multiple governance tools and a lot of other technologies as well, and they're concerned that that's going to drive up their cost and their complexity. I heard other concerns from the ecosystem that it used to be sort of clear as to where they could add value you know, when Snowflake was just a better data warehouse. But to point number one, they're either concerned that they'll be left behind or they're concerned that they'll be subsumed. Look, I mean, just like we tell AWS customers and partners, you got to move fast, you got to keep innovating. If you don't, you're going to be left. 
Either, if you're a customer, you're going to be left behind by your competitor, or if you're a partner, somebody else is going to get there or AWS is going to solve the problem for you. Okay, and there were a number of skeptical practitioners, really thoughtful and experienced data pros that suggested that they've seen this movie before. That's hence the same wine, new bottle. Well, this time around I certainly hope not given all the energy and investment that is going into this ecosystem. And the fact is Snowflake is unquestionably making it easier to put data to work. They built on AWS so you didn't have to worry about provisioning, compute and storage and networking and scaling. Snowflake is optimizing its platform to take advantage of things like Graviton so you don't have to, and they're doing some of their own optimization tools. The ecosystem is building optimization tools so that's all good. And our firm belief is the less expensive it is, the more data will get brought into the data cloud. And they're building a data platform on which their ecosystem can build and run data applications, aka data products without having to worry about all the hard work that needs to get done to make data discoverable, shareable, and governed. And unlike the last 10 years, you don't have to be a zookeeper and integrate all the animals in the Hadoop zoo. Okay, that's it for today, thanks for watching. Thanks to my colleague, Stephanie Chan who helps research "Breaking Analysis" topics. Sometimes Alex Myerson is on production and manages the podcasts. Kristin Martin and Cheryl Knight help get the word out on social and in our newsletters, and Rob Hof is our editor in chief over at SiliconANGLE, and Hailey does some wonderful editing, thanks to all. Remember, all these episodes are available as podcasts wherever you listen. All you got to do is search Breaking Analysis Podcasts. 
I publish each week on wikibon.com and siliconangle.com and you can email me at David.Vellante@siliconangle.com or DM me @DVellante. If you got something interesting, I'll respond. If you don't, I'm sorry I won't. Or comment on my LinkedIn post. Please check out etr.ai for the best survey data in the enterprise tech business. This is Dave Vellante for theCUBE Insights powered by ETR. Thanks for watching, and we'll see you next time. (upbeat music)

Published Date : Jun 18 2022



Clint Sharp, Cribl | Cube Conversation


 

(upbeat music) >> Hello, welcome to this CUBE conversation. I'm John Furrier, your host here in theCUBE in Palo Alto, California, featuring Cribl, a hot startup taking over the enterprise when it comes to data pipelining, and we have a CUBE alumni who's the co-founder and CEO, Clint Sharp. Clint, great to see you again, you've been on theCUBE, you were on in 2013, great to see you, congratulations on the company that you co-founded, and leading as the chief executive officer, over $200 million in funding, doing really strong in the enterprise, congratulations, thanks for joining us. >> Hey, thanks John it's really great to be back. >> You know, remember our first conversation, the big data wave coming in, Hadoop World 2010, now the cloud comes in, and really the cloud native really takes data to a whole nother level. You're seeing the old data architectures being replaced with cloud scale. So the data landscape is interesting. You know, Data as Code you're hearing that term, data engineering teams are out there, data is everywhere, it's now part of how developers and companies are getting value whether it's real time, or coming out of data lakes, data is more pervasive than ever. Observability is a hot area, there's a zillion companies doing it, what are you guys doing? Where do you fit in the data landscape? >> Yeah, so what I say is that Cribl and our products solve the problem for our customers of the fundamental tension between data growth and budget. And so if you look at IDC's data, data's growing at a 25% CAGR, you're going to have two and a half times the amount of data in five years that you have today, and I talk to a lot of CIOs, I talk to a lot of CISOs, and the thing that I hear repeatedly is my budget is not growing at a 25% CAGR so fundamentally, how do I resolve this tension? 
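As a rough check on the growth figure quoted above, compound growth can be worked out directly. This is a minimal sketch; the function name `growth_multiple` is illustrative and not from the conversation. At a 25% compound annual growth rate, five years of compounding comes to roughly a 3x multiple, a little above the rounded "two and a half times" mentioned in the interview, while a flat budget stays at 1x.

```python
def growth_multiple(annual_rate: float, years: int) -> float:
    """Return the multiple by which a quantity grows under compound annual growth."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # 25% CAGR, as quoted in the conversation; the year range is illustrative.
    for year in range(1, 6):
        print(f"year {year}: {growth_multiple(0.25, year):.2f}x")
```

The gap between that multiple and a budget that does not compound is the tension the speaker describes.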
We sell very specifically into the observability and security markets, we sell to technology professionals who are operating, you know, observability and security platforms like Splunk, or Elasticsearch, or Datadog, Exabeam, like these types of platforms they're moving, protocols like syslog, they're moving, they have lots of agents deployed on every endpoint and they're trying to figure out how to get the right data to the right place, and fundamentally you know, control cost. And we do that through our product called Stream which is what we call an observability pipeline. It allows you to take all this data, manipulate it in the stream and get it to the right place and fundamentally be able to connect all those things that maybe weren't originally intended to be connected. >> So I want to get into that new architecture if you don't mind, but let me first ask you on the problem space that you're in. So cloud native obviously, instrumenting everything is a key thing. You mentioned Datadog, all these tools, is the problem that there's been a sprawl of things being instrumented and they have to bring it together, or it's too costly to run all these point solutions and get it to work? What's the problem space that you're in? >> So I think customers have always been forced to make trade offs John. So the, hey I have volumes and volumes and volumes of data that's relevant to securing my enterprise, that's relevant to observing and understanding the behavior of my applications but there's never been an approach that allows me to really onboard all of that data. And so where we're coming at is giving them the tools to be able to, you know, filter out noise and waste, to be able to, you know, aggregate this high fidelity telemetry data. 
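The idea described above, filtering out noise and then getting the right data to the right place, can be sketched in a few lines. This is a hypothetical illustration only, not Cribl Stream's actual API or configuration: the function `run_pipeline`, the event fields, and the destination names are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

Event = dict  # hypothetical event shape, e.g. {"source": "syslog", "level": "debug", "msg": "..."}


def run_pipeline(
    events: Iterable[Event],
    routes: Dict[str, Callable[[Event], bool]],
    keep: Callable[[Event], bool],
) -> Dict[str, List[Event]]:
    """Drop events that fail `keep`, then copy each survivor to every destination whose rule matches."""
    delivered: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        if not keep(event):
            continue  # noise is filtered out before it reaches any destination
        for destination, matches in routes.items():
            if matches(event):
                delivered[destination].append(event)
    return dict(delivered)


if __name__ == "__main__":
    events = [
        {"source": "syslog", "level": "debug", "msg": "heartbeat"},
        {"source": "syslog", "level": "error", "msg": "disk full"},
        {"source": "auth", "level": "info", "msg": "login failed"},
    ]
    # Destination names are placeholders, not real product integrations.
    routes = {
        "log_search": lambda e: e["level"] in {"error", "warn"},
        "security_analytics": lambda e: e["source"] == "auth",
        "archive": lambda e: True,
    }
    result = run_pipeline(events, routes, keep=lambda e: e["level"] != "debug")
    for destination, routed in result.items():
        print(destination, [e["msg"] for e in routed])
```

One event can land in several destinations, which is the point of decoupling collection from the tools that consume the data: the expensive search tool only sees what it needs, while a cheap archive keeps everything that survives filtering.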
There's a lot of growing changes, you talk about cloud native, but digital transformation, you know, the pandemic itself and remote work all these are driving significantly greater data volumes, and vendors unsurprisingly haven't really been all that aligned to giving customers the tools in order to reshape that data, to filter out noise and waste because, you know, for many of them they're incentivized to get as much data into their platform as possible, whether that's aligned to the customer's interests or not. And so we saw an opportunity to come out and fundamentally as a customers-first company give them the tools that they need, in order to take back control of their data. >> I remember those conversations even going back six years ago the whole cloud scale, horizontally scalable applications, you're starting to see data now being stuck in the silos now to have high, good data you have to be observable, which means you got to be addressable. So you now have to have a horizontal data plane if you will. But then you get to the question of, okay, what data do I need at the right time? So is the Data as Code, data engineering discipline changing what new architectures are needed? What changes in the mind of the customer once they realize that they need this new way to pipe data and route data around, or make it available for certain applications? What are the key new changes? >> Yeah, so I think one of the things that we've been seeing in addition to the advent of the observability pipeline that allows you to connect all the things, is also the advent of an observability lake as well. Which is allowing people to store massively greater quantities of data, and also different types of data. So data that might not traditionally fit into a data warehouse, or might not traditionally fit into a data lake architecture, things like deployment artifacts, or things like packet captures. 
These are binary types of data that, you know, it's not designed to work in a database but yet they want to be able to ask questions like, hey, during the Log4Shell vulnerability, which of all my deployment artifacts actually had Log4j in it in an affected version. These are hard questions to answer in today's enterprise. Or they might need to go back to full fidelity packet capture data to try to understand, you know, a malicious actor's movement throughout the enterprise. And we're not seeing, you know, we're seeing vendors who have great log indexing engines, and great time series databases, but really what people are looking for is the ability to store massive quantities of data, five times, 10 times more data than they're storing today, and they're doing that in places like AWS S3, or in Azure Blob Storage, and we're just now starting to see the advent of technologies that can help them query that data, and technologies that are generally more specifically focused at the type of persona that we sell to which is a security professional, or an IT professional who's trying to understand the behaviors of their applications, and we also find that, you know, general-purpose data processing technologies are great for the enterprise, but they're not working for the people who are running the enterprise, and that's why you're starting to see the concepts like observability pipelines and observability lakes emerge, because they're targeted at these people who have a very unique set of problems that are not being solved by the general-purpose data processing engines. >> It's interesting as you see the evolution of more data volume, more data gravity, then you have these specialty things that need to be engineered for the business. So sounds like observability lake and pipelining of the data, the data pipelining, or stream you call it, these are new things that they bolt into the architecture, right? Because they have business reasons to do it. What's driving that? 
Sounds like security is one of them. Are there others that are driving this behavior? >> Yeah, I mean it's the need to be able to observe applications and observe end-user behavior at a fine-grain detail. So, I mean I often use examples of like bank teller applications, or perhaps, you know, the app that you're using to, you know, I'm going to be flying in a couple of days. I'll be using their app to understand whether my flight's on time. Am I getting a good experience in that particular application? Answering the question of is Clint getting a good experience requires massive quantities of data, and your application and your service, you know, I'm going to sit there and look at, you know, American Airlines which I'm flying on Thursday, I'm going to be judging them based on off of my experience. I don't care what the average user's experience is I care what my experience is. And if I call them up and I say, hey, and especially for the enterprise usually this is much more for, you know, in-house applications and things like that. They call up their IT department and say, hey, this application is not working well, I don't know what's going on with it, and they can't answer the question of what was my individual experience, they're living with, you know, data that they can afford to store today. And so I think that's why you're starting to see the advent of these new architectures is because digital is so absolutely critical to every company's customer experience, that they're needing to be able to answer questions about an individual user's experience which requires significantly greater volumes of data, and because of significantly greater volumes of data, that requires entirely new approaches to aggregating that data, bringing the data in, and storing that data. >> Talk to me about enabling customer choice when it comes around controlling their data. You mentioned that before we came on camera that you guys are known for choice. 
How do you enable customer choice and control over their data? >> So I think one of the biggest problems I've seen in the industry over the last couple of decades is that vendors come to customers with hugely valuable products that make their lives better but it also requires them to maintain a relationship with that vendor in order to be able to continue to ask questions of that data. And so customers don't get a lot of optionality in these relationships. They sign multi-year agreements, they look to try to start another, they want to go try out another vendor, they want to add new technologies into their stack, and in order to do that they're often left with a choice of well, do I roll out like get another agent, do I go touch 10,000 computers, or a 100,000 computers in order to onboard this data? And what we have been able to offer them is the ability to reuse their existing deployed footprints of agents and their existing data collection technologies, to be able to use multiple tools and use the right tool for the right job, and really give them that choice, and not only give them the choice once, but with the concepts of things like the observability lake and replay, they can go back in time and say, you know what? I wanted to rehydrate all this data into a new tool, I'm no longer locked in to the way one vendor stores this, I can store this data in open formats and that's one of the coolest things about the observability lake concept is that customers are no longer locked in to any particular vendor, the data is stored in open formats and so that gives them the choice to be able to go back later and choose any vendor, because they may want to do some AI or ML on that type of data and do some model training. They may want to be able to forward that data to a new cloud data warehouse, or try a different vendor for log search or a different vendor for time series data. 
And we're really giving them the choice and the tools to do that in a way in which was simply not possible before. >> You know, you bring up a point that's a big part of the upcoming AWS startup series Data as Code, the data engineering role has become so important and the word engineering is a key word in that, but there's not a lot of them, right? So like how many data engineers are there on the planet, and hopefully more will come in, come from these great programs in computer science but you got to engineer something but you're talking about developing on data, you're talking about doing replays and rehydrating, this is developing. So Data as Code is now a reality, how do you see Data as Code evolving from your perspective? Because it implies DevOps, Infrastructure as Code was DevOps, if Data as Code then you got DataOps, AIOps has been around for a while, what is Data as Code? And what does that mean to you Clint? >> I think for our customers, one, it means a number of I think sort of after-effects that maybe they have not yet been considering. One you mentioned which is it's hard to acquire that talent. I think it is also increasingly more critical that people who were working in jobs that used to be purely operational, are now being forced to learn, you know, developer centric tooling, things like Git, things like CI/CD pipelines. And that means that there's a lot of education that's going to have to happen because the vast majority of the people who have been doing things in the old way from the last 10 to 20 years, you know, they're going to have to get retrained and retooled. 
And I think that's a huge opportunity for people who have that skillset, and I think that they will find that their compensation will be directly correlated to their ability to have those types of skills, but it also represents a massive opportunity for people who can catch this wave and find themselves in a place where they're going to have a significantly better career and more options available to them.
>> Yeah, and I've been thinking about what you just said about your customer environment having all these different things like Datadog and other agents. Those people that rolled those out can still work there, they don't have to rip and replace and then get new training on the new multiyear enterprise service agreement that some other vendor will sell them. You come in and it sounds like you're saying, hey, stay as you are, use Cribl, we'll have some data engineering capabilities for you, is that right?
>> Yup, you got it. And I think one of the things that's a little bit different about our product and our market, John, from kind of general-purpose data processing, is that our users are often responsible for many tools, and data engineering is not their full-time job, it's actually something they just need to do now. And so we've really built a tool that's designed for your average security professional, your average IT professional. Yes, we can utilize the same kind of DataOps techniques that you've been talking about, CI/CD pipelines, GitOps, that sort of stuff, but you don't have to, and if you're really just already familiar with administering a Datadog or a Splunk, you can get started with our product really easily, and it is designed to be approachable to anybody with that type of skillset.
>> It's interesting, when you're talking you remind me of the big wave that was coming, and it's still here: shift left meant security from the beginning. What do you do with data? Shift up, right, down?
Like, what does that mean? Because what you're getting at here is that if you're a developer, you have to deal with data, but you don't have to be a data engineer, but you can be, right? So we're getting into this new world. Security had that same problem. You had to wait for that group to do things, creating tension on the CI/CD pipelining, so the developers who are building apps had to wait. Now you've got shift left. What's the equivalent of the data version of shift left?
>> Yeah, so we're actually doing this right now. We just announced a new product a week ago called Cribl Edge. And this is enabling us to move processing of this data, rather than doing it centrally in the stream, to actually push this processing out to the edge, and to utilize a lot of unused capacity that you're already paying AWS for, or paying Azure for, or maybe in your own data center, and utilize that capacity to do the processing rather than having to centralize and aggregate all of this data. So I think we're going to see a really interesting shift, and left from our side is towards the origination point rather than anything else. That allows us to really unlock a lot of unused capacity and continue to drive the cost down to make more data addressable. Back to the original thing we talked about, the tension around data growth: if we want to offer more capacity to people, if we want to be able to answer more questions, we need to be able to cost-effectively query a lot more data.
>> You guys have had great success in the enterprise with what you've got going on. Obviously the funding is just the scoreboard for that. You've got good growth. What are the use cases, or what does the customer look like that's working for you, where you're winning? Or maybe said differently, what pain points are out there that the customer might be feeling right now that Cribl could fit in and solve?
How would you describe that ideal persona, or environment, or problem, that the customer may have, where they say, man, Cribl's a perfect fit?
>> Yeah, this is a person who's working on tooling. So they administer a Splunk, or an Elastic, or a Datadog, they may be in a network operations center or a security operations center, they are struggling to get data into their tools, they're always at capacity, their tools are always at the redline, and they really wish they could do more for the business. They're kind of tired of being this department of "no", where everybody comes to them and says, "hey, can I get this data in?" And they're like, "I wish, but you know, we're all out of capacity, and we wish we could help you, but we frankly can't right now." We help them by routing that data to multiple locations, we help them control costs by eliminating noise and waste, and we've been very successful at that in, you know, logos like a Shutterfly, or, I'm blanking on names, which is not great, but we've been very successful in the enterprise, and we continue to be successful with major logos inside of government, inside of banking, telco, et cetera.
>> So basically it used to be the old hyperscalers, the ones with the data-full problem. Now everyone's full of data, and they've got to really expand capacity and have more agility and more engineering around contributions to the business. Sounds like that's what you guys are solving.
>> Yup, and hopefully we help them do a little bit more with less. And I think that's a key problem for our enterprises: there's always a limit on the number of human resources that they have available at their disposal, which is why we try to make the software as easy to use as possible, and make it as widely applicable as possible to those IT and security professionals who are, you know, kind of your run-of-the-mill tools administrator. Our product is very approachable for them.
>> Clint, great to see you on theCUBE here, thanks for coming on. Quick plug for the company: are you guys hiring, what's going on? Give a quick update, take 30 seconds to give a plug.
>> Yeah, absolutely. We are absolutely hiring: cribl.io/jobs. We need people in every function, from sales, to marketing, to engineering, to back office, G&A, HR, et cetera. So please check out our job site. If you are interested in learning more you can go to cribl.io. We've got some great online sandboxes there which will help you educate yourself on the product, our documentation is freely available, and you can sign up for up to a terabyte a day on our cloud: go to cribl.cloud and sign up free today. The product's easily accessible, and if you'd like to speak with us we'd love to have you in our community, and you can join the community from cribl.io as well.
>> All right, Clint Sharp, co-founder and CEO of Cribl, thanks for coming to theCUBE. Great to see you. I'm John Furrier, your host, thanks for watching. (upbeat music)

Published Date : Mar 31 2022

ENTITIES

Entity | Category | Confidence
Clint Sharp | PERSON | 0.99+
John | PERSON | 0.99+
John Furrier | PERSON | 0.99+
10 times | QUANTITY | 0.99+
Clint | PERSON | 0.99+
30 seconds | QUANTITY | 0.99+
100,000 computers | QUANTITY | 0.99+
Thursday | DATE | 0.99+
Cribl | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
25% | QUANTITY | 0.99+
American Airlines | ORGANIZATION | 0.99+
five times | QUANTITY | 0.99+
10,000 computers | QUANTITY | 0.99+
2013 | DATE | 0.99+
five years | QUANTITY | 0.99+
Palo Alto, California | LOCATION | 0.99+
one | QUANTITY | 0.99+
over $200 million | QUANTITY | 0.99+
six years ago | DATE | 0.99+
CUBE | ORGANIZATION | 0.98+
a week ago | DATE | 0.98+
first | QUANTITY | 0.98+
telco | ORGANIZATION | 0.98+
Datadog | ORGANIZATION | 0.97+
today | DATE | 0.97+
AWSS3 | TITLE | 0.97+
Log4Shell | TITLE | 0.96+
two and a half times | QUANTITY | 0.94+
last couple of decades | DATE | 0.89+
first conversation | QUANTITY | 0.89+
One | QUANTITY | 0.87+
Hadoop World 2010 | EVENT | 0.87+
Log4j | TITLE | 0.83+
cribl.io | ORGANIZATION | 0.81+
20 years | QUANTITY | 0.8+
Azure | ORGANIZATION | 0.8+
first company | QUANTITY | 0.79+
big wave | EVENT | 0.79+
theCUBE | ORGANIZATION | 0.78+
up to a terabyte a day | QUANTITY | 0.77+
Azure Blob | TITLE | 0.77+
cribl.cloud | TITLE | 0.74+
Exabeam | ORGANIZATION | 0.72+
Shutterfly | ORGANIZATION | 0.71+
banking | ORGANIZATION | 0.7+
DataOps | TITLE | 0.7+
wave | EVENT | 0.68+
last | DATE | 0.67+
cribl.io | TITLE | 0.66+
things | QUANTITY | 0.65+
zillion companies | QUANTITY | 0.63+
syslog | TITLE | 0.62+
10 | QUANTITY | 0.61+
Splunk | ORGANIZATION | 0.6+
AIOps | TITLE | 0.6+
Edge | TITLE | 0.6+
Data as | TITLE | 0.59+
cribl.io/jobs | ORGANIZATION | 0.58+
Elasticsearch | TITLE | 0.58+
Elastic | TITLE | 0.55+
once | QUANTITY | 0.5+
problems | QUANTITY | 0.48+
Code | TITLE | 0.46+
Splunk | TITLE | 0.44+

Zhamak Dehghani, ThoughtWorks | theCUBE on Cloud 2021


 

>>From around the globe, it's theCUBE, presenting CUBE on Cloud, brought to you by SiliconANGLE.
>>In 2009, Hal Varian, Google's chief economist, said that statisticians would be the sexiest job in the coming decade. The modern big data movement really took off later in the following year, after the second Hadoop World, which was hosted by Cloudera in New York City. Jeff Hammerbacher famously declared to me and John Furrier in theCUBE that the best minds of his generation were trying to figure out how to get people to click on ads. And he said that sucks. The industry was abuzz with the realization that data was the new competitive weapon. Hadoop was heralded as the new data management paradigm. Now, what actually transpired over the next 10 years? Only a small handful of companies could really master the complexities of big data and attract the data science talent necessary to realize massive returns. As well, back then, cloud was in the early stages of its adoption, when you think about it, at the beginning of the last decade. And as the years passed, more and more data got moved to the cloud and the number of data sources absolutely exploded. Experimentation accelerated, as did the pace of change. Complexity just overwhelmed big data infrastructures and data teams, leading to a continuous stream of incremental technical improvements designed to try and keep pace: things like data lakes, data hubs, new open source projects, new tools, which piled on even more complexity. And as we reported, we believe what's needed is a complete bit flip in how we approach data architectures. Our next guest is Zhamak Dehghani, who is the director of emerging technologies at ThoughtWorks. Zhamak is a software engineer, architect, thought leader and adviser to some of the world's most prominent enterprises. She's, in my view, one of the foremost advocates for rethinking and changing the way we create and manage data architectures.
Favoring a decentralized over monolithic structure, and elevating domain knowledge as a primary criterion in how we organize so-called big data teams and platforms. Zhamak, welcome to theCUBE. It's a pleasure to have you on the program.
>>Hi, David. It's wonderful to be here.
>>Well, okay, so you're pretty outspoken about the need for a paradigm shift in how we manage our data and our platforms at scale. Why do you feel we need such a radical change? What are your thoughts there?
>>Well, I think if you just look back over the last decades, you gave us, you know, a summary of what happened since 2010. But even if we go before then, what we have done over the last few decades is basically repeating and, as you mentioned, incrementally improving how we've managed data, based on certain assumptions around, as you mentioned, centralization: data has to be in one place so we can get value from it. But if you look at the parallel movement of our industry in general since the birth of the Internet, we are actually moving towards decentralization. If we think today, like, if we move data aside, if we said the only way the Web would work, the only way we get access to, you know, various applications on the Web or pages, is to centralize it, we would laugh at that idea. But for some reason we don't question that when it comes to data, right? So I think it's time to embrace the complexity that comes with the growth of the number of sources, the proliferation of sources and consumption models, you know, embrace the distribution of sources of data: they're not just within one part of the organization, they're not even just within the bounds of the organization, they're beyond the bounds of the organization.
And then look back and say, okay, if that's the trend of our industry in general, given the fabric of computation and data that we put in, you know, globally in place, then how do the architecture and technology and organizational structure and incentives need to move to embrace that complexity? And to me, that requires a paradigm shift, full stack, from how we organize our organizations, how we organize our teams, how we, you know, put technology in place, to look at it from a decentralized angle.
>>Okay, so let's unpack that a little bit. I mean, you've spoken about and written that today's big data architecture, and you basically just mentioned it, is flawed. So I want to bring up, I love your diagrams, a simple diagram. Guys, if you could bring up figure one. So on the left here we're ingesting data from the operational systems and other enterprise data sets and, of course, external data. We cleanse it, you know, you've got to do the quality thing, and then serve it up to the business. So what's wrong with that picture that we just described? And granted, it's a simplified form.
>>Yeah, quite a few things. So, yeah, I would flip the question maybe back to you or the audience. If we said that, you know, there are so many sources of the data, and actually the data comes from systems and from teams that are very diverse in terms of domains, right? If you just think about, I don't know, retail: e-commerce versus order management versus customer, these are very diverse domains. The data comes from many different diverse domains. And then we expect to put them under the control of a centralized team, a centralized system. And I know that centralization, probably if you zoom out, it's centralized.
If you zoom in, it's compartmentalized based on functions, and we can talk about that. And we assume that the centralized model will be served by, you know, getting that data, making sense of it, cleansing and transforming it, then to satisfy the needs of a very diverse set of consumers, without really understanding the domains, because the teams responsible for it are not close to the source of the data. So there is a bit of a cognitive gap and domain understanding gap, you know, without really understanding how the data is going to be used. When I came up with the idea, I talked to a lot of data teams globally just to see, you know, what are the pain points? How are they doing it? And one thing that was evident in all of those conversations was that they actually didn't know, after they built these pipelines and put the data in, whether the data warehouse tables or lake, they didn't know how the data was being used. But yet they're responsible for making the data available for this diverse set of use cases. So a centralized system, a monolithic system, often is a bottleneck. So what you find is a lot of the teams are struggling with satisfying the needs of the consumers, they're struggling with really understanding the data. The domain knowledge is lost; there is a loss of understanding in that transformation. Often, you know, we end up training machine learning models on data that is not really representative of the reality of the business. And then we put them to production and they don't work, because the semantics and the syntax of the data get lost within that translation. So we're struggling with finding people to, you know, manage a centralized system, because the technology is still, in my opinion, fairly low level and exposes the users of those technologies, let's say of a warehouse, to a lot of, you know, complexity.
So in summary, I think it's a bottleneck; it's not going to, you know, satisfy the pace of change, the pace of innovation and the pace of, you know, availability of sources. It's disconnected and fragmented; even though it's centralized, it's disconnected and fragmented from where the data comes from and where the data gets used. And it's managed by, you know, a team of hyper-specialized people who, you know, are struggling to understand the actual value of the data, the actual format of the data. So it's not going to get us where our aspirations and ambitions need to be.
>>Yes. So the big data platform is essentially, I think you call it, context agnostic. And as data becomes, you know, more important in our lives, you've got all these new data sources, you know, injected into the system. Experimentation, as we said, with the cloud becomes much, much easier. So one of the blockers that you've cited, you just mentioned it, is you've got these hyper-specialized roles: the data engineer, the quality engineer, data scientists. And it's illusory. I mean, it's like an illusion. These guys, seemingly they're independent and can scale independently. But I think you've made the point that in fact they can't, that a change in the data source has an effect across the entire data lifecycle, the entire data pipeline. So maybe you could add some color to why that's problematic for some of the organizations that you work with, and maybe give some examples.
>>Yeah, absolutely. So in fact, initially the hypothesis around that image came from a series of requests that we received from our large-scale and progressive clients, progressive in terms of their investment in data architectures. So these were clients that were larger scale. They had a diverse and rich set of domains. Some of them were big technology companies. Some of them were retail companies, big health care companies.
So they had that diversity of the data and the number of, you know, sources and domains. They had invested for quite a few years in, you know, generations: they had multiple generations of proprietary data warehouses on-prem that they were moving to cloud; they had moved through the various, you know, revisions of Hadoop clusters and they were moving those to the cloud. And the challenges that they were facing were simply, if I want to simplify it in one phrase, they were not getting value from the data that they were collecting. They were continuously struggling to shift the culture, because there was so much friction between all of these three phases: consumption of the data from sources, transformation, and then providing it and serving it to the consumer. So that whole process was full of friction. Everybody was unhappy. So the bottom line is that you're collecting all this data. There is delay. There is lack of trust in the data itself, because the data is not representative of the reality; it has gone through a transformation by people who didn't really understand what the data was, and it got delayed. So there is no trust. It's hard to get to the data. Ultimately, it's hard to create value from the data, and people are working really hard and under a lot of pressure, but they're still, you know, struggling. So often, you know, our solutions, like, we are, you know, technologists, we will often point to technology. So we go, okay, this version of, you know, some proprietary data warehouse we're using is not the right thing. We should go to the cloud, and that certainly will solve our problems, right? Or the warehouse wasn't a good one; let's make a data lake version. So instead of, you know, extracting and then transforming and loading into the (indistinct).
And that transformation is that, you know, heavy process, because you fundamentally made an assumption using warehouses that if I transform this data into this multi-dimensional, perfectly designed schema, then everybody can run whatever query they want, and that's going to solve, you know, everybody's problem. But in reality it doesn't, because you are delayed and there is no universal model that serves everybody's need. Everybody needs diverse data; data scientists don't necessarily like the perfectly modeled data. They're looking for both the signals and the noise. So then, you know, we've just gone from, uh, ETLs to, let's say now, the lake, which is, okay, let's move the transformation to the last mile. Let's just load the data into the object stores, into semi-structured files, and get the data scientists to use it. But they're still struggling because of the problems that we mentioned. So then what is the solution? Well, next generation data platform, let's put it on the cloud. And we saw clients that actually had gone through, you know, a year or multiple years of migration to the cloud: 18 months; I've seen, you know, nine-month migrations of the warehouse versus two-year migrations of the various data sources to the cloud. But ultimately the result is the same: unsatisfied, frustrated data users and data providers, you know, with a lack of ability to innovate quickly on relevant data and to have the experience that they deserve to have, a delightful experience of discovering and exploring data that they trust. And all of that was still amiss. So something else, more fundamental, needed to change than just the technology.
>>So then the linchpin to your scenario is this notion of context, and you pointed out, you made the other observation, that look, we've made our operational systems context aware.
But our data platforms are not. Like the CRM system: sales guys are very comfortable with what's in the CRM system. They own the data. So let's talk about the answer that you and your colleagues are proposing. You're essentially flipping the architecture, whereby those domain knowledge workers, the builders, if you will, of data products or data services, are now first-class citizens in the data flow, and they're injecting by design domain knowledge into the system. So I want to put up another one of your charts. Guys, bring up figure two there. It talks about, you know, convergence. You showed data, distributed domain-driven architecture, the self-serve platform design, and this notion of product thinking. So maybe you could explain why this approach is so desirable, in your view.
>>Sure. The motivation and inspiration for the approach came from studying what has happened over the last few decades in operational systems. We had a very similar problem prior to microservices with monolithic systems, monolithic systems where, you know, the bottleneck, the changes we needed to make, were always, you know, orthogonal to how the architecture was centralized. And we found a nice niche. I'm not saying this is the perfect way of decoupling a monolith, but it's a way that, currently, where we are in our journey to become data driven, is a nice place to be, which is distribution or decomposition of your system as well as organization. I think whenever we talk about systems, we've got to talk about the people and teams that are responsible for managing those systems. So the decomposition of the systems and the teams and the data around domains, because that's how today we are decoupling our business, right? We're decoupling our businesses around domains, and that's a good thing. And what does that do really for us? What it does is it localizes change to the bounded context of that business.
It creates clear boundaries and interfaces and contracts between the rest of the universe of the organization and that particular team, so it removes the friction that we often have for both managing change and serving data or capability. So the first principle of data mesh is: let's decouple this world of analytical data the same way, to mirror the way we have decoupled our systems and teams and business. Why is data any different? And the moment you do that, the moment you bring the ownership to the people who understand the data best, then you get questions: well, how is that any different from the silos of disconnected databases that we have today, where nobody can get to the data? So then the rest of the principles are really there to address all of the challenges that come with this first principle of decomposition around domain context. And the second principle is, well, we have to expect a certain level of quality and accountability and responsibility from the teams that provide the data. So let's bring product thinking, treating data as a product, to the data that these teams now share, and let's put accountability around it. And we need a new set of incentives and metrics for domain teams to share the data. We need to have a new set of quality metrics that define what it means for the data to be a product. And we can go through that conversation perhaps later. So then the second principle is, okay, the teams that are now responsible, the domain teams responsible for the analytical data, need to provide that data with a certain level of quality and assurance. Let's call that a product and bring product thinking to that. And then there's the next question you get asked by CIOs or CTOs, or the people who build the infrastructure and, you know, spend the money.
They said, well, it's actually quite complex to manage big data, and now we want everybody, every independent team, to manage the full stack of, you know, storage and computation and pipelines and, you know, access control and all of that? And, well, we have solved that problem in the operational world. And that requires really a new level of platform thinking to provide infrastructure and tooling to the domain teams to now be able to manage and serve their big data. And I think that requires reimagining the world of our tooling and technology. But for now, let's just assume that we need a new level of abstraction to hide away a ton of complexity that people unnecessarily get exposed to, and that's the third principle, of creating self-serve infrastructure to allow autonomous teams to build their domains. But then the last pillar, the last, you know, fundamental pillar is, okay, once you distribute a problem into smaller problems, you find yourself with another set of problems, which is how am I going to connect this data? You know, the insights happen and emerge from the interconnection of the data domains, right? They're not necessarily locked into one domain. So the concerns around interoperability and standardization and getting value as a result of composition and interconnection of these domains require a new approach to governance. And we have to think about governance very differently, based on a federated model and based on a computational model. Like, once we have this powerful self-serve platform, we can computationally automate a lot of governance decisions, the security decisions and policy decisions that apply to, you know, this fabric of mesh, not just a single domain and not in a centralized way. So really, as you mentioned, the most important component of the data mesh is distribution of ownership and distribution of architecture and data; the rest of them are there to solve all the problems that come with that.
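As an editorial illustration of the "federated and computational" governance idea in the answer above, here is a minimal Python sketch, not taken from the interview or from any real platform. It assumes a hypothetical `DataProduct` record owned by a domain team and a small set of globally agreed policies that a platform could check automatically against every product in the mesh. All class, field, and policy names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DataProduct:
    """A domain-owned data product (every field name here is hypothetical)."""
    name: str
    domain: str
    owner: str
    encrypted: bool = False
    schema_published: bool = False


# A global policy is a named check that every data product must pass.
Policy = Callable[[DataProduct], bool]

# Policies agreed on by the federated governance group (illustrative only).
GLOBAL_POLICIES: Dict[str, Policy] = {
    "has_owner": lambda p: bool(p.owner.strip()),
    "encrypted_at_rest": lambda p: p.encrypted,
    "schema_published": lambda p: p.schema_published,
}


def violations(product: DataProduct) -> List[str]:
    """Return the names of the global policies this product fails."""
    return [name for name, check in GLOBAL_POLICIES.items() if not check(product)]


def audit(mesh: List[DataProduct]) -> Dict[str, List[str]]:
    """Map each non-compliant product name to the policies it fails."""
    report: Dict[str, List[str]] = {}
    for product in mesh:
        failed = violations(product)
        if failed:
            report[product.name] = failed
    return report


if __name__ == "__main__":
    orders = DataProduct("orders", "order-management", "orders-team",
                         encrypted=True, schema_published=True)
    clicks = DataProduct("clicks", "e-commerce", "",
                         encrypted=False, schema_published=True)
    print(audit([orders, clicks]))
```

In this sketch the domain teams keep ownership of their products, while the shared policies are evaluated by code rather than by a central review team; which policies are global and which stay local is exactly the kind of decision the governance group would have to make.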
>>So very powerful. Guys, we actually have a picture of what Zhamak just described. Bring up figure three, if you would. Essentially, you're advocating for the pushing of the pipeline and all its various functions into the lines of business, and abstracting the complexity of the underlying infrastructure, which you kind of show here in this figure: data infrastructure as a platform down below. And you know what I love about this, Zhamak, is to me it underscores that data is not the new oil, because I can put oil in my car or I can put it in my house, but I can't put the same quart in both places. But I think you call it polyglot data, which is really different forms, batch or whatever, but the same data. Data doesn't follow the laws of scarcity. I can use the same data for many, many uses, and that's what this sort of graphic shows. And then you brought in the really important, you know, sticky problem, which is, you know, the governance, which is now not command and control; it's federated governance. So maybe you could add some thoughts on that.
>>Sure, absolutely. It's one of those, I think I keep referring to data mesh as a paradigm shift. And it's not just to make it sound grand and, you know, kind of grand and exciting or important. It's really because I want to point out that we need to question, every moment when we make a decision around how we're going to design security or governance or modeling of the data, we need to reflect and go back and say, am I applying some of my cognitive biases around how I have worked for the last 40 years, how I have seen it work? Or do I really need to question it? And we do need to question the way we have applied governance. I think at the end of the day, the role of data governance and its objective remain the same. I mean, we all want quality data accessible to a diverse set of users.
And these users now have different personas, like, data personas: data analyst, data scientist, data application, you know, user, very diverse personas. So at the end of the day, we want quality data accessible to them, trustworthy and in an easily consumable way. However, how we get there looks very different. As you mentioned, the governance model in the old world has been very command and control, very centralized. You know, they were responsible for quality. They were responsible for certification of the data, you know, making sure the data complies with all sorts of regulations, making sure, you know, data gets discovered and made available. In the world of the data mesh, really, the job of data governance as a function becomes finding the equilibrium between what decisions need to be, you know, made and enforced globally and what decisions need to be made locally, so that we can have an interoperable mesh of data sets that can move fast and can change fast. Like, it's really about, instead of, you know, kind of putting those systems in a straitjacket of being constant and not changing, embracing change and the continuous change of the landscape, because that's just the reality we can't escape. So the governance model is called federated and computational. And by that I mean every domain needs to have a representative in the governance team. So the role of the domain data product owner, who really understands the data of that domain really well but also wears the hat of a product owner, is an important role that has to have representation in the governance. So it's a federation of domains coming together, plus the SMEs, you know, subject matter experts who understand the regulations in that environment, who understand the data security concerns. But they don't try to enforce and do this as a central team.
they make decisions as to what needs to be standardized and what needs to be enforced. And let's push that computationally, in an automated fashion, into the platform itself. For example, instead of trying to be part of the data quality pipeline and injecting ourselves as people into that process, let's actually, as a group, define what constitutes quality: how do we measure quality? And then let's automate that and codify it into the platform, so that every data product will have a CI/CD pipeline, and as part of that pipeline those quality metrics get validated, and every data product needs to publish those SLOs, or service level objectives. So whatever we choose as a measure of quality, maybe it's the integrity of the data, the delay in the data, the liveliness of it, whatever decisions you're making, let's codify that. So the objectives the governance team tries to satisfy are the same, but how they do it is very, very different. I wrote a new article recently trying to explain the logical architecture that would emerge from applying these principles, and I put in a kind of light table to compare and contrast how we do governance today versus how we will do it differently, just to give people a flavor of what it means to embrace decentralization and what it means to embrace change and continuous change. So hopefully that could be helpful.

>>Yes. So many questions I have, but on the point you make about data quality: sometimes I feel like quality is treated as the end game, whereas the end game should be how fast you can go from idea to monetization with a data service. What happens, and again you sort of addressed this, to the underlying infrastructure? I mean, spinning up EC2s and S3 buckets and my PyTorches and TensorFlows.
Where does that live in the business, and who's responsible for that?

>>Yeah, I'm glad you're asking this question, because I truly believe we need to reimagine that world. I think there are many pieces that we can use as utilities or foundational pieces, but I can see for myself a five-to-seven-year roadmap of building this new tooling. In terms of the question around ownership, that would remain with the platform team, perhaps a domain-agnostic, technology-focused team that is providing a set of products itself. But the users of those products are data product developers, right? Data domain teams that now have really high expectations in terms of low friction and in terms of lead time to create a new data product. So we need a new set of tooling, and I think the language needs to shift from "I need a storage bucket, or I need a storage account, or I need a cluster to run my Spark jobs" to "Here's the declaration of my data product. This is where the data for it will come from. This is the data that I want to serve. These are the policies that I need to apply in terms of, perhaps, encryption or access control. Go make it happen, platform. Go provision everything that I need," so that as a data product developer, all I focus on is the data itself, the representation of its semantics and the representation of its syntax, and making sure that the data meets the quality that I have to assure and that it's available. The rest, the provisioning of everything that sits underneath, will have to be taken care of by the platform. And that's what I mean by saying it requires a reimagination. And there will be a data platform team. The data platform teams that we set up for our clients, in fact, themselves have a fair bit of complexity.
Internally, they divide into multiple teams, multiple planes. So there would be a plane, as in a group of capabilities, that satisfies the data product developer experience. There would be a set of capabilities that deal with those nitty-gritty underlying utilities. I call them utilities at this point because, to me, the level of abstraction of the platform needs to go higher than where it is. So what we call a platform today is a set of utilities we will continue using: we will continue using object storage, we will continue using relational databases, and so on. So there will be a plane and a group of people responsible for that. There will be a group of people responsible for capabilities that enable the mesh-level functionality, for example, being able to correlate, connect, and query data from multiple nodes. That's a mesh-level capability. Being able to discover and explore the mesh of data products is a mesh-level capability. So it would be a set of teams as part of the platform, again with strong platform product thinking and product ownership embedded into that, to satisfy the experience of these now business-oriented domain data teams. So we have a lot of work to do.

>>I could go on. Unfortunately, we're out of time. But first I want to tell people there are two pieces that you've put out so far. One is "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh"; you guys should read that. And "Data Mesh Principles and Logical Architecture" is kind of part two. I guess my last question, in the very limited time we have, is: are organizations ready for this?

>>I think the desire is there. I've been overwhelmed by the number of large, medium, and small, private and public, government and federal organizations that have reached out to us globally. I mean, this is a global movement, and I'm humbled by the response of the industry. I think the desire is there.
The pains are real; people acknowledge that something needs to change here. So that's the first step. I think that awareness is spreading. Organizations are becoming more and more aware. In fact, many technology providers are reaching out to us asking, you know, what shall we do, because our clients are asking us. People are already saying, we need the data mesh and we need the tooling to support it. So that awareness is there in terms of the first step of being ready. However, the ingredients of a successful transformation require top-down and bottom-up support. So it requires support from chief data and analytics officers or above. The most successful clients that we have with data mesh are the ones where, you know, the CEOs have made a statement that we want to change the experience of every single customer using data, and we're going to commit to this. So the investment and support exist from the top through all layers. The engineers are excited, and perhaps the traditional data teams are open to change. So there are a lot of ingredients that have to come together for a successful transformation. Are we really ready for it? I think the pioneers, perhaps the innovators, if you think about that innovation curve, the early adopters, are making moves towards it. And hopefully, as the technology becomes more available, organizations that are less engineering-oriented, that don't have the capability in house today but can buy it, would come next. Maybe those are the ones who aren't quite ready for it, because the technology is not readily available and requires internal investment today.

>>I think you're right on. I think the leaders are gonna lean in hard, and they're gonna show us the path over the next several years.
And I think the end of this decade is gonna be defined a lot differently than the beginning. Zhamak, thanks so much for coming on theCUBE and participating in the program.

>>Pleasure.

>>Alright, keep it right there, everybody. We'll be back right after this short break.
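
To make the "computational governance" idea from the interview a little more concrete, here is a minimal sketch of a quality check that a data product's CI/CD pipeline might run. Everything in it is an assumption for illustration: the names `QualitySLO`, `DataProductSnapshot`, and `check_slo`, the two measures (a missing-value fraction for integrity and a staleness limit for delay), and the threshold values are hypothetical and do not come from the interview or from any particular platform.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class QualitySLO:
    """Thresholds agreed globally by the governance group (hypothetical)."""
    max_null_fraction: float   # integrity: allowed share of incomplete rows
    max_staleness: timedelta   # delay: allowed age of the newest record


@dataclass(frozen=True)
class DataProductSnapshot:
    """Figures a data product reports at pipeline time (hypothetical shape)."""
    name: str
    row_count: int
    null_count: int            # rows missing at least one required field
    newest_record_at: datetime


def check_slo(snapshot: DataProductSnapshot, slo: QualitySLO,
              now: datetime) -> List[str]:
    """Return human-readable violations; an empty list means the check passes."""
    violations = []
    # An empty data product is treated as fully incomplete.
    if snapshot.row_count:
        null_fraction = snapshot.null_count / snapshot.row_count
    else:
        null_fraction = 1.0
    if null_fraction > slo.max_null_fraction:
        violations.append(
            f"{snapshot.name}: null fraction {null_fraction:.3f} "
            f"exceeds {slo.max_null_fraction:.3f}"
        )
    staleness = now - snapshot.newest_record_at
    if staleness > slo.max_staleness:
        violations.append(
            f"{snapshot.name}: newest record is {staleness} old, "
            f"limit is {slo.max_staleness}"
        )
    return violations


if __name__ == "__main__":
    slo = QualitySLO(max_null_fraction=0.01, max_staleness=timedelta(hours=24))
    now = datetime.now(timezone.utc)
    snapshot = DataProductSnapshot("orders", 1000, 5, now - timedelta(hours=2))
    problems = check_slo(snapshot, slo, now)
    # A pipeline step could fail the build when any violation is reported.
    print("PASS" if not problems else "\n".join(problems))
```

The design point this tries to capture is the split the speaker describes: the thresholds are decided once, globally, while each domain team's pipeline runs the same automated check locally against its own data product.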

Published Date : Jan 22 2021

Zhamak Dehghani, Director of Emerging Technologies at ThoughtWorks


 

(bright music)

>> In 2009, Hal Varian, Google's Chief Economist, said that statistician would be the sexiest job in the coming decade. The modern big data movement really took off later in the following year, after the second Hadoop World, which was hosted by Cloudera in New York City. Jeff Hammerbacher famously declared to me and John Furrier, on theCUBE, that the best minds of his generation were trying to figure out how to get people to click on ads. And he said that sucks. The industry was abuzz with the realization that data was the new competitive weapon. Hadoop was heralded as the new data management paradigm. Now, what actually transpired over the next 10 years was that only a small handful of companies could really master the complexities of big data and attract the data science talent necessary to realize massive returns. As well, back then, cloud was in the early stages of its adoption, when you think about it, at the beginning of the last decade. And as the years passed, more and more data got moved to the cloud, the number of data sources absolutely exploded, and experimentation accelerated, as did the pace of change. Complexity just overwhelmed big data infrastructures and data teams, leading to a continuous stream of incremental technical improvements designed to try to keep pace, things like data lakes, data hubs, new open source projects, and new tools, which piled on even more complexity. And as we reported, we believe what's needed is a complete bit flip in how we approach data architectures. Our next guest is Zhamak Dehghani, who is the Director of Emerging Technologies at ThoughtWorks. Zhamak is a software engineer, architect, thought leader, and advisor to some of the world's most prominent enterprises.
She's, in my view, one of the foremost advocates for rethinking and changing the way we create and manage data architectures, favoring a decentralized over a monolithic structure, and elevating domain knowledge as a primary criterion in how we organize so-called big data teams and platforms. Zhamak, welcome to theCUBE, it's a pleasure to have you on the program.

>> Hi David, it's wonderful to be here.

>> Okay. So you're pretty outspoken about the need for a paradigm shift in how we manage our data and our platforms at scale. Why do you feel we need such a radical change? What are your thoughts there?

>> Well, I think if you just look back over the last decades, you gave us a summary of what happened since 2010. But even if we go back before then, what we have done over the last few decades is basically repeating and, as you mentioned, incrementally improving how we manage data, based on certain assumptions around, as you mentioned, centralization: data has to be in one place so we can get value from it. But if you look at the parallel movement of our industry in general, since the birth of the internet, we are actually moving towards decentralization. If today, setting data aside for a moment, we said the only way the web would work, the only way we get access to various applications or pages on the web, is to centralize it, we would laugh at that idea. But for some reason, we don't question that when it comes to data, right? So I think it's time to embrace the complexity that comes with the growth in the number of sources, the proliferation of sources and consumption models, and embrace the distribution of sources of data: they're not just within one part of the organization. They're not even just within the bounds of organizations.
They're beyond the bounds of the organization. And then look back and say, okay, if that's the trend of our industry in general, given the fabric of computation and data that we have put in place globally, then how do the architecture, technology, organizational structure, and incentives need to move to embrace that complexity? And to me, that requires a paradigm shift, full stack, from how we organize our organizations and how we organize our teams to how we put technology in place, to look at it from a decentralized angle.

>> Okay, so let's unpack that a little bit. I mean, you've spoken and written about today's big data architecture, and you've basically just said that it's flawed. So I want to bring up, I love your diagrams, you have a simple diagram. Guys, if you could bring up figure one. So on the left here, we're ingesting data from the operational systems and other enterprise data sets, and of course external data. We cleanse it, you've got to do the quality thing, and then serve it up to the business. So what's wrong with that picture that we just described? And granted, it's a simplified form.

>> Yeah. Quite a few things. So I would flip the question, maybe, back to you or the audience. We said that there are so many sources of data, and data actually comes from systems and from teams that are very diverse in terms of domains, right? If you just think about, I don't know, retail: e-commerce versus order management versus customer. These are very diverse domains. The data comes from many different, diverse domains, and then we expect to put it under the control of a centralized team, a centralized system. And I know that centralization, probably, if you zoom out is centralized, and if you zoom in it's compartmentalized based on functions, and we can talk about that.
And we assume that the centralized model will be getting that data, making sense of it, and cleansing and transforming it to satisfy the needs of a very diverse set of consumers, without really understanding the domains, because the teams responsible for it are not close to the source of the data. So there is a bit of a cognitive gap and a domain understanding gap, and it happens without really understanding how the data is going to be used. When we came up with this idea, I talked to a lot of data teams globally, just to see, what are the pain points? How are they doing it? And one thing that was evident in all of those conversations was that they actually didn't know, after they built these pipelines and put the data in, whether into data warehouse tables or a lake, how the data was being used. Yet they're responsible for making the data available for this diverse set of use cases. So essentially a centralized, monolithic system is often a bottleneck. What you find is that a lot of the teams are struggling with satisfying the needs of the consumers and struggling with really understanding the data. The domain knowledge is lost; there is a loss of understanding in that transformation. Often we end up training machine learning models on data that is not really representative of the reality of the business, and then we put them into production and they don't work, because the semantics and the syntax of the data get lost within that translation. And we are struggling with finding people to manage a centralized system, because the technology is still, in my opinion, fairly low level and exposes the users of those technologies, let's say a data warehouse, to a lot of complexity. So in summary, I think it's a bottleneck; it's not going to satisfy the pace of change or pace of innovation, and the availability of sources.
It's disconnected and fragmented, even though it's centralized; it's disconnected and fragmented from where the data comes from and where the data gets used, and it is managed by a team of hyper-specialized people who are struggling to understand the actual value of the data and the actual format of the data. So it's not going to get us to where our aspirations and our ambitions need to be.

>> Yeah, so the big data platform is essentially, I think you call it, context agnostic. And so as data becomes more important in our lives, you've got all these new data sources injected into the system, and experimentation, as we said, with the cloud becomes much, much easier. So one of the blockers that you've cited, and you just mentioned it, is you've got these hyper-specialized roles: the data engineer, the quality engineer, the data scientist. And it's illusory. I mean, it's like an illusion. Seemingly they're independent and can scale independently, but I think you've made the point that, in fact, they can't. A change in a data source has an effect across the entire data life cycle, the entire data pipeline. So maybe you could add some color on why that's problematic for some of the organizations that you work with, and maybe give some examples.

>> Yeah, absolutely. So in fact, initially, the hypothesis around data mesh came from a series of requests that we received from our large-scale and progressive clients, progressive in terms of their investment in data architecture. So these were clients that were larger scale, and they had a diverse and rich set of domains. Some of them were big tech companies, some of them were big retail companies or big healthcare companies. So they had that diversity of data and of the number of sources and domains. They had invested for quite a few years; they had multiple generations of proprietary data warehouses on prem that they were moving to cloud.
They had moved through the various revisions of Hadoop clusters, and they were moving those to the cloud. And the challenges that they were facing were simply, if I want to simplify it in one phrase, that they were not getting value from the data they were collecting. They were continuously struggling to shift the culture, because there was so much friction between all three phases: consumption of the data from sources, then transformation, and then providing and serving it to the consumer. That whole process was full of friction. Everybody was unhappy. So the bottom line is that you're collecting all this data, there is delay, and there is a lack of trust in the data itself, because the data is not representative of reality: it has gone through transformation by people who didn't really understand what the data was, and it got delayed. And so there's no trust, and it's hard to get to the data. Ultimately, it's hard to create value from the data, and people are working really hard and under a lot of pressure, but they're still struggling. And often our solution, since we are technologists, is to point to technology. So we go, okay, this version of some proprietary data warehouse we're using is not the right thing. We should go to the cloud, and that certainly will solve our problem, right? Or the warehouse wasn't a good one, let's make a data lake version. So instead of extracting and then transforming and loading into the database, and that transformation is a heavy process, because with warehouses you fundamentally made the assumption that if I transform this data into this multidimensional, perfectly designed schema, then everybody can run whatever query they want, and that's going to solve everybody's problem. But in reality, it doesn't, because you are delayed and there is no universal model that serves everybody's needs; everybody's needs are diverse.
Data scientists don't necessarily like the perfectly modeled data; they're looking for both the signal and the noise. So then we've gone from ETL to, let's say now, the lake, which is: okay, let's move the transformation to the last mile. Let's just load the data into object stores as sort of semi-structured files and get the data scientists to use it. But they're still struggling because of the problems that we mentioned. So then what is the solution? Well, the next-generation data platform. Let's put it on the cloud. And we saw clients that had actually gone through a year or multiple years of migration to the cloud, and that was great: 18 months; I've seen nine-month migrations of the warehouse versus two-year migrations of various data sources to the cloud. But ultimately the result is the same: unsatisfied, frustrated data users and data providers, lacking the ability to innovate quickly on relevant data and to have the experience that they deserve to have, a delightful experience of discovering and exploring data that they trust. All of that was still amiss. So something more fundamental needed to change than just the technology.

>> So the linchpin of your scenario is this notion of context. And you made the other observation that "Look, we've made our operational systems context aware, but our data platforms are not." Like with a CRM system, sales guys are very comfortable with what's in the CRM system. They own the data. So let's talk about the answer that you and your colleagues are proposing. You're essentially flipping the architecture, whereby those domain knowledge workers, the builders, if you will, of data products or data services, are now first-class citizens in the data flow, and they're injecting domain knowledge into the system by design. So I want to put up another one of your charts. Guys, bring up figure two there. It talks about convergence.
It shows distributed, domain-driven architecture, the self-serve platform design, and this notion of product thinking. So maybe you could explain why this approach is so desirable in your view.

>> Sure. The motivation and inspiration for that approach came from studying what has happened over the last few decades in operational systems. We had a very similar problem, prior to microservices, with monolithic systems. The monolithic systems were the bottleneck; the changes we needed to make were always orthogonal to how the architecture was centralized. And we found a nice niche. I'm not saying this is the perfect way of decoupling your monolith, but for where we currently are in our journey to become data driven, it is a nice place to be, which is distribution, or decomposition, of your system as well as your organization. I think whenever we talk about systems, we've got to talk about the people and teams that are responsible for managing those systems. So it's the decomposition of the systems, the teams, and the data around domains, because that's how we are decoupling our business today, right? We are decoupling our businesses around domains, and that's a good thing. And what does that really do for us? It localizes change to the bounded context of that business. It creates clear boundaries, interfaces, and contracts between the rest of the universe of the organization and that particular team, so it removes the friction that we often have both in managing change and in serving data or capability. So the first principle of data mesh is: let's decouple this world of analytical data to mirror the same way we have decoupled our systems and teams and business. Why is data any different?
And the moment you do that, the moment you bring the ownership to the people who understand the data best, you get questions like, well, how is that any different from the silos of disconnected databases that we have today, where nobody can get to the data? So the rest of the principles are really there to address all of the challenges that come with this first principle of decomposition around domain context. And the second principle is, well, we have to expect a certain level of quality, accountability, and responsibility from the teams that provide the data. So let's bring product thinking, treating data as a product, to the data that these teams now share, and let's put accountability around it. We need a new set of incentives and metrics for domain teams to share the data, and we need a new set of quality metrics that define what it means for the data to be a product, and we can go through that conversation perhaps later. So the second principle is, okay, the domain teams that are now responsible for their analytical data need to provide that data with a certain level of quality and assurance. Let's call that a product, and bring product thinking to that. And then the next question you get asked, often by CIOs or CTOs, the people who build the infrastructure and spend the money, is, "Well, it's actually quite complex to manage big data, and now you want everybody, every independent team, to manage the full stack of storage and computation and pipelines and access control and all of that?" Well, we've solved that problem in the operational world. And that requires really a new level of platform thinking, to provide infrastructure and tooling to the domain teams so they are able to manage and serve their big data, and I think that requires reimagining the world of our tooling and technology.
But for now, let's just assume that we need a new level of abstraction to hide away a ton of complexity that people are unnecessarily exposed to. And that's the third principle: creating self-serve infrastructure to allow autonomous teams to build their domains. But then the last pillar, the last fundamental pillar, is: okay, once you distribute a problem into smaller problems, you find yourself with another set of problems, which is, how am I going to connect this data? The insights happen and emerge from the interconnection of the data domains, right? They're not necessarily locked into one domain. So the concerns around interoperability and standardization, and getting value as a result of the composition and interconnection of these domains, require a new approach to governance. And we have to think about governance very differently, based on a federated model and based on a computational model. Once we have this powerful self-serve platform, we can computationally automate a lot of governance decisions, security decisions, and policy decisions that apply to this fabric of the mesh, not just to a single domain and not in a centralized way. So really, as you mentioned, the most important component of data mesh is the distribution of ownership and the distribution of architecture and data; the rest of the principles are there to solve all the problems that come with that.

>> So, very powerful. And guys, we actually have a picture of what Zhamak just described. Bring up figure three, if you would. So I mean, essentially, you're advocating for pushing the pipeline and all its various functions into the lines of business and abstracting the complexity of the underlying infrastructure, which you kind of show here in this figure: data infrastructure as a platform down below. And you know what I love about this, Zhamak? To me, it underscores that data is not the new oil. Because I can put oil in my car, I can put it in my house, but I can't put the same quart in both places.
But I think you call it polyglot data, which is really different forms, batch or whatever. But the same data doesn't follow the laws of scarcity. I can use the same data for many, many uses, and that's what this sort of graphic shows. And then you brought in the really important sticking problem, which is governance, which is now not command and control; it's federated governance. So maybe you could add some thoughts on that.

>> Sure, absolutely. I keep referring to data mesh as a paradigm shift, and it's not just to make it sound grand and exciting or important. It's really because I want to point out that we need to question ourselves every time we make a decision around how we're going to design security or governance or modeling of the data. We need to reflect and go back and say, "Am I applying some of my cognitive biases around how I have worked for the last 40 years and seen it work? Or do I really need to question it?" And we do need to question the way we have applied governance. I think at the end of the day, the role of data governance and its objective remain the same. I mean, we all want quality data accessible to a diverse set of users, and these users now have different personas, like data analysts, data scientists, and data application users. These are very diverse personas. So at the end of the day, we want quality data accessible to them, trustworthy and in an easily consumable way. However, how we get there looks very different. As you mentioned, the governance model in the old world has been very command and control, very centralized. They were responsible for quality, they were responsible for certification of the data, making sure the data complies with all sorts of regulations, and making sure data gets discovered and made available.
In the world of data mesh, really the job of data governance as a function becomes finding the equilibrium between what decisions need to be made and enforced globally and what decisions need to be made locally, so that we can have an interoperable mesh of data sets that can move fast and change fast. It's really about, instead of putting those systems in a straitjacket of being constant and never changing, embracing change and the continuous change of the landscape, because that's just the reality we can't escape. So the modern governance model I call federated and computational. And by that I mean every domain needs to have a representative in the governance team. So the domain data product owner, who really understands that domain well but also wears the hat of the product owner, is an important role that has to have representation in governance. So it's a federation of domains coming together, plus the SMEs, the subject matter experts who understand the regulations in that environment and who understand the data security concerns. But instead of trying to enforce and do this as a central team, they make decisions as to what needs to be standardized and what needs to be enforced. And let's push that computationally, in an automated fashion, into the platform itself. For example, instead of trying to be part of the data quality pipeline and injecting ourselves as people into that process, let's actually, as a group, define what constitutes quality. How do we measure quality?
And then let's automate that, and let's codify that into the platform, so that every data product will have a CI/CD pipeline, and as part of that pipeline, those quality metrics get validated, and every data product needs to publish those SLOs, or Service Level Objectives, or whatever we choose as a measure of quality. Maybe it's the integrity of the data, or the delay in the data, the liveliness of the data, whatever are the decisions that you're making. Let's codify that. So really the objectives the governance team is trying to satisfy are the same, but how they do it is very, very different. And I wrote a new article recently, trying to explain the logical architecture that would emerge from applying these principles, and I put a kind of a light table to compare and contrast how we do governance today versus how we'll do it differently, to just give people a flavor of what it means to embrace decentralization, and what it means to embrace change, and continuous change. So hopefully that could be helpful. >> Yes. There's so many questions I have. But the point you make, too, on data quality: sometimes I feel like quality is the end game, where the end game should be how fast you can go from idea to monetization with a data service. And you've sort of addressed this, but what happens to the underlying infrastructure? I mean, spinning up EC2s and S3 buckets, and my PyTorches and TensorFlows. That lives in the business, and who's responsible for that? >> Yeah, that's why I'm glad you're asking this question, David, because I truly believe we need to reimagine that world. I think there are many pieces that we can use as utilities or foundational pieces, but I can see for myself a five-to-seven-year roadmap building this new tooling. I think in terms of the ownership, the question around ownership, that would remain with the platform team, but perhaps a domain-agnostic, technology-focused team, right?
They are providing a set of products themselves, but the users of those products are data product developers, right? Data domain teams that now have really high expectations in terms of low friction, in terms of lead time to create a new data product. So we need a new set of tooling, and I think the language needs to shift from "I need a storage bucket, or I need a storage account, or I need a cluster to run my Spark jobs," to "Here's the declaration of my data product. This is where the data file will come from, this is the data that I want to serve, these are the policies that I need to apply in terms of perhaps encryption or access control. Go make it happen, platform, go provision everything that I need," so that as a data product developer, all I can focus on is the data itself: representation of the semantics and representation of the syntax, and making sure that data meets the quality that I have to assure and that it's available. The rest, the provisioning of everything that sits underneath, will have to get taken care of by the platform. And that's what I mean by it requires a reimagination. And there will be a data platform team. The data platform teams that we set up for our clients, in fact, themselves have a fair bit of complexity internally; they divide into multiple teams, multiple planes. So there would be a plane, as in a group of capabilities, that satisfies that data product developer experience. There would be a set of capabilities that deal with those nitty-gritty underlying utilities. I call them (indistinct) utilities because, to me, the level of abstraction of the platform needs to go higher than where it is. So what we call platform today is a set of utilities we'll be continuing to use. We'll be continuing to use object storage, we will continue to use relational databases, and so on. So there will be a plane and a group of people responsible for that.
There will be a group of people responsible for capabilities that enable the mesh-level functionality. For example, being able to correlate and connect and query data from multiple nodes, that's a mesh-level capability. Being able to discover and explore the mesh of data products, that's a mesh-level capability. So it would be a set of teams as part of platform. So we use strong, again, product thinking and product ownership embedded into that to satisfy the experience of these now business-oriented domain data teams. So we have a lot of work to do. >> I could go on. Unfortunately, we're out of time, but I guess, first of all, I want to tell people there's two pieces that you've put out so far. One is "How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh." You guys should read that, and the "Data Mesh Principles and Logical Architecture" is kind of part two. I guess my last question in the very limited time we have is, are organizations ready for this? >> I think the desire is there. I've been overwhelmed with the number of large and medium and small, and private and public, and government and federal organizations that reached out to us globally. I mean, this is a global movement and I'm humbled by the response of the industry. I think the desire is there, the pains are real, people acknowledge that something needs to change here. So that's the first step. I think awareness is spreading, organizations are more and more becoming aware. In fact, many technology providers are reaching out to us asking, "What shall we do?" because their clients are already asking, "We need the data mesh and we need the tooling to support it." So that awareness is there in terms of the first step of being ready. However, the ingredients of a successful transformation require top-down and bottom-up support.
So it requires support from chief data analytics officers or above. The most successful clients that we have with data mesh are the ones where the CEOs have made a statement that, "We want to change the experience of every single customer using data, and we're going to commit to this." So the investment and support exists from the top to all layers, the engineers are excited, and perhaps the traditional data teams are open to change. So there are a lot of ingredients of transformation that come together. Are we really ready for it? I think the pioneers, perhaps, the innovators, if you think about that innovation curve of adopters, probably pioneers and innovators and lead adopters are making moves towards it, and hopefully as the technology becomes more available, organizations that are less engineering oriented, that don't have the capability in-house today but can buy it, would come next. Maybe those are the ones who are not quite ready for it, because the technology is not readily available and requires internal investments to make. >> I think you're right on. I think the leaders are going to lean in hard and they're going to show us the path over the next several years. And I think that the end of this decade is going to be defined a lot differently than the beginning. Zhamak, thanks so much for coming to "theCUBE" and participating in the program. >> Thank you for hosting me, David. >> Pleasure having you. >> It's been wonderful. >> All right, keep it right there everybody, we'll be back right after this short break. (slow music)
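
[Editor's sketch] The interview describes "computational" governance: the federated group agrees on what quality means, and each data product's CI/CD pipeline validates and publishes those service level objectives automatically. The following is only a minimal illustration of that idea, not anything shown in the interview. The names QualitySLO and check_slos, the two metrics (missing-value fraction and staleness), and the record layout with an "updated_at" field are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class QualitySLO:
    """Globally agreed thresholds (hypothetical names and values)."""
    max_null_fraction: float   # integrity: share of records missing a required field
    max_staleness: timedelta   # delay: allowed age of the newest record


def check_slos(records, required_field, slo, now):
    """Return a list of SLO violations; an empty list means the check passes."""
    if not records:
        return ["data product published no records"]
    violations = []
    # Integrity check: fraction of records where the required field is absent or None.
    missing = sum(1 for r in records if r.get(required_field) is None)
    null_fraction = missing / len(records)
    if null_fraction > slo.max_null_fraction:
        violations.append(
            f"null fraction {null_fraction:.2f} exceeds {slo.max_null_fraction:.2f}"
        )
    # Delay check: the newest record must be recent enough.
    newest = max(r["updated_at"] for r in records)
    if now - newest > slo.max_staleness:
        violations.append("newest record is older than the allowed staleness")
    return violations
```

A pipeline step could call check_slos on a sample of the product's output and fail the build when the returned list is non-empty, which keeps people out of the quality loop while still enforcing the shared definition.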

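
[Editor's sketch] The interview also describes a shift in language from requesting individual resources (a bucket, a cluster) to declaring a data product and letting the platform provision what sits underneath. As a rough illustration only, the declaration could be modeled as a small data structure that a platform translates into provisioning steps. DataProductSpec, plan_provisioning, and every field name below are hypothetical, and the "steps" are plain strings standing in for whatever a real platform would do.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProductSpec:
    """What a domain team declares; every field name here is hypothetical."""
    name: str
    sources: list            # where the data will come from
    outputs: list            # the data the product wants to serve
    encrypted: bool = True   # policy: encryption
    allowed_readers: list = field(default_factory=list)  # policy: access control


def plan_provisioning(spec):
    """Translate a declaration into the low-level steps a platform might run."""
    steps = [f"create storage for {spec.name}"]
    steps += [f"connect source {s}" for s in spec.sources]
    steps += [f"expose output {o}" for o in spec.outputs]
    if spec.encrypted:
        steps.append("enable encryption at rest")
    steps += [f"grant read access to {r}" for r in spec.allowed_readers]
    return steps


spec = DataProductSpec(
    name="orders",
    sources=["orders-db"],
    outputs=["daily_orders"],
    allowed_readers=["analytics"],
)
for step in plan_provisioning(spec):
    print(step)
```

The point of the sketch is the division of labor: the developer states sources, outputs, and policies, and never names a bucket or a cluster.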
Published Date : Dec 23 2020

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
David | PERSON | 0.99+
Michael | PERSON | 0.99+
Marc Lemire | PERSON | 0.99+
Chris O'Brien | PERSON | 0.99+
Verizon | ORGANIZATION | 0.99+
Hilary | PERSON | 0.99+
Mark | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Ildiko Vancsa | PERSON | 0.99+
John | PERSON | 0.99+
Alan Cohen | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
John Troyer | PERSON | 0.99+
Rajiv | PERSON | 0.99+
Europe | LOCATION | 0.99+
Stefan Renner | PERSON | 0.99+
Ildiko | PERSON | 0.99+
Mark Lohmeyer | PERSON | 0.99+
JJ Davis | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Beth | PERSON | 0.99+
Jon Bakke | PERSON | 0.99+
John Farrier | PERSON | 0.99+
Boeing | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Dave Nicholson | PERSON | 0.99+
Cassandra Garber | PERSON | 0.99+
Peter McKay | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Dave Brown | PERSON | 0.99+
Beth Cohen | PERSON | 0.99+
Stu Miniman | PERSON | 0.99+
John Walls | PERSON | 0.99+
Seth Dobrin | PERSON | 0.99+
Seattle | LOCATION | 0.99+
5 | QUANTITY | 0.99+
Hal Varian | PERSON | 0.99+
JJ | PERSON | 0.99+
Jen Saavedra | PERSON | 0.99+
Michael Loomis | PERSON | 0.99+
Lisa | PERSON | 0.99+
Jon | PERSON | 0.99+
Rajiv Ramaswami | PERSON | 0.99+
Stefan | PERSON | 0.99+

Gary Conway, Automation Anywhere | CUBEConversation, August 2019


 

(upbeat music) >> From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBEConversation. >> Hello everyone, welcome to Palo Alto's CUBE studios. I'm John Furrier, host of theCUBE. We're here for a special CUBEConversation as part of our new brand of tech leader series as well as Extracting the Signal From the Noise. We're here with Gary Conway, the CMO of Automation Anywhere, a hot startup, heavily funded, attacking a whole new market segment, that's kind of changing the game of value in digital, obviously, RPA, robotic process automation, is the buzz word. It's actually real, it's happening, we're seeing a lot of success in companies there. It's changing the way business is operated, business is structured, and value is created. Gary, thanks for joining me. >> My pleasure. >> So we covered your event, Automation Anywhere. You guys are essentially doing very very well, heavily funded, growing like crazy. RPA is one of the fastest growing segments in this next generation digital culture. You're seeing a lot of companies coming out attacking this. What's your perspective, why is RPA so important, why is it so hot? >> It's a pretty simple reason, actually. You know, the truth of the matter is that companies are now, because of RPA, able to automate parts of their business processes, or entire processes that they were never able to automate before, and they can do it with RPA at a relatively little cost compared to a lot of other technologies out there, especially from the big ERP vendors. We say that, and we really believe this, and we're finding this to be true, that since the onset of automation about 30 years ago, from the big technology companies, only about 20% of the processes that businesses manage now are actually automated. The 80% of them that are not automated are pretty much done by human beings, you know. Millions of human beings employed to manage those back office processes. 
RPA is enabling companies to actually automate more of those processes than ever before. >> Before we get started, just quickly define, what is RPA for the folks that are learning for the first time, because we're now seeing the concept really penetrating mainstream right now. It's becoming, frankly, a topic that's being discussed across most of the largest enterprises and small business, what is RPA? >> So RPA means robotic process automation. So think of them as robots that are built as predesigned software bots that you can plug into any business process, and it'll automate a part of that process, or the entire process, by just plugging it in. It actually is capable of observing what human beings do, remembering what human beings do, and then repeating that again and again and again, only in a fraction of a second. That's the easiest way to think of it. >> So when I think of robots, I think of like a machine, you know, moving things around, from, like, manufacturing, whatnot. It's beyond that, it's not just robots, it's software as well, and this is the key in all this. >> It is software, I mean. >> It is software, it's not robots. >> RPA is only software. >> It's only software. >> I think most people, when they think of robotics, do think of, you know, mechanical robots used in manufacturing. That's not what RPA is. RPA is robotics that is only constructed with preconfigured software. >> I want to get your take on the impact to business and how leaders are adapting to this, but first I want to get to the mainstream topic that is trying to be figured out, and the classic one is technology's going to automate my jobs away, and the example that I use is retail. Most people go to retail, and they think, you know, whether it's a person out of college, or someone working in retail, that oh my god, a robot's going to show up and move stuff on the shelves, and eliminate those jobs. It's not so much robots, per se, it's Amazon that's going to impact in retail. 
We know what Amazon and Walmart has done to commerce. So that's already happening, retail's impacted. It's not so much that jobs are going away, they're just changing. That's our opinion. Can you share your opinion on the impact of software automation to jobs? >> We agree that jobs are not going away. They will change, but I always tell people when I'm asked this question that there's not been one technology that's ever been introduced that has actually done anything but create more jobs, and I always use the example of the PC. You know, I'm old enough to remember when the PC was introduced, the headlines were what will people do with all this additional time? You know, people were predicting a three day work week because of all the efficiencies that would be created by the PCs, and in fact the opposite has happened. Technology actually makes people more productive, and when they're more productive they're capable of doing more things. So with the automation of certain things that people happen to be doing now, those people are being upskilled, they are being redeployed to other jobs, as we've seen in the past, and actually, more jobs are being created. >> You know, we cover a lot of the Big Data space going back to 2010 when we first started theCUBE, at Hadoop World, which that kind of had its course, but ultimately Big Data, which became AI, you know the bank teller example, you know the ATM was going to kill the branches, when in reality there's been even more branch offices-- >> That's what we're seeing, yeah. >> than ever before. So again, I think the argument is pretty clear from the data and the trend, technology is actually helping create new jobs, but not the jobs maybe that there were once there. That seems to be the big debate, so we agree with you on that. Now we applied some of our, not RPA, but we had some technology that applied to all of our videos that we did with you at your event, and a couple things came out of the entity extraction. 
I want to share with you, I want to get your reaction. Business hubs, human versus machines, complex problems, digital colleagues, digital worker, new potential applications, digital native companies, supply chain, system integrators, labor platforms, AI assistance, inefficiencies, and machine learning. These are key words that really kind of point to the next generation. This is essentially the language of your company. What's your reaction to that? >> Well, I'm not sure it's the language of our company as much as it's the language that people are using to determine what role they will play in the future, and what role, how they will impact their businesses going into the future. So these are not our terms, these are terms that exist in the space right now as people try to determine for themselves the role they will play in defining the future and how they will use technology to make their businesses more efficient. >> And companies are using cloud, for instance, to kind of reshape. We had a big conversation yesterday around, you know, do I want to be in the business of managing data centers, or be in the business of managing my business with technology. These concepts are interesting from an industry standpoint. Business hubs. Good concept, I get that. Digital worker. This is the impact that you guys are enabling. What's the managerial leadership role as an executive or a worker in these new cultural shifts? Because, as this is being enabled, new value is being created. Digital is enabling that. How does someone manage all this? What do you guys see, how do you see that playing out? >> Look, I think that whenever things are changing, and things are changing dramatically in business today, the only way to manage it is a day at a time. You can't project yourself so far into the future that you trip over the things that are immediately facing you now. 
So my suggestion would always be to evaluate options every day, every week, and make decisions when it's the right time to make decisions for your business. But let's go back to one of the terms you described, digital worker. So a digital worker in our view is actually available in what we call our bot store, which is a bot that is actually preconfigured to have skillsets that you would require. So let's just say you need an order-to-cash person, person who understands that, and it's a part of an automated process. The idea is that you would be able to download a digital worker with similar skills, and plug that bot into your process, and it would begin to work with, I would say, the skillsets of somebody who understands the order-to-cash process. That's really what a digital worker is. Now imagine that, in the future, and that future is not that far away, where every human being will be working side by side with a digital worker, so that the human being can offload the repetitive things that a digital something could actually do for them, and that digital worker would take on the task-based stuff, freeing up the individual to use their creativity to create higher order value for the business. That's really what we mean by digital worker and the importance of a digital colleague, for example. >> I think that it's a profound statement, and I think this is one of the cultural shifts that I see that this next generation workforce and leaders have to get their arms around, and in watching folks in Washington, D.C., we've been covering a lot of the procurement changes going on in government and businesses. There's a leveling up going on in the IQ of organizations, because that is a profound statement. Now we saw that with DevOps in cloud. You know, you talk to tech people, if you're doing the repetitive task more than three times, automate it. You're getting at something a little bit different. 
You're not just automating, you're adding intelligence to it, so what I like about the process automation area, is it's not just an undifferentiated, heavy lifting, mundane task. Yes it is, but there's an era of machine learning, you're seeing intelligence being applied to it, so it's truly becoming an augmentation to a human. That's kind of what I hear you saying. Do you agree with that, and is that something that you guys see happening, and what does that actually mean for the enterprise? >> No, I do agree with it, and we are at various stages of that evolution. But like anything else in business, and in life, you don't just flip a switch and all of a sudden people migrate to that new model, that's not how life really works. We evolve to those things, and I think what we're seeing is a very fast evolution to exactly what you just described. >> I want to get your thoughts on operationalizing new technology. You know, obviously, being an entrepreneur, I've done a bunch of startups, and the startup ethos is come on a narrow entry, get a landing area, and then sequence to the broader market opportunity. There's a lot of entrepreneurial ethos involved in how to operationalize something new like RPA, because you can't just, you know, shut down the old and bring in the new, there's a method there. This is a challenge in any new technology. How do you guys see this playing out? Because you guys are on the front end, bringing real value to the table, but people might want to get more aspirational and then get the reality. How do you get into the point of going into someone and saying I love what you guys do, what's the playbook, what do I do next? This is the challenge, can you share your thoughts on how an executive or a business can operationalize these benefits? >> So we have a lot of customers, 1800 customers, unique customers, and 2800 entities around the world that are using the software now. And I think that each of them had one thing in common. 
They started in bite-sized chunks. They said we're going to try this, and what's happening with RPA, which is one of the reasons it's growing so fast, is that once you try it, once you implement a few bots to automate the things that you weren't able to automate before, it starts ramping like this, right? It has a very very fast ramp-up. So you realize some successes in the processes that you begin to automate that you've never automated before. And the more you do it, the more you learn from it. The more you learn from it, the more you want to do it, the more processes you identify that could be automated, and should be automated, and what starts happening in most companies is they start adopting much much faster once they understand the benefits of it. And the benefit to business is driving higher levels of efficiencies, and reducing costs dramatically. >> So the time to value is fast. >> Right, the value is very fast, compared to-- >> And that's driving the ramp-up, to your point. >> And that's driving the ramp. >> The flywheel kicks in, you start with a process that's known, and you automate it, wow, that's good, do it again, do it again. >> Correct. Well, do it again, and do it with more processes, right? And the other unique thing about this technology is human beings, once they understand the advantages of automating things that other human beings may have to do manually, most of those people who have been doing them manually will say I want more of that. We should be automating this, we should be automating that, and it actually makes them much more productive, and it makes them feel as if they are delivering higher value to the business themselves, and what an amazing human dynamic that is. >> You know, I was talking to Dave Vellante about this, we were talking about the TAM, the total addressable market, for RPA, we're like, I think it's just in the trillions because with digital, everything is connected, so you can measure everything.
Everything is ultimately a supply chain, whether it's network effect for internet, whether it's, you know, some process with cryptocurrency, whether it's blockchain or a process with cybersecurity, digital is pretty much connected, it's pretty much a supply chain. Some of them are more formed than others. This seems to be the entry point that most people would go to. Do they go to the supply chains first, or, better yet, what's the use cases that you see as the low hanging fruit that people come in on and automate? Is it simple supply chain stuff that's known, or are they applying it as they grow to other areas? >> It's very broad, but the fastest adoption, especially beginning about two years ago, were from the companies in industries like banking, other financial services, insurance, healthcare, manufacturing, which is supply chain, as you rightly point out. Those businesses that tend to be earlier adopters of technology have also become earlier adopters of RPA. But what we're finding now is it's now, because of the results that these businesses have demonstrated, and because digital native competitors are actually coming into the space and threatening what are sometimes referred to as legacy businesses, businesses are not delaying the investments they're making so that they can actually become more competitive, and when you think about that, it's not just the efficiencies that these technologies like RPA drive, but it's the ability to make businesses acutely more competitive than they've ever been before. >> That's a great angle, competitive strategy has always been one of those things where, you know, the cloud native world or digital native world was like oh yeah, pick one feature, innovate, and you can go beat an incumbent. The incumbent now has leverage in the marketplace, whether it's physical presence or other assets. Using RPA gives them a way to level up, so to speak. >> Level up, for sure. So let's just take something we're all familiar with, right? 
You can now go on your phone, and you can have a car at your house to take you somewhere in about four minutes in most cities, right? If you have an issue, you can solve that issue on your phone as well. You don't have to call anybody, you just solve it on your phone. These ride share companies have made it so simple, it's almost as if there's no such thing anymore as a front office or a back office. Digital native companies have brought those things together, and now there's one office. So that immediacy is what legacy companies are actually competing against, and if those companies don't adopt this kind of automation to make more efficient those processes and narrow the gap between customer facing and back office, they won't be able to compete. >> Yeah, they can turn a liability into an advantage, with software. Big big bullish on the software, I think the competitive landscape also is interesting, I'd like your thoughts on. There seems to be a battlefield, at least from my perspective, my opinion is that, okay, RPA software is out there, it's going to grow really fast. The competitive battle will be around intelligence. How do you guys view the competitive levers? How do you guys compete, what's the advantage? Is it intelligence, is it being more intelligent, is it more operational, what's the advantage you guys see vis-a-vis the competition? >> Yeah, so we're actually seeing a sort of a bringing together of technology, what we have considered to be strictly technology, and what's being described broadly now as artificial intelligence. Artificial intelligence is still evolving. Everybody has his own definition of what it really is, but what we're seeing, and I think in other sectors we're seeing the same thing, is now the merging of things that have truly been technology with things that are perceived to be artificial intelligence, and they're beginning to come together. What that will look like five years from now, nobody knows. 
What it'll look like 10 years from now, no one can even conceive of, but we're seeing that dynamic in place now, and this is the beginning. >> It's a great wave, excited to have you on and share your insights, Gary. It's great stuff you guys are doing over there at Automation Anywhere, love the, we love this wave, I think it's going to be relevant. My final question for you, though, is little bit different. You know, you're at a cocktail party, you're at a friend's house, you're at a confab, and you see people that aren't in the business, and they're like Gary, I need to get, I need to be more competitive. What do I do, what is this RPA thing, how do I change my culture, how do I get my people and my process aligned with software, what's the playbook, what's your advice? >> So what I would say is, get started as quickly as possible, because if you delay too long, you will be left behind. So that's would be my first bit of advice. The other, it would be to start slowly. Learn as quickly as you can. Don't worry about automating things that are hard to automate, go to the things that are easy to automate. Companies find that when they address those things first, they're actually able to drive more success faster, and then they will look for more and more opportunities based on what they've learned and the success that they've derived, and that's what happens to create this ramp effect, where it becomes almost viral-like. Where you have one process that works great, you automate that, you automate another one, you automate five more, 10 more, and before you know it, believe it or not, we have customers that are implementing more than 3000 bots over the last year and a half, and that's how they started. >> Get rid of the mundane work, you've got happy people, HR is happy, you've got more revenue coming in, you're more competitive as a business, this is a good value proposition. It's an easy sale. >> Nothing's easy, but it has a huge appeal. 
>> Gary, thanks so much for coming on and sharing your insights around RPA, appreciate it and congratulations on your success. >> Thank you. >> This is CUBEConversation, and I'm John Furrier here in Palo Alto, thanks for watching. (upbeat music)

Published Date : Aug 1 2019

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Gary | PERSON | 0.99+
August 2019 | DATE | 0.99+
Walmart | ORGANIZATION | 0.99+
Palo Alto | LOCATION | 0.99+
John Furrier | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
80% | QUANTITY | 0.99+
1800 customers | QUANTITY | 0.99+
Washington, D.C. | LOCATION | 0.99+
2800 entities | QUANTITY | 0.99+
2010 | DATE | 0.99+
Gary Conway | PERSON | 0.99+
yesterday | DATE | 0.99+
one office | QUANTITY | 0.99+
each | QUANTITY | 0.99+
more than 3000 bots | QUANTITY | 0.99+
one | QUANTITY | 0.99+
one thing | QUANTITY | 0.99+
first time | QUANTITY | 0.98+
one process | QUANTITY | 0.98+
CUBEConversation | EVENT | 0.98+
about 20% | QUANTITY | 0.97+
more than three times | QUANTITY | 0.96+
10 more | QUANTITY | 0.96+
about four minutes | QUANTITY | 0.95+
first | QUANTITY | 0.95+
today | DATE | 0.93+
one technology | QUANTITY | 0.93+
about 30 years ago | DATE | 0.92+
RPA | TITLE | 0.9+
Palo Alto, California | LOCATION | 0.9+
a day | QUANTITY | 0.88+
about two years ago | DATE | 0.88+
first bit | QUANTITY | 0.87+
CUBE | ORGANIZATION | 0.86+
10 years | QUANTITY | 0.84+
Hadoop World | ORGANIZATION | 0.84+
last year and a half | DATE | 0.83+
CUBEConversation | ORGANIZATION | 0.82+
three day work week | QUANTITY | 0.82+
the Signal From the Noise | TITLE | 0.8+
Millions of human beings | QUANTITY | 0.78+
five more | QUANTITY | 0.77+
Automation Anywhere | ORGANIZATION | 0.74+
once | QUANTITY | 0.74+
trillions | QUANTITY | 0.73+
five years | QUANTITY | 0.71+
Automation | ORGANIZATION | 0.67+
of a second | QUANTITY | 0.63+
theCUBE | ORGANIZATION | 0.62+
couple | QUANTITY | 0.6+
TAM | ORGANIZATION | 0.59+

Anjul Bhambhri, Adobe | Adobe Summit 2019


 

>> Live from Las Vegas, it's theCUBE, covering Adobe Summit 2019. Brought to you by Adobe. >> Hey, welcome back, everyone. CUBE live coverage here in Las Vegas for Adobe Summit 2019. I'm John Furrier with Jeff Frick. We're here with a CUBE alumni who's had the job for three years, Anjul Bhambhri, Vice President of Platform Engineering at Adobe. Great to see you. Thanks for coming by. >> Thank you. >> Let's talk engineering. That was your line on the keynote. Great keynote today, by the way, super impressed with the content. I'm watching the slides you're presenting, like, this is a cloud company. I'm feeling like I'm at Amazon re:Invent here. You guys built a really cool platform. Take us through. This was your mission. That's true. So take us through your journey. So how'd we get here? How did you get this beautiful platform? >> So, you know, we've been at it for a few years, and as you know, we've seen CEOs and CMOs relate that their focus is to really deliver, you know, delightful experiences to their customers. And not just once, but throughout the journey of the customer, right? "Delight your customer every step of the way" is what you'll hear from Adobe, from our customers. And we are really helping them to do that. And obviously, in order to do that, as you well know, data is behind everything to do with experiences as well. There is a lot of the interaction of data, and bringing it all together to really understand that holistic view of the customer is super important. And, you know, as you realize, the holistic view of the customer, it's not that you just do it once and you forget about it, right? You have to build this in real time, because the interactions that customers are having with brands are through the web, through mobile devices, through the apps that they're using of those brands. And the businesses have to understand that whole journey of the customers and understand what their preferences are, right? What,
You know, what they like, what they don't like, and be able to keep that context, really, during the journey. Whether they're coming to the website for the first time or they are repeat customers, be able to give them the right experience at every touch point. And that's where you need all of this data, which is a lot of data. So, you know, we've been on this big data journey, and me personally, even, you know, for a long time. But the scale that I've seen here I had not seen before. >> In our IBM conversations, when you were at IBM prior, from Hadoop World, you had your eye on this big data trend. Now, at Adobe, you have real data coming in with actual use cases out in the marketplace, and you have to put a platform together. Hard task. But I want to ask you a specific question around that. Looking at the architecture slide, you have an Analytics Cloud, an Ad Cloud, a Marketing Cloud, and the Commerce Cloud. They all have markets that they have to address, and they have to be highly effective almost as pure plays, standing alone. But integrating across each other now, with the journey that you guys have put together, is difficult. I know that from a computer science background. How did you guys look at that architecturally? What were some of the guiding principles around building that? Because you don't want to compromise the capabilities of those functional elements. So you decompose, and I get that. How did you put it all together? What was the key guiding principle? >> Yeah, so that's a really good question, because, I mean, Adobe has been delivering applications, right? Like you said, whether it's around analytics or marketing cloud or advertising. And now we obviously just acquired the commerce cloud. And when you look at the common stuff around all of this, it's data, right?
Data being captured through different channels, data that needs to be curated, you know, having a common data dictionary so that, you know, things mean the same even though they're captured through different channels. So gathering this data, curating this data, organizing it for that holistic view of the customer, organizing it so that you can do BI and reporting on that data, is all something that we pulled together in the platform layer. Now, whether you're doing analytics on this, right, which could be BI and reporting, or you're doing AI and ML on this to do your next best action, or you're targeting these customers with personalized content, you're doing it on that single version of the truth, which is the real-time customer profile that powers all of these different clouds. So it's not like when you do reporting you have one view of a customer, but when you're trying to show them personalized content, half the view is lost because the data was siloed. So we've gone past all of that. There are no data silos now, right? >> The real-time customer profile is literally being updated all the time. That's the key and great, exciting part about it. >> I'm curious, kind of philosophically and in execution: you've been in this space for a long time, and one of the jokes I like to share is, you know, we used to make decisions based on a sampling of something that happened in the past. Now, you know, we can make decisions based on all of the data that's happening now, but at the same time, your challenge is that the sources are changing all the time, the speed of the input is changing all the time, and the expected return on your reaction is shortening all the time.
So just as a data professional, I'm sure it's super exciting and super scary to make that paradigm shift to 'you've got to deliver the right thing right now.' >> And you know, one of the key things here is that as all of this data is being gathered, right, obviously this data has to be gathered where these events are occurring. So if you look at brands, their customers are global. They are transacting and browsing, whether it's on web or mobile devices, with that brand globally around the world. That means data has to be collected from these globally distributed edges, and it has to be brought in and processed in real time, building that profile. And as the data keeps coming, the profile is updated, right? And you can't have stale data in there, right? Because otherwise, you know, you are actioning based on something that happened five minutes ago. You know how we've seen that you buy something and you're still getting ads for that same product that you bought even a day or two days later? >> I already bought it, I don't need any more. >> So that's because that brand has a stale profile of you, right? But if they had the real-time customer profile, then there's no way that they would be delivering or actioning based on that stale information. So just like the data is being gathered from the edges, even when we have to deliver the experiences, right, this is where edge computing comes into the picture. So when you look at the whole architecture of the platform, yes, it's based on the cloud, and, you know, it's a big data stack. It's completely a SaaS offering. But there is also a big edge computing part of the platform, which is where all the hot data is collected, processed, and actioned. And to your point, right, as we build, say, predictive models for next best action on the data that's in the cloud, the scoring of the models has to happen on the edges, where the events are occurring. So this is a complicated engineering problem.
But that's why, I guess, we love it. >> Big smile. So the data is critical. Talk about how Adobe has changed over the past few years, because you guys did cloud. I heard the nuance in the keynote, you know, reading between the lines: it's hard to get data right at the beginning. You've got cloud right; now you've got to get data right. Take us through that point, because this is where I think the key to success is: how to make that data work. Because if you're going to have open APIs and open data, you have to get that data right. A database, whether it's time series or graph data, a lot of different applications might choose a certain technology. You have to deal with that. How important is the architecture on that? >> So that's a great question. You know, from a platform standpoint, our goal is that we have to be able to answer the questions with the right latency, or speed, as well as relevancy, right? So when we talk real time, it's about latency. You know, when you talk to engineers, they only talk latency. But it's not just that, right? It's latency and relevancy. So it depends: if it's more BI or reporting kind of questions or queries, you need to organize the data a certain way; for, you know, single lookups of customers, right, you have to organize the data differently. And that's where our IP comes into the picture: how do we partition and organize this data to meet the needs of both operational as well as the more, you know, analytical kind of workloads? So we support both. And to your point also, you know, whether we need a SQL database, a NoSQL database, or a graph database, I mean, those are choices we make, but on top we're providing APIs. So we're abstracting all of that from the user. And, you know, how and where we direct the question, that's all our IP, but their applications are not going to break, because they're writing to the APIs.
So as technologies advance underneath, we make those choices, but again, so that they're getting the right latency and relevancy. >> So in the cloud game, we used to talk about this when you were on theCUBE at IBM: the DevOps movement was full tilt, and they used the term 'infrastructure as code.' So what you're kind of getting at, and I want to get your reaction to this, is that if applications and workloads, the use cases, are going to determine the data structures, the data architecture, and the latency-relevance equation, then there's a new kind of infrastructure as code emerging. Is that data as code? Or maybe it's this: should the workloads dictate what type of data diversity and latency and relevance is needed, or does that come from the network? Again, the question is, workloads are kind of in charge, I guess, is what I'm trying to get at. >> Yeah, I would say that, you know, as a platform, you have to support all of these workloads, right? Which means that from an architecture standpoint, we have to make sure that whether it's an analytical kind of question or workload, like BI reporting, or whether it is, you know, more like an operational kind of question, you know, where you want to just do a quick query around, you know, what did this customer buy, or what transaction happened, we have to pick the right underlying data structures and databases so that we are able to support both. >> The expectations of the workload. >> It is. >> If you're running commerce, latency and relevance: low latency is going to be in the milliseconds. >> Got it. >> And relevance has got to have a high bar there, too. An analytics query for a BI tool might be a few seconds. So again, this is a huge delta in terms of capabilities, and doing that on the fly is hard. How do you guys do that? What's the secret sauce?
>> Yeah, so that's the, you know, underlying technology. The way we are building it is so that you can support both of those. And with the customers we're speaking to, they want SQL access to the data, and they're getting that SQL access. Now, depending on the kind of queries, whether they're BI and reporting or more transactional in nature, those are the right technical choices that we're making behind the scenes, so that the users don't have to worry about that, right? Because then they can really focus on the insights that they're getting, and really make decisions based on that insight, and not get caught up in how to build all of these different pieces so that they can support both of these workloads. The other thing is that, you know, a lot of the time that has been spent in IT has been to figure out all of this, so that the CIO can support the line of business, like the CMO. Now, by, you know, Adobe taking care of all this heavy lifting that IT had to do, I think that, you know, IT will be able to meet the requirements of the line of business much faster. And there's going to be, you know, the agility that is needed to support the business. I think that's really our goal in how we support the CIOs, so that they don't worry about all this technology, all the data management, how to collect all this data from globally distributed edges. I mean, that's the partnership that we are, you know, building with the CIOs, so that we help them in their journey of really helping their line of business deliver the best experiences. >> Anjul, great to see you having so much fun at Adobe. >> Thank you. >> What's it like there? Tell us, what's it like working at Adobe? You've got a platform, and certainly there are a lot of hard problems to solve, so you've got that on the engineering side. Tell us what the culture's like there.
I mean, I just love every bit every every minute that I spend here is fantastic. It's, you know, great people open culture open to new ideas on DH. You know, I guess, uh, >> all the >> creative cloud you know has got the straight of it. Eve itches in fused in people. So it's just it's it's just being a blast and and, you know, people recognize them. Barton's off how data is so critical to delivering those delightful experiences, and it's very rewarding to just see how focused everybody is in the company to really help businesses delight their customers. So it's zygo >> system is great, but the developer ecosystem What's your reaction to that of the >> I mean Adobe Io is I don't know. I feel, you know, Yeah, So that's so if you think of all the creators that work with Adobe products and build their applications, I mean, the ecosystem is very rich. So combined creatives on the data and I t I mean >> so we should call the marketing native like cloud native accomplice of developers, developers. It's coming together >> on DH because >> cats living together I mean, this is >> called wait. Call them that experience maker's late. So we are really bringing experience makers, developers, data, scientists all together >> It's a whole new level for a >> whole new level. It's thanks >> for coming on. Sharing the insights. Cube coverage live here, and it will be some in Las Vegas. I'm John for your jefe. Rick, Stay with us. We're here for two days. We're in day one of wall to wall coverage at Adobe Summit. We write back.

Published Date : Mar 26 2019


Jim Franklin & Anant Chintamaneni | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE. Covering theCUBE New York City, 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. >> I'm John Furrier with Peter Burris. Our next two guests are Jim Franklin, with Dell EMC, Director of Product Management, and Anant Chintamaneni, who is the Vice President of Products at BlueData. Welcome to theCUBE, good to see you. >> Thanks, John. >> Thank you. >> Thanks for coming on. >> I've been following BlueData since the founding. Great company, and the founders are great. Great teams, so thanks for coming on and sharing what's going on, I appreciate it. >> It's a pleasure, thanks for the opportunity. >> So Jim, talk about the Dell relationship with BlueData. What are you guys doing? You have the Dell Ready Solutions. How is that related now, because you've seen this industry morph with us over the years. It's really now about, the set-up days are over, it's about proof points. >> That's right. >> AI and machine learning are driving the signal, which is saying, 'We need results'. There's action on the developer's side, there's action on the deployment, people want ROI, that's the main focus. >> That's right. That's right, and we've seen this journey happen from the new batch processing days, and we're seeing that customer base mature and come along, so the reason why we partnered with BlueData is, you have to have that software, you have to have the containers. They have to have the algorithms, and things like that, in order to make this real. So it's been a great partnership with BlueData. It dates back actually a little farther than some may realize, all the way to 2015, believe it or not, when we used to incorporate BlueData with Isilon. So it's been actually a pretty positive partnership.
>> Now we've talked with you guys in the past, you guys were on the cutting edge, this was back when Docker containers were fashionable, but now containers have proliferated out there. It's not just Docker, containerization has been the wave. Now, Kubernetes on top of it is really bringing in the orchestration. This is really making the storage and the network so much more valuable with their respective workloads, and AI is a part of that. How do you guys navigate those waters now? What's the BlueData update, how are you guys taking advantage of that big wave? >> I think, great observation. We embraced Docker containers even before Docker was actually formed as a company, and Kubernetes was just getting launched at that time, so we saw the value of Docker containers very early on, in terms of being able to obviously provide the agility, the elasticity, but also from a packaging of applications perspective, because as we all know it's a very dynamic environment. And today, I think we are very happy to know that, with Kubernetes being a household name now, especially among tech companies, the way we're navigating this is, we have a turnkey product, which has containerization, and now we are taking our value proposition of big data and AI and lifecycle management and bringing it to Kubernetes with an open source project that we launched called KubeDirector under our umbrella. So, we're all about bringing stateful applications like Hadoop, AI, and ML to the community and to our customer base, which includes some of the largest financial services and health care customers. >> So the container revolution has certainly gripped developers, and developers have always had a history of chasing after the next cool technology, and for good reason, it's not like just chasing after...
Developers tend not to just chase after the shiny thing, they chase after the most productive thing, and they start using it, and they start learning about it, and they make themselves valuable, and they build more valuable applications as a result. But there's this interesting meshing of creators, makers, in the software world, between the development community and the data science community. How are data scientists, who you must be spending a fair amount of time with, starting to adopt containers, what are they looking at? Are they even aware of this, as you try to help these communities come together? >> We absolutely talk to the data scientists, and they're the drivers of determining what applications they want to consume for the different use cases. But, at the end of the day, it's about the person who has to deliver these applications. You know, data scientists care about time to value, getting the environment quickly all prepared so they can access the right data sets. So, in many ways, most of our customers, many of them are unaware that there are actually containers under the hood. >> So this is the data scientists. >> The data scientists, but the actual administrators and the system administrators who are making these tools available are using containers as a way to accelerate the way they package the software, which has a whole bunch of dependent libraries, and there's a lot of complexity out there. So they're simplifying all that and providing the environment as quickly as possible. >> And in so doing, making sure that whatever workloads are put together can be scaled, can be combined differently and recombined differently, based on requirements of the data scientists. So the data scientist sees the tool... >> Yeah.
>> The tool is manifest in concert with some of these new container-related technologies, and then the whole CI/CD process supports the data scientist. >> The other thing to think about, though, is that this also allows freedom of choice, and we were discussing off camera before, these developers want to pick out what they want to work with, they don't want to have to be locked in. So with containers, you can also speed that deployment but give them freedom to choose the tools that make them most productive. That'll make them much happier, and probably much more efficient. >> So there's a separation between the data science tools and the developer tools, but they end up all supporting the same basic objective. So how does the infrastructure play in this, because the challenge of big data for the last five years, as John and I both know, is that a lot of people conflated the outcome of data science, the outcome of big data, with the process of standing up clusters and lining up Hadoop, and if they failed on the infrastructure, they said it was a failure overall. So how are you making the infrastructure really simple, and lining it up with this time to value? >> Well, the reality is, we all need food and water. IT still needs servers and storage in order to work. But at the end of the day, the abstraction has to be there. Just like VMware in the early days, clouds, containers with BlueData is just another way to create a layer of abstraction. But this one is in the context of what the data scientist is trying to get done, and that's the key to why we partnered with BlueData and why we delivered big data as a service. >> So at that point, what's the update from Dell EMC and Dell, in particular, on analytics? Obviously you guys work with a lot of customers who have challenges. How are you solving those problems? What are those problems?
Because we know there are some AI rumors, big Dell event coming up, there are rumors of a lot of AI involved, I'm speculating there's going to be probably a new kind of hardware device and software. What's the state of the analytics today? >> I think a lot of the customers we talked about, they were born in that batch processing, that Hadoop space we just talked about. I think they largely got that right, they've largely got that figured out, but now we're seeing proliferation of AI tools, proliferation of sandbox environments, and you're starting to see a little bit of silo behavior happening, so what we're trying to do is, the IT shop is trying to dispatch those environments, dispatch with some speed, with some agility. They want to have it at the right economic model as well, so we're trying to strike a better balance, say 'Hey, I've invested in all this infrastructure already, I need to modernize it, and I also need to offer it up in a way that data scientists can consume it'. Oh, by the way, we're starting to see them start to hire more and more of these data scientists. Well, you don't want your data scientists, this very expensive, intelligent resource, sitting there doing data mining, data cleansing, ETL offloads, we want them actually doing modeling and analytics. So we find that a lot of times right now you're doing an operational change, a change in the operational mindset, as you're starting to hire these very expensive people to do this very good work at the core of the data, but they need to get productive in the way that you hired them to be productive. >> So what is this ready solution, can you just explain what that is? Is it a program, is it hardware, is it a solution? What is the ready solution?
>> Generally speaking, what we do as a division is we look for value workloads, just generally speaking, not necessarily in batch processing, or AI, or applications, and we try and create an environment that solves that customer challenge. Typically they're very complex: SAP, Oracle Database, and AI, my goodness. Very difficult. >> Variety of tools, using Hive, NoSQL, all this stuff's going on. >> Cassandra, you've got TensorFlow, so we try to fit together a set of knowledge experts, that's the key, the intellectual property of our engineers, and their deep knowledge expertise in a certain area. So for AI, we have a set of them back at the shop, they're in the lab, and this is what they do, and they're serving up these models, they're putting data through its paces, they're doing the work of a data scientist. They are data scientists. >> And so this is where BlueData comes in. You guys are part of this abstraction layer in the Ready Solutions offering? Is that how it works? >> Yeah, we are the software that enables the self-service experience, the multitenancy, that the consumers of the ready solution would want in terms of being able to onboard multiple different groups of users, lines of business, so you could have a user that wants to run a basic Spark cluster, Spark jobs, or you could have another user group that's using TensorFlow, accelerated by a special type of CPU or GPU, and so you can have them all on the same infrastructure. >> One of the things Peter and I were talking about, Dave Vellante, who was here, he's at another event right now getting some content, but one of the things we observed was, we saw this a while ago so it's not new to us, but certainly we're seeing the impact at this event. Hadoop World, it's now called Strata Data NYC, is that we hear words like Kubernetes, and Multi Cloud, and Istio for the first time. At this event. This is the impact of the Cloud.
The Cloud has essentially leveled the Hadoop World. Certainly there's some Hadoop activity going on there, people have clusters, they're standing up infrastructure for analytics, obviously AI drives that, but now you have the Cloud being a power base, changing that analytics infrastructure. How has it impacted you guys? BlueData, how are you guys impacted by the Cloud? Tailwind for you guys? Helpful? Good? >> You described it well, it is a tailwind. This space is about the data, not where the data lives necessarily, but the robustness of the data. So whether that's in the Cloud, whether that's on premise, whether that's on premise in your own private Cloud, I think anywhere where there's data that can be gathered and modeled, and new insights pulled out of it, this is wonderful, so wherever the data is, whether it's born in the Cloud or born on premise, this is actually an accelerant to the solutions that we built together. >> As BlueData, we're all in on the Cloud. We support all three major Cloud providers; that was the big announcement that we made this week. We're generally available for AWS, GCP, and Azure, and, in particular, we start with customers who weren't born in the Cloud, so we're talking about some of the large financial services. >> We had Barclays UK here, who we nominated; they won the Cloudera Data Impact Award, and what they're actually going through right now is, they started on prem, they have these really packaged, certified technology stacks, whether they are Cloudera Hadoop, whether they are Anaconda for data science, and what they're trying to do right now is, they're obviously getting value from that on premise with BlueData, and now they want to leverage the Cloud. They want to be able to extend into the Cloud.
So, we as a company have made our product a hybrid Cloud-ready platform, so it can span on prem as well as multiple Clouds, and you have the ability to move the workloads from one to the other, depending on data gravity, SLA considerations. >> Compliance. >> I think there's one more thing, I want to test this with you guys, John, and that is, analytics is, I don't want to call it inert, or passive, but analytics has always been about getting the right data to human beings so they can make decisions, and now we're seeing, because of AI, the distinction that we draw between analytics and AI is, AI is about taking action on the data, it's about having a consequential action as a result of the data. So in many respects, NCL, Kubernetes, a lot of these not only do some interesting things for the infrastructure associated with big data, but they also facilitate the incorporation of new classes of applications that act on behalf of the brand. >> Here's the other thing I'll add to it, there's a time element here. It used to be we were passive, and it was in the past, and you were trying to project forward. That's no longer the case. You can do it right now. Exactly. >> In many respects, the history of the computing industry can be drawn in this way: you focused on the past, and then with spreadsheets in the 80s and personal computing, you focused on getting everybody to agree on the future, and now, it's about getting action to happen right now. >> At the moment it happens. >> And that's why there's so much action. We're past the set-up phase, and I think this is why we're seeing machine learning being so popular, because people want to take action. There's a demand, that's a signal that it's time to show where the ROI is and get action done. Clearly we see that. >> We're capitalists, right? We're all trying to figure out how to make money in these spaces.
>> Certainly there's a lot of movement, and Cloud has proven that the concept of spinning up an instance has been a great thing, and certainly for analytics. It's okay to have these workloads, but how do you tie it together? So, I want to ask you, because you guys have been involved in containers, Cloud has certainly been a tailwind, we agree with you 100 percent on that. What is the relevance of Kubernetes and Istio? You're starting to see these new trends. Kubernetes, Istio, Kubeflow. Higher level microservices with all kinds of stateful and stateless dynamics. I call it API 2.0, it's a whole other generation of abstractions that are going on, that are creating some goodness for people. What is the impact, in your opinion, of Kubernetes and this new revolution? >> I think the impact of Kubernetes is, I just gave a talk here yesterday called Hadoop-la About Kubernetes. We're thinking very deeply about this. So I think Kubernetes, if you look at the genesis, it's all about stateless applications, and I think as new applications are being written, folks are thinking about writing them in a manner that is decomposed, stateless, microservices, things like Kubeflow. When you write it like that, Kubernetes fits in very well, and you get all the benefits of auto-scaling and the controller pattern, and ultimately Kubernetes is this finite state machine-type model where you describe what the state should be, and it will work and crank towards reaching that state. I think it's a little bit harder for stateful applications, and I think that's where we believe that the Kubernetes community has to do a lot more work, and folks like BlueData are going to contribute to that work, which is, how do you bring stateful applications like Hadoop, where there are a lot of interdependent services, they're not necessarily microservices, they're actually almost close to monolithic applications.
So I think new applications, new AI ML tooling that's going to come out, they're going to be very conscious of how they're running in a Cloud world today that folks weren't aware of seven or eight years ago, so it's really going to make a huge difference. And I think things like Istio are going to make a huge difference because you can start in the cloud and maybe now expand on to on-prem. So there's going to be some interesting dynamics. >> Without hopping management frameworks, absolutely. >> And this is really critical, you just nailed it. Stateful is where ML will shine, if you can then cross the chasm to the on-premise where the workloads can have state sharing. >> Right. >> Scales beautifully. It's a whole other level. >> Right. You're going to move the data to the action, or the activity, or you're going to have to move the processing to the data, and you want to have nonetheless, a common, seamless management and development framework so that you have the choices about where you do those things. >> Absolutely. >> Great stuff. We can do a whole Cube segment just on that. We love talking about these new dynamics going on. We'll see you at CNCF KubeCon coming up in Seattle. Great to have you guys on. Thanks, and congratulations on the relationship between BlueData and Dell EMC and Ready Solutions. This is theCUBE, with the Ready Solutions here. New York City, talking about big data and the impact, the future of AI, all things stateful, stateless, Cloud and all. It's theCUBE bringing you all the action. Stay with us for more after this short break.
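
Editor's sketch: the "describe what the state should be, and it will crank towards that state" model mentioned in the Kubernetes discussion above can be illustrated with a toy reconciliation loop. This is a minimal sketch under stated assumptions, not Kubernetes code: the `Cluster` class and the `reconcile` and `converge` functions are hypothetical names invented for illustration, and a real controller watches the API server instead of looping over a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster: running replica count per app."""
    running: dict = field(default_factory=dict)


def reconcile(desired: dict, cluster: Cluster) -> list:
    """Move each app one step toward its desired replica count.

    Returns the list of actions taken; an empty list means the
    observed state already matches the desired state.
    """
    actions = []
    for app in sorted(set(desired) | set(cluster.running)):
        want = desired.get(app, 0)
        have = cluster.running.get(app, 0)
        if have < want:
            cluster.running[app] = have + 1
            actions.append(f"start {app}")
        elif have > want:
            cluster.running[app] = have - 1
            actions.append(f"stop {app}")
        # Forget apps that have been scaled down to nothing.
        if cluster.running.get(app) == 0:
            del cluster.running[app]
    return actions


def converge(desired: dict, cluster: Cluster, max_rounds: int = 100) -> int:
    """Call reconcile until nothing changes; return the rounds that did work."""
    rounds = 0
    while rounds < max_rounds and reconcile(desired, cluster):
        rounds += 1
    return rounds
```

For example, starting from `Cluster(running={"web": 1, "old": 2})` with a desired state of `{"web": 3}`, `converge` takes two rounds and leaves only three `web` replicas running. The sketch also hints at why stateful, interdependent services are harder, as the speakers note: here every replica is interchangeable, so starting or stopping one needs no ordering or data hand-off.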

Published Date : Sep 13 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Anant Chintamaneni | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Jim Franklin | PERSON | 0.99+
John | PERSON | 0.99+
BlueData | ORGANIZATION | 0.99+
Dell | ORGANIZATION | 0.99+
Peter | PERSON | 0.99+
Jim | PERSON | 0.99+
2015 | DATE | 0.99+
New York | LOCATION | 0.99+
100 percent | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
New York City | LOCATION | 0.99+
Ready Solutions | ORGANIZATION | 0.99+
Seattle | LOCATION | 0.99+
yesterday | DATE | 0.99+
Dell EMC | ORGANIZATION | 0.99+
Barclays UK | ORGANIZATION | 0.99+
first time | QUANTITY | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
today | DATE | 0.99+
One | QUANTITY | 0.98+
both | QUANTITY | 0.98+
AWS | ORGANIZATION | 0.98+
this week | DATE | 0.97+
CF CupCon | EVENT | 0.97+
one | QUANTITY | 0.97+
Cassandra | PERSON | 0.97+
seven | DATE | 0.96+
two guests | QUANTITY | 0.96+
Isilon | ORGANIZATION | 0.96+
80s | DATE | 0.96+
NCL | ORGANIZATION | 0.96+
SAP | ORGANIZATION | 0.95+
API 2.0 | OTHER | 0.92+
Anaconda | ORGANIZATION | 0.92+
Cloud Era Hadoop | TITLE | 0.91+
NYC | LOCATION | 0.91+
Hadoop | TITLE | 0.91+
eight years ago | DATE | 0.91+
Prem | ORGANIZATION | 0.9+
Cupflow | TITLE | 0.89+
Premise | TITLE | 0.89+
Kubernetes | TITLE | 0.88+
one more thing | QUANTITY | 0.88+
Istio | ORGANIZATION | 0.87+
Docker | TITLE | 0.85+
Docker | ORGANIZATION | 0.85+
Cupflow | ORGANIZATION | 0.84+
Cube | ORGANIZATION | 0.83+
last five years | DATE | 0.82+
Cloud | TITLE | 0.8+
Kubernetes | ORGANIZATION | 0.8+
Oracle Database | ORGANIZATION | 0.79+
2018 | DATE | 0.79+
Clouds | TITLE | 0.78+
GCP | ORGANIZATION | 0.77+
theCUBE | ORGANIZATION | 0.76+
Cloud Era Data Impact Award | EVENT | 0.74+
Cube | PERSON | 0.73+

Influencer Panel | theCUBE NYC 2018


 

- [Announcer] Live, from New York, it's theCUBE. Covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media, and its ecosystem partners. - Hello everyone, welcome back to CUBE NYC. This is a CUBE special presentation of something that we've done now for the past couple of years. IBM has sponsored an influencer panel on some of the hottest topics in the industry, and of course, there's no hotter topic right now than AI. So, we've got nine of the top influencers in the AI space, and we're in Hell's Kitchen, and it's going to get hot in here. (laughing) And these guys, we're going to cover the gamut. So, first of all, folks, thanks so much for joining us today, really, as John said earlier, we love the collaboration with you all, and we'll definitely see you on social after the fact. I'm Dave Vellante, with my cohost for this session, Peter Burris, and again, thank you to IBM for sponsoring this and organizing this. IBM has a big event down here, in conjunction with Strata, called Change the Game, Winning with AI. We run theCUBE NYC, we've been here all week. So, here's the format. I'm going to kick it off, and then we'll see where it goes. So, I'm going to introduce each of the panelists, and then ask you guys to answer a question, I'm sorry, first, tell us a little bit about yourself, briefly, and then answer one of the following questions. Two big themes that have come up this week. One has been, because this is our ninth year covering what used to be Hadoop World, which kind of morphed into big data. Question is, AI, big data, same wine, new bottle? Or is it really substantive, and driving business value? So, that's one question to ponder. The other one is, you've heard the term, the phrase, data is the new oil. Is data really the new oil? Wonder what you think about that? Okay, so, Chris Penn, let's start with you. Chris is cofounder of Trust Insights, long time CUBE alum, and friend. Thanks for coming on.
Tell us a little bit about yourself, and then pick one of those questions. - Sure, we're a data science consulting firm. We're an IBM business partner. When it comes to "data is the new oil," I love that expression because it's completely accurate. Crude oil is useless, you have to extract it out of the ground, refine it, and then bring it to distribution. Data is the same way, where you have to have developers and data architects get the data out. You need data scientists and tools, like Watson Studio, to refine it, and then you need to put it into production, and that's where marketing technologists, technologists, business analytics folks, and tools like Watson Machine Learning help bring the data and make it useful. - Okay, great, thank you. Tony Flath is a tech and media consultant, focus on cloud and cyber security, welcome. - Thank you. - Tell us a little bit about yourself and your thoughts on one of those questions. - Sure thing, well, thanks so much for having us on this show, really appreciate it. My background is in cloud, cyber security, and certainly in emerging tech with artificial intelligence. Certainly touched it from a cyber security play, how you can use machine learning, machine control, for better controlling security across the gamut. But I'll touch on your question about wine, is it a new bottle, new wine? Where does this come from, from artificial intelligence? And I really see it as a whole new wine that is coming along. When you look at emerging technology, and you look at all the deep learning that's happening, it's going just beyond being able to machine learn and know what's happening, it's making some meaning to that data. And things are being done with that data, from robotics, from automation, from all kinds of different things, where we're at a point in society where data, our technology is getting beyond us. Prior to this, it's always been command and control. You control data from a keyboard. Well, this is passing us. 
So, my passion and perspective on this is, the humanization of it, of IT. How do you ensure that people are in that process, right? - Excellent, and we're going to come back and talk about that. - Thanks so much. - Carla Gentry, @DataNerd? Great to see you live, as opposed to just in the ether on Twitter. Data scientist, and owner of Analytical Solution. Welcome, your thoughts? - Thank you for having us. Mine is, is data the new oil? And I'd like to rephrase that is, data equals human lives. So, with all the other artificial intelligence and everything that's going on, and all the algorithms and models that's being created, we have to think about things being biased, being fair, and understand that this data has impacts on people's lives. - Great. Steve Ardire, my paisan. - Paisan. - AI startup adviser, welcome, thanks for coming to theCUBE. - Thanks Dave. So, uh, my first career was geology, and I view AI as the new oil, but data is the new oil, but AI is the refinery. I've used that many times before. In fact, really, I've moved from just AI to augmented intelligence. So, augmented intelligence is really the way forward. This was a presentation I gave at IBM Think last spring, has almost 100,000 impressions right now, and the fundamental reason why is machines can attend to vastly more information than humans, but you still need humans in the loop, and we can talk about what they're bringing in terms of common sense reasoning, because big data does the who, what, when, and where, but not the why, and why is really the Holy Grail for causal analysis and reasoning. - Excellent, Bob Hayes, Business Over Broadway, welcome, great to see you again. - Thanks for having me. So, my background is in psychology, industrial psychology, and I'm interested in things like customer experience, data science, machine learning, so forth. And I'll answer the question around big data versus AI. 
And I think there's other terms we could talk about, big data, data science, machine learning, AI. And to me, it's kind of all the same. It's always been about analytics, and getting value from your data, big, small, what have you. And there's subtle differences among those terms. Machine learning is just about making a prediction, and knowing if things are classified correctly. Data science is more about understanding why things work, and understanding maybe the ethics behind it, what variables are predicting that outcome. But still, it's all the same thing, it's all about using data in a way that we can get value from that, as a society, in residences. - Excellent, thank you. Theo Lau, founder of Unconventional Ventures. What's your story? - Yeah, so, my background is driving technology innovation. So, together with my partner, what our work does is we work with organizations to try to help them leverage technology to drive systematic financial wellness. We connect founders, startup founders, with funders, we help them get money in the ecosystem. We also work with them to look at, how do we leverage emerging technology to do something good for the society. So, very much on point to what Bob was saying about. So when I look at AI, it is not new, right, it's been around for quite a while. But what's different is the amount of technological power that we have allows us to do so much more than what we were able to do before. And so, what my mantra is, great ideas can come from anywhere in the society, but it's our job to be able to leverage technology to shine a spotlight on people who can use this to do something different, to help seniors in our country to do better in their financial planning. - Okay, so, in your mind, it's not just a same wine, new bottle, it's more substantive than that. - [Theo] It's more substantive, it's a much better bottle. - Karen Lopez, senior project manager and architect for InfoAdvisors, welcome. - Thank you.
So, I'm DataChick on twitter, and so that kind of tells my focus is that I'm here, I also call myself a data evangelist, and that means I'm there at organizations helping stand up for the data, because to me, that's the proxy for standing up for the people, and the places and the events that that data describes. That means I have a focus on security, data privacy and protection as well. And I'm going to kind of combine your two questions about whether data is the new wine bottle, I think is the combination. Oh, see, now I'm talking about alcohol. (laughing) But anyway, you know, all analogies are imperfect, so whether we say it's the new wine, or, you know, same wine, or whether it's oil, is that the analogy's good for both of them, but unlike oil, the amount of data's just growing like crazy, and the oil, we know at some point, I kind of doubt that we're going to hit peak data where we have not enough data, like we're going to do with oil. But that says to me that, how did we get here with big data, with machine learning and AI? And from my point of view, as someone who's been focused on data for 35 years, we have hit this perfect storm of open source technologies, cloud architectures and cloud services, data innovation, that if we didn't have those, we wouldn't be talking about large machine learning and deep learning-type things. So, because we have all these things coming together at the same time, we're now at explosions of data, which means we also have to protect them, and protect the people from doing harm with data, we need to do data for good things, and all of that. - Great, definite differences, we're not running out of data, data's like the terrible tribbles. (laughing) - Yes, but it's very cuddly, data is. - Yeah, cuddly data. Mark Lynd, founder of Relevant Track? - That's right. - I like the name. What's your story? - Well, thank you, and it actually plays into what my interest is. It's mainly around AI in enterprise operations and cyber security. 
You know, these teams that are in enterprise operations both, it can be sales, marketing, all the way through the organization, as well as cyber security, they're often under-sourced. And they need, what Steve pointed out, they need augmented intelligence, they need to take AI, the big data, all the information they have, and make use of that in a way where they're able to, even though they're under-sourced, make some use and some value for the organization, you know, make better use of the resources they have to grow and support the strategic goals of the organization. And oftentimes, when you get to budgeting, it doesn't really align, you know, you're short people, you're short time, but the data continues to grow, as Karen pointed out. So, when you take those together, using AI to augment, provided augmented intelligence, to help them get through that data, make real tangible decisions based on information versus just raw data, especially around cyber security, which is a big hit right now, is really a great place to be, and there's a lot of stuff going on, and a lot of exciting stuff in that area. - Great, thank you. Kevin L. Jackson, author and founder of GovCloud. GovCloud, that's big. - Yeah, GovCloud Network. Thank you very much for having me on the show. Up and working on cloud computing, initially in the federal government, with the intelligence community, as they adopted cloud computing for a lot of the nation's major missions. And what has happened is now I'm working a lot with commercial organizations and with the security of that data. And I'm going to sort of, on your questions, piggyback on Karen. There was a time when you would get a couple of bottles of wine, and they would come in, and you would savor that wine, and sip it, and it would take a few days to get through it, and you would enjoy it. The problem now is that you don't get a couple of bottles of wine into your house, you get two or three tankers of data. 
So, it's not that it's a new wine, you're just getting a lot of it. And the infrastructures that you need, before you could have a couple of computers, and a couple of people, now you need cloud, you need automated infrastructures, you need huge capabilities, and artificial intelligence and AI, it's what we can use as the tool on top of these huge infrastructures to drink that, you know. - Fire hose of wine. - Fire hose of wine. (laughs) - Everybody's having a good time. - Everybody's having a great time. (laughs) - Yeah, things are booming right now. Excellent, well, thank you all for those intros. Peter, I want to ask you a question. So, I heard there's some similarities and some definite differences with regard to data being the new oil. You have a perspective on this, and I wonder if you could inject it into the conversation. - Sure, so, the perspective that we take in a lot of conversations, a lot of folks here in theCUBE, what we've learned, and I'll kind of answer both questions a little bit. First off, on the question of data as the new oil, we definitely think that data is the new asset that business is going to be built on, in fact, our perspective is that there really is a difference between business and digital business, and that difference is data as an asset. And if you want to understand data transformation, you understand the degree to which businesses reinstitutionalizing work, reorganizing its people, reestablishing its mission around what you can do with data as an asset. The difference between data and oil is that oil still follows the economics of scarcity. Data is one of those things, you can copy it, you can share it, you can easily corrupt it, you can mess it up, you can do all kinds of awful things with it if you're not careful. 
And it's that core fundamental proposition that as an asset, when we think about cyber security, we think, in many respects, that is the approach to how we can go about privatizing data so that we can predict who's actually going to be able to appropriate returns on it. So, it's a good analogy, but as you said, it's not entirely perfect, but it's not perfect in a really fundamental way. It's not following the laws of scarcity, and that has an enormous effect. - In other words, I could put oil in my car, or I could put oil in my house, but I can't put the same oil in both. - Can't put it in both places. And now, the issue of the wine, I think it's, we think that it is, in fact, it is a new wine, and very simple abstraction, or generalization we come up with is the issue of agency. That analytics has historically not taken on agency, it hasn't acted on behalf of the brand. AI is going to act on behalf of the brand. Now, you're going to need both of them, you can't separate them. - A lot of implications there in terms of bias. - Absolutely. - In terms of privacy. You have a thought, here, Chris? - Well, the scarcity is our compute power, and our ability for us to process it. I mean, it's the same as oil, there's a ton of oil under the ground, right, we can't get to it as efficiently, or without severe environmental consequences to use it. Yeah, when you use it, it's transformed, but our scarcity is compute power, and our ability to use it intelligently. - Or even when you find it. I have data, I can apply it to six different applications, I have oil, I can apply it to one, and that's going to matter in how we think about work. - But one thing I'd like to add, sort of, you're talking about data as an asset. The issue we're having right now is we're trying to learn how to manage that asset. Artificial intelligence is a way of managing that asset, and that's important if you're going to use and leverage big data. 
- Yeah, but see, everybody's talking about the quantity, the quantity, it's not always the quantity. You know, we can have just oodles and oodles of data, but if it's not clean data, if it's not alphanumeric data, which is what's needed for machine learning. So, having lots of data is great, but you have to think about the signal versus the noise. So, sometimes you get so much data, you're looking at over-fitting, sometimes you get so much data, you're looking at biases within the data. So, it's not the amount of data, it's the, now that we have all of this data, making sure that we look at relevant data, to make sure we look at clean data. - One more thought, and we have a lot to cover, I want to get inside your big brain. - I was just thinking about it from a cyber security perspective, one of my customers, they were looking at the data that just comes from the perimeter, your firewalls, routers, all of that, and then not even looking internally, just the perimeter alone, and the amount of data being pulled off of those. And then trying to correlate that data so it makes some type of business sense, or they can determine if there's incidents that may happen, and take a predictive action, or threats that might be there because they haven't taken a certain action prior, it's overwhelming to them. So, having AI now, to be able to go through the logs to look at, and there's so many different types of data that come to those logs, but being able to pull that information, as well as looking at end points, and all that, and people's houses, which are an extension of the network oftentimes, it's an amazing amount of data, and they're only looking at a small portion today because they know, there's not enough resources, there's not enough trained people to do all that work. So, AI is doing a wonderful way of doing that. 
And some of the tools now are starting to mature and be sophisticated enough where they provide that augmented intelligence that Steve talked about earlier. - So, it's complicated. There's infrastructure, there's security, there's a lot of software, there's skills, and on and on. At IBM Think this year, Ginni Rometty talked about, there were a couple of themes, one was augmented intelligence, that was something that was clear. She also talked a lot about privacy, and you own your data, etc. One of the things that struck me was her discussion about incumbent disruptors. So, if you look at the top five companies, roughly, Facebook with fake news has dropped down a little bit, but top five companies in terms of market cap in the US. They're data companies, all right. Apple just hit a trillion, Amazon, Google, etc. How do those incumbents close the gap? Is that concept of incumbent disruptors actually something that is being put into practice? I mean, you guys work with a lot of practitioners. How are they going to close that gap with the data haves, meaning data at their core of their business, versus the data have-nots, it's not that they don't have a lot of data, but it's in silos, it's hard to get to? - Yeah, I got one more thing, so, you know, these companies, and whoever's going to be big next is, you have a digital persona, whether you want it or not. So, if you live in a farm out in the middle of Oklahoma, you still have a digital persona, people are collecting data on you, they're putting profiles of you, and the big companies know about you, and people that first interact with you, they're going to know that you have this digital persona. 
Personal AI, when AI from these companies could be used simply and easily, from a personal deal, to fill in those gaps, and to have a digital persona that supports your family, your growth, both personal and professional growth, and those type of things, there's a lot of applications for AI on a personal, enterprise, even small business, that have not been done yet, but the data is being collected now. So, you talk about the oil, the oil is being built right now, lots, and lots, and lots of it. It's the applications to use that, and turn that into something personally, professionally, educationally, powerful, that's what's missing. But it's coming. - Thank you, so, I'll add to that, and in answer to your question you raised. So, one example we always used in banking is, if you look at the big banks, right, and then you look at from a consumer perspective, and there's a lot of talk about Amazon being a bank. But the thing is, Amazon doesn't need to be a bank, they provide banking services, from a consumer perspective they don't really care if you're a bank or you're not a bank, but what's different between Amazon and some of the banks is that Amazon, like you say, has a lot of data, and they know how to make use of the data to offer something as relevant that consumers want. Whereas banks, they have a lot of data, but they're all silos, right. So, it's not just a matter of whether or not you have the data, it's also, can you actually access it and make something useful out of it so that you can create something that consumers want? Because otherwise, you're just a pipe. - Totally agree, like, when you look at it from a perspective of, there's a lot of terms out there, digital transformation is thrown out so much, right, and go to cloud, and you migrate to cloud, and you're going to take everything over, but really, when you look at it, and you both touched on it, it's the economics. 
You have to look at the data from an economics perspective, and how do you make some kind of way to take this data meaningful to your customers, that's going to work effectively for them, that they're going to drive? So, when you look at the big, big cloud providers, I think the push in things that's going to happen in the next few years is there's just going to be a bigger migration to public cloud. So then, between those, they have to differentiate themselves. Obvious is artificial intelligence, in a way that makes it easy to aggregate data from across platforms, to aggregate data from multi-cloud, effectively. To use that data in a meaningful way that's going to drive, not only better decisions for your business, and better outcomes, but drives our opportunities for customers, drives opportunities for employees and how they work. We're at a really interesting point in technology where we get to tell technology what to do. It's going beyond us, it's no longer what we're telling it to do, it's going to go beyond us. So, how we effectively manage that is going to be where we see that data flow, and those big five or big four, really take that to the next level. - Now, one of the things that Ginni Rometty said was, I forget the exact stat, but it was like, 80% of the data is not searchable. Kind of implying that it's sitting somewhere behind a firewall, presumably on somebody's premises. So, it was kind of interesting. You're talking about, certainly, a lot of momentum for public cloud, but at the same time, a lot of data is going to stay where it is. - Yeah, we're assuming that a lot of this data is just sitting there, available and ready, and we look at the desperate, or disparate kind of database situation, where you have 29 databases, and two of them have unique quantifiers that tie together, and the rest of them don't. So, there's nothing that you can do with that data.
So, artificial intelligence is just that, it's artificial intelligence, so, they know, that's machine learning, that's natural language, that's classification, there's a lot of different parts of that that are moving, but we also have to have IT, good data infrastructure, master data management, compliance, there's so many moving parts to this, that it's not just about the data anymore. - I want to ask Steve to chime in here, go ahead. - Yeah, so, we also have to change the mentality that it's not just enterprise data. There's data on the web, the biggest thing is Internet of Things, the amount of sensor data will make the current data look like chump change. So, data is moving faster, okay. And this is where the sophistication of machine learning needs to kick in, going from just mostly supervised-learning today, to unsupervised learning. And in order to really get into, as I said, big data, and credible AI does the who, what, where, when, and how, but not the why. And this is really the Holy Grail to crack, and it's actually under a new moniker, it's called explainable AI, because it moves beyond just correlation into root cause analysis. Once we have that, then you have the means to be able to tap into augmented intelligence, where humans are working with the machines. - Karen, please. - Yeah, so, one of the things, like what Carla was saying, and what a lot of us had said, I like to think of the advent of ML technologies and AI are going to help me as a data architect to love my data better, right? So, that includes protecting it, but also, when you say that 80% of the data is unsearchable, it's not just an access problem, it's that no one knows what it was, what the sovereignty was, what the metadata was, what the quality was, or why there's huge anomalies in it. So, my favorite story about this is, in the 1980s, about, I forget the exact number, but like, 8 million children disappeared out of the US in April, at April 15th. 
And that was when the IRS enacted a rule that, in order to have a dependent, a deduction for a dependent on your tax returns, they had to have a valid social security number, and people who had accidentally miscounted their children and over-claimed them, (laughter) over the years them, stopped doing that. Well, some days it does feel like you have eight children running around. (laughter) - Agreed. - When, when that rule came about, literally, and they're not all children, because they're dependents, but literally millions of children disappeared off the face of the earth in April, but if you were doing analytics, or AI and ML, and you don't know that this anomaly happened, I can imagine in a hundred years, someone is saying some catastrophic event happened in April, 1983. (laughter) And what caused that, was it healthcare? Was it a meteor? Was it the clown attacking them? - That's where I was going. - Right. So, those are really important things that I want to use AI and ML to help me, not only document and capture that stuff, but to provide that information to the people, the data scientists and the analysts that are using the data. - Great story, thank you. Bob, you got a thought? You got the mic, go, jump in here. - Well, yeah, I do have a thought, actually. I was talking about, what Karen was talking about. I think it's really important that, not only that we understand AI, and machine learning, and data science, but that the regular folks and companies understand that, at the basic level. Because those are the people who will ask the questions, or who know what questions to ask of the data. And if they don't have the tools, and the knowledge of how to get access to that data, or even how to pose a question, then that data is going to be less valuable, I think, to companies. And the more that everybody knows about data, even people in congress. Remember when Zuckerberg talked about? (laughter) - That was scary. - How do you make money? 
It's like, we all know this. But, we need to educate the masses on just basic data analytics. - We could have an hour-long panel on that. - Yeah, absolutely. - Peter, you and I were talking about, we had a couple of questions, sort of, how far can we take artificial intelligence? How far should we? You know, so that brings in to the conversation of ethics, and bias, why don't you pick it up? - Yeah, so, one of the crucial things that we all are implying is that, at some point in time, AI is going to become a feature of the operations of our homes, our businesses. And as these technologies get more powerful, and they diffuse, and know about how to use them, diffuses more broadly, and you put more options into the hands of more people, the question slowly starts to turn from can we do it, to should we do it? And, one of the issues that I introduce is that I think the difference between big data and AI, specifically, is this notion of agency. The AI will act on behalf of, perhaps you, or it will act on behalf of your business. And that conversation is not being had, today. It's being had in arguments between Elon Musk and Mark Zuckerberg, which pretty quickly get pretty boring. (laughing) At the end of the day, the real question is, should this machine, whether in concert with others, or not, be acting on behalf of me, on behalf of my business, or, and when I say on behalf of me, I'm also talking about privacy. Because Facebook is acting on behalf of me, it's not just what's going on in my home. So, the question of, can it be done? A lot of things can be done, and an increasing number of things will be able to be done. We got to start having a conversation about should it be done? - So, humans exhibit tribal behavior, they exhibit bias. Their machine's going to pick that up, go ahead, please. - Yeah, one thing that sort of tag onto agency of artificial intelligence. 
Every industry, every business is now about identifying information and data sources, and their appropriate sinks, and learning how to draw value out of connecting the sources with the sinks. Artificial intelligence enables you to identify those sources and sinks, and when it gets agency, it will be able to make decisions on your behalf about what data is good, what data means, and who it should be. - What actions are good. - Well, what actions are good. - And what data was used to make those actions. - Absolutely. - And was that the right data, and is there bias of data? And all the way down, all the turtles down. - So, all this, the data pedigree will be driven by the agency of artificial intelligence, and this is a big issue. - It's really fundamental to understand and educate people on, there are four fundamental types of bias, so there's, in machine learning, there's intentional bias, "Hey, we're going to make "the algorithm generate a certain outcome "regardless of what the data says." There's the source of the data itself, historical data that's trained on the models built on flawed data, the model will behave in a flawed way. There's target source, which is, for example, we know that if you pull data from a certain social network, that network itself has an inherent bias. No matter how representative you try to make the data, it's still going to have flaws in it. Or, if you pull healthcare data about, for example, African-Americans from the US healthcare system, because of societal biases, that data will always be flawed. And then there's tool bias, there's limitations to what the tools can do, and so we will intentionally exclude some kinds of data, or not use it because we don't know how to, our tools are not able to, and if we don't teach people what those biases are, they won't know to look for them, and I know. 
- Yeah, it's like, one of the things that we were talking about before, I mean, artificial intelligence is not going to just create itself, it's lines of code, it's input, and it spits out output. So, if it learns from these learning sets, we don't want AI to become another buzzword. We don't want everybody to be an "AI guru" that has no idea what AI is. It takes months, and months, and months for these machines to learn. These learning sets are so very important, because that input is how this machine, think of it as your child, and that's basically the way artificial intelligence is learning, like your child. You're feeding it these learning sets, and then eventually it will make its own decisions. So, we know from some of us having children that you teach them the best that you can, but then later on, when they're doing their own thing, they're really, it's like a little myna bird, they've heard everything that you've said. (laughing) Not only the things that you said to them directly, but the things that you said indirectly. - Well, there are some very good AI researchers that might disagree with that metaphor, exactly. (laughing) But, having said that, what I think is very interesting about this conversation is that this notion of bias, one of the things that fascinates me about where AI goes, are we going to find a situation where tribalism more deeply infects business? Because we know that human beings do not seek out the best information, they seek out information that reinforces their beliefs. And that happens in business today. My line of business versus your line of business, engineering versus sales, that happens today, but it happens at a planning level, and when we start talking about AI, we have to put the appropriate dampers, understand the biases, so that we don't end up with deep tribalism inside of business. Because AI could have the deleterious effect that it actually starts ripping apart organizations. 
- Well, input is data, and then the output is, could be a lot of things. - Could be a lot of things. - And that's where I said data equals human lives. So that we look at the case in New York where the penal system was using this artificial intelligence to make choices on people that were released from prison, and they saw that that was a miserable failure, because the people that were released actually re-offended, some committed murder and other things. So, I mean, it's, it's more than what anybody really thinks. It's not just, oh, well, we'll just train the machines, and a couple of weeks later they're good, we never have to touch them again. These things have to be continuously tweaked. So, just because you built an algorithm or a model doesn't mean you're done. You got to go back later, and continue to tweak these models. - Mark, you got the mic. - Yeah, no, I think one thing we've talked a lot about the data that's collected, but what about the data that's not collected? Incomplete profiles, incomplete datasets, that's a form of bias, and sometimes that's the worst. Because they'll fill that in, right, and then you can get some bias, but there's also a real issue for that around cyber security. Logs are not always complete, things are not always done, and when things are doing that, people make assumptions based on what they've collected, not what they didn't collect. So, when they're looking at this, and they're using the AI on it, that's only on the data collected, not on what wasn't collected. So, if something is down for a little while, and no data's collected off that, the assumption is, well, it was down, or it was impacted, or there was a breach, or whatever, it could be any of those. 
So, you got to, there's still this human need, there's still the need for humans to look at the data and realize that there is the bias in there, there is, we're just looking at what data was collected, and you're going to have to make your own thoughts around that, and assumptions on how to actually use that data before you go make those decisions that can impact lots of people, at a human level, enterprise's profitability, things like that. And too often, people think of AI, when it comes out of there, that's the word. Well, it's not the word. - Can I ask a question about this? - Please. - Does that mean that we shouldn't act? - It does not. - Okay. - So, where's the fine line? - Yeah, I think. - Going back to this notion of can we do it, or should we do it? Should we act? - Yeah, I think you should do it, but you should use it for what it is. It's augmenting, it's helping you, assisting you to make a valued or good decision. And hopefully it's a better decision than you would've made without it. - I think it's great, I think also, your answer's right too, that you have to iterate faster, and faster, and faster, and discover sources of information, or sources of data that you're not currently using, and, that's why this thing starts getting really important. - I think you touch on a really good point about, should you or shouldn't you? You look at Google, and you look at the data that they've been using, and some of that out there, from a digital twin perspective, is not being approved, or not authorized, and even once they've made changes, it's still floating around out there. Where do you know where it is? So, there's this dilemma of, how do you have a digital twin that you want to have, and is going to work for you, and is going to do things for you to make your life easier, to do these things, mundane tasks, whatever? But how do you also control it to do things you don't want it to do? - Ad-based business models are inherently evil. 
(laughing) - Well, there's incentives to appropriate our data, and so, are things like blockchain potentially going to give users the ability to control their data? We'll see. - No, I, I'm sorry, but that's actually a really important point. The idea of consensus algorithms, whether it's blockchain or not, blockchain includes games, and something along those lines, whether it's Byzantine fault tolerance, or whether it's Paxos, consensus-based algorithms are going to be really, really important. Parts of this conversation, because the data's going to be more distributed, and you're going to have more elements participating in it. And so, something that allows, especially in the machine-to-machine world, which is a lot of what we're talking about right here, you may not have blockchain, because there's no need for a sense of incentive, which is what blockchain can help provide. - And there's no middleman. - And, well, all right, but there's really, the thing that makes blockchain so powerful is it liberates new classes of applications. But for a lot of the stuff that we're talking about, you can use a very powerful consensus algorithm without having a game side, and do some really amazing things at scale. - So, looking at blockchain, that's a great thing to bring up, right. I think what's inherently wrong with the way we do things today, and the whole overall design of technology, whether it be on-prem, or off-prem, is both the lock and key is behind the same wall. Whether that wall is in a cloud, or behind a firewall. So, really, when there is an audit, or when there is a forensics, it always comes down to a sysadmin, or something else, and the system administrator will have the finger pointed at them, because it all resides, you can edit it, you can augment it, or you can do things with it that you can't really determine. Now, take, as an example, blockchain, where you've got really the source of truth. 
Now you can take and have the lock in one place, and the key in another place. So that's certainly going to be interesting to see how that unfolds. - So, one of the things, it's good that, we've hit a lot of buzzwords, right now, right? (laughing) AI, and ML, block. - Bingo. - We got the blockchain bingo, yeah, yeah. So, one of the things is, you also brought up, I mean, ethics and everything, and one of the things that I've noticed over the last year or so is that, as I attend briefings or demos, everyone is now claiming that their product is AI or ML-enabled, or blockchain-enabled. And when you try to get answers to the questions, what you really find out is that some things are being pushed as, because they have if-then statements somewhere in their code, and therefore that's artificial intelligence or machine learning. - [Peter] At least it's not "go-to." (laughing) - Yeah, you're that experienced as well. (laughing) So, I mean, this is part of the thing you try to do as a practitioner, as an analyst, as an influencer, is trying to, you know, the hype of it all. And recently, I attended one where they said they use blockchain, and I couldn't figure it out, and it turns out they use GUIDs to identify things, and that's not blockchain, it's an identifier. (laughing) So, one of the ethics things that I think we, as an enterprise community, have to deal with, is the over-promising of AI, and ML, and deep learning, and recognition. It's not, I don't really consider it visual recognition services if they just look for red pixels. I mean, that's not quite the same thing. Yet, this is also making things much harder for your average CIO, or worse, CFO, to understand whether they're getting any value from these technologies. - Old bottle. - Old bottle, right. 
- And I wonder if the data companies, like the ones you talked about, or the top five, I'm more concerned about their nearly, or actual $1 trillion valuations having an impact on the ability of other companies to disrupt or enter into the field more so than their data technologies. Again, we're coming to another perfect storm of the companies that have data as their asset, even though it's still not on their financial statements, which is another indicator whether it's really an asset, is that, do we need to think about the terms of AI, about whose hands it's in, and who's, like, once one large trillion-dollar company decides that you are not a profitable company, how many other companies are going to buy that data and make that decision about you? - Well, and for the first time in business history, I think, this is true, we're seeing, because of digital, because it's data, you're seeing tech companies traverse industries, get into, whether it's content, or music, or publishing, or groceries, and that's powerful, and that's awful scary. - If you're a manager, one of the things your ownership is asking you to do is to reduce asset specificities, so that their capital could be applied to more productive uses. Data reduces asset specificities. It brings into question the whole notion of vertical industry. You're absolutely right. But you know, one quick question I got for you, playing off of this is, again, it goes back to this notion of can we do it, and should we do it? I find it interesting, if you look at those top five, all data companies, but all of them are very different business models, or you can classify them into two different business models. Apple is transactional, Microsoft is transactional, Google is ad-based, Facebook is ad-based, before the fake news stuff. Amazon's kind of playing it both sides. - Yeah, they're kind of all on a collision course though, aren't they? - But, well, that's what's going to be interesting. 
I think, at some point in time, the "can we do it, should we do it" question is, brands are going to be identified by whether or not they have gone through that process of thinking about, should we do it, and say no. Apple is clearly, for example, incorporating that into their brand. - Well, Silicon Valley, broadly defined, if I include Seattle, and maybe Armonk, not so much IBM. But they've got a dual disruption agenda, they've always disrupted horizontal tech. Now they're disrupting vertical industries. - I was actually just going to pick up on what she was talking about, we were talking about buzzword, right. So, one we haven't heard yet is voice. Voice is another big buzzword right now, when you couple that with IoT and AI, here you go, bingo, do I got three points? (laughing) Voice recognition, voice technology, so all of the smart speakers, if you think about that in the world, there are 7,000 languages being spoken, but yet if you look at Google Home, you look at Siri, you look at any of the devices, I would challenge you, it would have a lot of problems understanding my accent, and even when my British accent creeps out, or it would have trouble understanding seniors, because the way they talk, it's very different than a typical 25-year-old person living in Silicon Valley, right. So, how do we solve that, especially going forward? We're seeing voice technology is going to be so much more prominent in our homes, we're going to have it in the cars, we have it in the kitchen, it does everything, it listens to everything that we are talking about, not talking about, and records it. And to your point, is it going to start making decisions on our behalf, but then my question is, how much does it actually understand us? - So, I just want one short story. Siri can't translate a word that I ask it to translate into French, because my phone's set to Canadian English, and that's not supported. So I live in a bilingual French English country, and it can't translate. 
- But what this is really bringing up is if you look at society, and culture, what's legal, what's ethical, changes across the years. What was right 200 years ago is not right now, and what was right 50 years ago is not right now. - It changes across countries. - It changes across countries, it changes across regions. So, what does this mean when our AI has agency? How do we make ethical AI if we don't even know how to manage the change of what's right and what's wrong in human society? - One of the most important questions we have to worry about, right? - Absolutely. - But it also says one more thing, just before we go on. It also says that the issue of economies of scale, in the cloud. - Yes. - Are going to be strongly impacted, not just by how big you can build your data centers, but some of those regulatory issues that are going to influence strongly what constitutes good experience, good law, good acting on my behalf, agency. - And one thing that's underappreciated in the marketplace right now is the impact of data sovereignty, if you get back to data, countries are now recognizing the importance of managing that data, and they're implementing data sovereignty rules. Everyone talks about California issuing a new law that's aligned with GDPR, and you know what that meant. There are 30 other states in the United States alone that are modifying their laws to address this issue. - Steve. - So, um, so, we got a number of years, no matter what Ray Kurzweil says, until we get to artificial general intelligence. - The singularity's not so near? (laughing) - You know that he's changed the date over the last 10 years. - I did know it. - Quite a bit. And I don't even prognosticate where it's going to be. But really, where we're at right now, I keep coming back to, is that's why augmented intelligence is really going to be the new rage, humans working with machines. One of the hot topics, and the reason I chose to speak about it is, is the future of work. 
I don't care if you're a millennial, mid-career, or a baby boomer, people are paranoid. As machines get smarter, if your job is routine cognitive, yes, you have a higher propensity to be automated. So, this really shifts a number of things. A, you have to be a lifelong learner, you've got to learn new skillsets. And the dynamics are changing fast. Now, this is also a great equalizer for emerging startups, and even in SMBs. As the AI improves, they can become more nimble. So back to your point regarding colossal trillion dollar, wait a second, there's going to be quite a sea change going on right now, and regarding demographics, in 2020, millennials take over as the majority of the workforce, by 2025 it's 75%. - Great news. (laughing) - As a baby boomer, I try my damnedest to stay relevant. - Yeah, surround yourself with millennials is the takeaway there. - Or retire. (laughs) - Not yet. - One thing I think, this goes back to what Karen was saying, if you want a basic standard to put around the stuff, look at the old ISO 38500 framework. Business strategy, technology strategy. You have risk, compliance, change management, operations, and most importantly, the balance sheet and the financials. AI and what Tony was saying, digital transformation, if it's of meaning, it belongs on a balance sheet, and should factor into how you value your company. All the cyber security, and all of the compliance, and all of the regulation, is all stuff, this framework exists, so look it up, and every time you start some kind of new machine learning project, or data science project, say, have we checked the box on each of these standards that's within this machine? And if you haven't, maybe slow down and do your homework. - To see a day when data is going to be valued on the balance sheet. - It is. - It's already valued as part of the current, but it's good will. - Certainly market value, as we were just talking about. 
- Well, we're talking about all of the companies that have opted in, right. There's tens of thousands of small businesses just in this region alone that are opt-out. They're small family businesses, or businesses that really aren't even technology-aware. But data's being collected about them, it's on Yelp, they're being rated, they're being reviewed, the success of their business is out of their hands. And I think what's really going to be interesting is, you look at the big data, you look at AI, you look at things like that, blockchain may even be a potential for some of that, because of immutability, but it's when all of those businesses, when the technology becomes a cost, it's cost-prohibitive now, for a lot of them, or they just don't want to do it, and they're proudly opt-out. In fact, we talked about that last night at dinner. But when they opt-in, the company that can do that, and can reach out to them in a way that is economically feasible, and bring them back in, where they control their data, where they control their information, and they do it in such a way where it helps them build their business, and it may be a generational business that's been passed on. Those kind of things are going to make a big impact, not only on the cloud, but the data being stored in the cloud, the AI, the applications that you talked about earlier, we talked about that. And that's where this bias, and some of these other things are going to have a tremendous impact if they're not dealt with now, at least ethically. - Well, I feel like we just got started, we're out of time. Time for a couple more comments, and then officially we have to wrap up. - Yeah, I had one thing to say, I mean, really, Henry Ford, and the creation of the automobile, back in the early 1900s, changed everything, because now we're no longer stuck in the country, we can get away from our parents, we can date without grandma and grandpa sitting on the porch with us. 
(laughing) We can take long trips, so now we're looked at, we've sprawled out, we're not all living in the country anymore, and it changed America. So, AI has those same capabilities, it will automate mundane routine tasks that nobody wanted to do anyway. So, a lot of that will change things, but it's not going to be any different than the way things changed in the early 1900s. - It's like you were saying, constant reinvention. - I think that's a great point, let me make one observation on that. Every period of significant industrial change was preceded by the formation, a period of formation of new assets that nobody knew what to do with. Whether it was, what do we do, you know, industrial manufacturing, it was row houses with long shafts tied to an engine that was coal-fired, and drove a bunch of looms. Same thing, railroads, large factories for Henry Ford, before he figured out how to do an information-based notion of mass production. This is the period of asset formation for the next generation of social structures. - Those chip-makers are going to be all over these cars, I mean, you're going to have augmented reality right there, on your windshield. - Karen, bring it home. Give us the drop-the-mic moment. (laughing) - No pressure. - Your AV guys are not happy with that. So, I think the, it all comes down to, it's a people problem, a challenge, let's say that. The whole AI ML thing, people, it's a legal compliance thing. Enterprises are going to struggle with trying to meet five billion different types of compliance rules around data and its uses, about enforcement, because ROI is going to mean risk of incarceration as well as return on investment, and we'll have to manage both of those. I think businesses are struggling with a lot of this complexity, and you just opened a whole bunch of questions that we didn't really have solid, "Oh, you can fix it by doing this." 
So, it's important that we think of this new world of data focus, data-driven, everything like that, is that the entire IT and business community needs to realize that focusing on data means we have to change how we do things and how we think about it, but we also have some of the same old challenges there. - Well, I have a feeling we're going to be talking about this for quite some time. What a great way to wrap up CUBE NYC here, our third day of activities down here at 37 Pillars, or Mercantile 37. Thank you all so much for joining us today. - Thank you. - Really, wonderful insights, really appreciate it, now, all this content is going to be available on theCUBE.net. We are exposing our video cloud, and our video search engine, so you'll be able to search our entire corpus of data. I can't wait to start searching and clipping up this session. Again, thank you so much, and thank you for watching. We'll see you next time.

Published Date : Sep 13 2018

Yaron Haviv, Iguazio | theCUBE NYC 2018


 

Live from New York It's theCUBE! Covering theCUBE New York City 2018 Brought to you by Silicon Angle Media and it's ecosystem partners >> Hey welcome back and we're live in theCUBE in New York city. It's our 2nd day of two days of coverage CUBE NYC. The hashtag CUBENYC Formerly Big data NYC renamed because it's about big data, it's about the server, it's about Cooper _________'s multi-cloud data. It's all about data, and that's the fundamental change in the industry. Our next guest is Yaron Haviv, who's the CTO of Iguazio, key alumni, always coming out with some good commentary smart analysis. Kind of a guest host as well as an industry participant supplier. Welcome back to theCUBE. Good to see you. >> Thank you John. >> Love having you on theCUBE because you always bring some good insight and we appreciate that. Thank you so much. First, before we get into some of the comments because I really want to delve into comments that David Richards said a few years ago, CEO of RenDisco. He said, "Cloud's going to kill Hadoop". And people were looking at him like, "Oh my God, who is this heretic? He's crazy. What is he talking about?" But you might not need Hadoop, if you can run server less Spark, Tensorflow.... You talk about this off camera. Is Hadoop going to be the open stack of the big data world? >> I don't think cloud necessary killed Hadoop, although it is working on that, you know because you go to Amazon and you know, you can consume a bunch of services and you don't really need to think about Hadoop. I think cloud native serve is starting to kill Hadoop, cause Hadoop is three layers, you know, it's a file system, it's DFS, and then you have server scheduling Yarn, then you have applications starting with map produce and then you evolve into things like Spark. Okay, so, file system I don't really need in the cloud. I use Asfree, I can use a database as a service, as you know, pretty efficient way of storing data. 
For scheduling, Kubernetes is a much more generic way of scheduling workloads and not confined to Spark and specific workloads. I can run with Dancerflow, I can run with data science tools, etc., just containerize. So essentially, why would I need Hadoop? If I can take the traditional tools people are now evolving in and using like Jupiter Notebooks, Spark, Dancerflow, you know, those packages with Kubernetes on top of a database as a service and some object store, I have a much easier stack to work with. And I could mobilize that whether it's in the cloud, you know on different vendors. >> Scale is important too. How do you scale it? >> Of course, you have independent scaling between data and computation, unlike Hadoop. So I can just go to Google, and use Vquery, or use, you know, DynamoDB on Amazon or Redchick, or whatever and automatically scale it down and then, you know >> That's a unique position, so essentially, Hadoop versus Kubernetes is a top-line story. And wouldn't that be ironic for Google, because Google essentially created Map Produce and Coudera ran with it and went public, but when we're talking about 2008 timeframe, 2009 timeframe, back when ventures with cloud were just emerging in the mainstream. So wouldn't it be ironic Kubernetes, which is being driven by Google, ends up taking over Hadoop? In terms of running things on Kubernetes and cloud eight on Visa Vis on premise with Hadoop. >> The poster is tend to give this comment about Google, but essentially Yahoo started Hadoop. Google started the technology  and couple of years after Hadoop started, with Google they essentially moved to a different architecture, with something called Percolator. So Google's not too associated with Hadoop. They're not really using this approach for a long time. >> Well they wrote the map-produced paper and the internal conversations we report on theCUBE about Google was, they just let that go. And Yahoo grabbed it. 
(cross-conversation) >> The companies that had the most experience were the first to leave. And I think it may respect what you're saying. As the marketplace realizes the outcomes of the dubious associate with, they will find other ways of achieving those outcomes. It might be more depth. >> There's also a fundamental shift in the consumption where Hadoop was about a ranking pages in a batch form. You know, just collecting logs and ranking pages, okay. The chances that people have today revolve around applying AI to business application. It needs to be a lot more concurring, transactional, real-time ish, you know? It's nothing to do with Hadoop, okay? So that's why you'll see more and more workers, mobilizing different black server functions, into service pre-canned services, etc. And Kubernetes playing a good role here is providing the trend. Transport for migrating workloads across cloud providers, because I can use GKE, the Google Kubenetes, or Amazon Kubernetes, or Azure Kubernetes, and I could write a similar application and deploy it on any cloud, or on Clam on my own private cluster. It makes the infrastructure agnostic really application focused. >> Question about Kubernetes we heard on theCUBE earlier, the VP of Project BlueData said that Kubernetes ecosystem and community needs to do a better job with Stapla, they nailed Stapflalis, Stafle application support is something that they need help on. Do you agree with that comment, and then if so, what alternatives do you have for customers who care about Stafe? >> They should use our product (laughing) >> (mumbling) Is Kubernetes struggling there? And if so, talk about your product >> So, I think that our challenge is rounded that there are many solutions in that. I think that they are attacking it from a different approach Many of them are essentially providing some block storage to different containers on really cloud 90. What you want to be able is to have multiple containers access the same data. 
That means either sharing through file systems, for objects or through databases because one container is generating, for example, ingestion or __________. Another container is manipulating that same data. A third container may look for something in the data, and generate a trigger or an action. So you need shared access to data from those containers. >> The rest of the data synchronizes all three of those things. >> Yes because the data is the form of state. The form of state cannot be associated with the same container, which is what most of where I am very active and sincere in those committees, and you have all the storage guys in the committees, and they think the block story just drag solution. Cause they still think like virtual machines, okay? But the general idea is that if you think about Kubernetes is like the new OS, where you have many processes, they're just scattered around. In OS, the way for us to share state between processes an OS, is whether through files, or through databases, in those form. And that's really what >> Threads and databases as a positive engagement. >> So essentially I gave maybe two years ago, a session at KubeCon in Europe about what we're doing on storing state. It's really high-performance access from those container processes to our database. Impersonate objects, files, streams or time series data, etc And then essentially, all those workloads just mount on top of and we can all share stape. We can even control the access for each >> Do you think you nailed the stape problem? >> Yes, by the way, we have a managed service. Anyone could go today to our cloud, to our website, that's in our cloud. It gets it's own Kubernetes cluster, a provision within less than 10 minutes, five to 10 minutes. With all of those services pre-integrated with Spark, Presto, ______________, real-time, these services functions. All that pre-configured on it's own time. 
I figured all of these- >> 100% compatible with Kubernetes, it's a good investment. >> Well, we're just expanding it to Kubernetes stripes, now it's working on them, Amazon Kubernetes, EKS I think, and we're working on AKS and GKE. We partner with Azure and Google. And we're also building an edge solution that is essentially exactly the same stack. It can run on an edge appliance in a factory. You can essentially mobilize data and functions back and forth. So you can go and develop your workloads, your application, in the cloud, test it under simulation, push a single button and teleport the artifacts into the edge factory. >> So is it like a real-time Kubernetes? >> Yes, it's a real-time Kubernetes. >> If you _______like the things we're doing, it's all real-time. >> Talk about real-time in the database world, because you mentioned time-series databases. You gave object store versus block. Talk about time series. You're talking about data that is very relevant in the moment. And also understanding time series data. And then, it's important post-event, if you will, meaning: How do you store it? Do you care? I mean, it's important to manage the time series. At the same time, it might not be as valuable as other data, or valuable at certain points in time, which changes its relationship to how it's stored and how it's used. Talk about the dynamic of time series. >> We figured out in the last six or 12 months that real-time is about time series. Everything you think about in real-time, sensor data, even video is a time series of frames, okay. And what everyone wants to do is take huge amounts of time series. They want to cross-correlate it, because for example, you think about stock tickers, you know, the stock has an impact from news feeds or Twitter feeds, or of a company or a segment. 
So essentially, what they need to do is something called multivariate analysis of multiple time series to be able to extract some meaning, and then decide if you want to sell or buy a stock, as an application example. And there is a huge gap in the solutions in that market, because most of the time series databases were designed for operational databases, you know, things that monitor apps. Nothing that ingests millions of data points per second, and cross-correlates and runs real-time AI analytics. So we've essentially extended, because we have a programmable database essentially under the hood. We've extended it to support time series data with about a 50-to-1 compression ratio compared to some other solutions. You know, we've worked with a customer, we've done sizing; they told us they need half a petabyte. After a small sizing exercise, it was about 10 to 20 terabytes of storage for the same data they stored in Cassandra for 500 terabytes. Also huge ingestion rates, and what's very important, we can do all those cross-correlations in-flight, so that's something that's working very well for us. >> This could help on smart mobility. Kenex 5G comes on, certainly. Intelligent edge. >> So the customers we have, the use cases that we apply right now are in financial services, two or three main applications. One is tick data and analytics; everyone wants to be smarter about how to buy and sell stocks or manage risk. The second one is infrastructure monitoring, critical infrastructure monitoring, SLA monitoring: being able to monitor network devices, latencies, applications, you know, transaction rates, or that, and being able to predict potential failures or escalation. We have similar applications; we have about three telco customers using it for real-time time-series analytics of metric data, cybersecurity attacks, congestion avoidance, SLA management, and also automotive. 
Fleet management, file linking, they are also essentially feeding huge data sets of time series analytics. They're running cross-correlation and AI logic, so now they can generate triggers. Now apply that to Hadoop. What does Hadoop have to do with those kinds of applications? It cannot feed huge amounts of datasets, it cannot react in real-time, it doesn't store time series efficiently. >> Hapoop (laughing) >> You said that. >> Yeah. That's good. >> One, I know we don't have a lot of time left. We're running out of time, but I want to make sure we get this out here. How are you engaging with customers? You guys have great technical support. We can vouch for the tech chops that you guys have. We've seen the solution. If it's compatible with Kubernetes, certainly this is an alternative to have really great analytical infrastructure. Cloud-native goodness that you're building. You do POCs, they go to your website, and how do you engage, how do you get deals? How do people work with you? >> So because now we have a cloud service, we also engage through the cloud. Mainly, we're going after customers and leads from webinars and activities on the internet, and we sort of follow up with those customers, we know-- >> Direct sales? >> Direct sales, but through a lead generation mechanism. Marketplace activity, Amazon, Azure. >> Partnerships with Azure and Google now. And Azure joint selling activities. They can actually resell and get compensated. Our solution is an edge for Azure. Working on a similar solution for Google. Very focused on retailers. That's the current market focus, since if you think about stores, a single supermarket will have more than 1,000 cameras. Okay, just because they're monitoring shelves in real-time; think about an Amazon Go kind of application. Real-time inventory management. You cannot push 1,000 camera feeds into the cloud in order to analyze them and then decide on inventory level. Proactive action, so those are the kind of applications. 
>> So bigger deals, you've had some big deals. >> Yes, we're really not a Raspberry Pi kind of solution. That's where the bigger customers-- >> Got it. Yaron, thank you so much. The CTO of Iguazio. Check him out. It's actually been great commentary. The Hadoop versus Kubernetes narrative. Love to explore that further with you. Stay with us for more coverage after this short break. We're live in day 2 of CUBE NYC. Par Strata, Hadoop Strata, Hadoop World. CUBE Hadoop World, whatever you want to call it. It's all because of the data. We'll bring it to ya. Stay with us for more after this short break. (upbeat music)

Published Date : Sep 13 2018

DD, Cisco + Han Yang, Cisco | theCUBE NYC 2018


 

>> Live from New York, It's the CUBE! Covering theCUBE, New York City 2018. Brought to you by SiliconANGLE Media and its Ecosystem partners. >> Welcome back to the live CUBE coverage here in New York City for CUBE NYC, #CubeNYC. This coverage of all things data, all things cloud, all things machine learning here in the big data realm. I'm John Furrier and Dave Vellante. We've got two great guests from Cisco. We got DD who is the Vice President of Data Center Marketing at Cisco, and Han Yang who is the Senior Product Manager at Cisco. Guys, welcome to the Cube. Thanks for coming on again. >> Good to see ya. >> Thanks for having us. >> So obviously one of the things that has come up this year at the Big Data Show, used to be called Hadoop World, Strata Data, now it's called, the latest name. And obviously CUBE NYC, we changed from Big Data NYC to CUBE NYC, because there's a lot more going on. I heard hallway conversations around blockchain, cryptocurrency, Kubernetes has been said on theCUBE already at least a dozen times here today, multicloud. So you're seeing the analytical world try to be, in a way, brought into the dynamics around IT infrastructure operations, both cloud and on premises. So interesting dynamics this year, almost a dev ops kind of culture to analytics. This is a new kind of sign from this community. Your thoughts? >> Absolutely, I think data and analytics is one of those things that's pervasive. Every industry, it doesn't matter. Even at Cisco, I know we're going to talk a little more about the new AI and ML workload, but for the last few years, we've been using AI and ML techniques to improve networking, to improve security, to improve collaboration. So it's everywhere. >> You mean internally, in your own IT? >> Internally, yeah. Not just in IT, in the way we're designing our network equipment. 
We're storing data that's flowing through the data center, flowing in and out of clouds, and using that data to make better predictions for better networking application performance, security, what have you. >> The first topic I want to talk to you guys about is around the data center. Obviously, you do data center marketing, that's where all the action is. The cloud, obviously, has been all the buzz, people going to the cloud, but Andy Jassy's announcement at VMworld really is a validation that we're seeing, for the first time, hybrid multicloud validated. Amazon announced RDS on VMware on-premises. >> That's right. This is the first time Amazon's ever done anything of this magnitude on-premises. So this is a signal from the customers voting with their wallet that on-premises is a dynamic. The data center is where the data is, that's where the main footprint of IT is. This is important. What's the impact of that dynamic, of data center, where the data is with the option of a cloud. How does that impact data, machine learning, and the things that you guys see as relevant? >> I'll start and Han, feel free to chime in here. So I think those boundaries between this is a data center, and this a cloud, and this is campus, and this is the edge, I think those boundaries are going away. Like you said, data center is where the data is. And it's the ability of our customers to be able to capture that data, process it, curate it, and use it for insight to take decision locally. A drone is a data center that flies, and boat is a data center that floats, right? >> And a cloud is a data center that no one sees. >> That's right. So those boundaries are going away. We at Cisco see this as a continuum. It's the edge cloud continuum. The edge is exploding, right? There's just more and more devices, and those devices are cranking out more data than ever before. Like I said, it's the ability of our customers to harness the data to make more meaningful decisions. 
So Cisco's take on this is the new architectural approach. It starts with the network, because the network is the one piece that connects everything- every device, every edge, every individual, every cloud. There's a lot of data within the network which we're using to make better decisions. >> I've been pretty close with Cisco over the years, since '95 timeframe. I've had hundreds of meetings, some technical, some kind of business. But I've heard that term edge the network many times over the years. This is not a new concept at Cisco. Edge of the network actually means something in Cisco parlance. The edge of the network >> Yeah. >> that the packets are moving around. So again, this is not a new idea at Cisco. It's just materialized itself in a new way. >> It's not, but what's happening is the edge is just now generating so much data, and if you can use that data, convert it into insight and make decisions, that's the exciting thing. And that's why this whole thing about machine learning and artificial intelligence, it's the data that's being generated by these cameras, these sensors. So that's what is really, really interesting. >> Go ahead, please. >> One of our own studies pointed out that by 2021, there will be 847 zettabytes of information out there, but only 1.3 zettabytes will actually ever make it back to the data center. That just means an opportunity for analytics at the edge to make sense of that information before it ever makes it home. >> What were those numbers again? >> I think it was like 847 zettabytes of information. >> And how much makes it back? >> About 1.3. >> Yeah, there you go. So- >> So a huge compression- >> That confirms your research, Dave. >> We've been saying for a while now that most of the data is going to stay at the edge. There's no reason to move it back. The economics don't support it, the latency doesn't make sense. >> The network cost alone is going to kill you. >> That's right. 
>> I think you really want to collect it, you want to clean it, and you want to correlate it before ever sending it back. Otherwise, you're sending useless information, that the status is wonderful. Well, that's not very valuable. And 99.9 percent of the time, "things are going well." >> Temperature hasn't changed. (laughs) >> If it really goes wrong, that's when you want to alert or send more information. How did it go bad? Why did it go bad? Those are the more insightful things that you want to send back. >> This is not just for IoT. I mean, cat pictures moving between campuses cost money too, so why not just keep them local, right? But the basic concepts of networking. This is what I want to get in my point, too. You guys have some new announcements around UCS and some of the hardware and the gear and the software. What are some of the new announcements that you're announcing here in New York, and what does it mean for customers? Because they want to know not only speeds and feeds. It's a software-driven world. How does the software relate? How does the gear work? What's the management look like? Where's the control plane? Where's the management plane? Give us all the data. >> I think the biggest issue starts from this. Data scientists, their task is to explore different data sources and find out the value. But at the same time, IT is somewhat lagging behind. Because as the data scientists go from data source A to data source B, it could be 3 petabytes of difference. IT is like, 3 petabytes? That's only from Monday through Wednesday? That's a huge infrastructure requirement change. So Cisco's way to help the customer is to make sure that we're able to come out with blueprints. Blueprints enabling the IT team to scale, so that the data scientists can work beyond their own laptop. As they work through the petabytes of data that's come in from all these different sources, they're able to collaborate well together and make sense of that information. 
Only by scaling with IT, helping the data scientists to work at scale, that's the only way they can succeed. So that's why we announced a new server. It's called the C480ML. It happens to have 8 GPUs from Nvidia inside, helping customers that want to do that deep learning kind of capabilities. >> What are some of the use cases on these as products? It's got some new data capabilities. What are some of the impacts? >> Some of the things that Han just mentioned. For me, I think the biggest differentiation in our solution is the things that we put around the box. So the management layer, right? I mean, this is not going to be one server and one data center. It's going to be multiple of them. You're never going to have one data center. You're going to have multiple data centers. And we've got a really cool management tool called Intersight, and this is supported in Intersight, day one. And Intersight also uses machine learning techniques to look at data from multiple data centers. And that's really where the innovation is. Honestly, I think every vendor is bending sheet metal around the latest chipset, and we've done the same. But the real differentiation is how we manage it, how we use the data for more meaningful insight. I think that's where some of our magic is. >> Can you add some color to that, in terms of infrastructure for AI and ML, how is it different than traditional infrastructures? So is the management different? The sheet metal is not different, you're saying. But what are some of those nuances that we should understand? >> I think especially for deep learning, multiple scientists around the world have pointed out that if you're able to use GPUs, they're able to run the deep learning frameworks faster by roughly two orders of magnitude. So that's part of the reason why, from an infrastructure perspective, we want to bring in those GPUs. But for the IT teams, we didn't want them to just add yet another infrastructure silo just to support AI or ML. 
Therefore, we wanted to make sure it fits in with a UCS-managed unified architecture, enabling the IT team to scale but without adding more infrastructures and silos just for that new workload. But having that unified architecture, it helps the IT to be more efficient and, at the same time, is better support of the data scientists. >> The other thing I would add is, again, the things around the box. Look, this industry is still pretty nascent. There is lots of start-ups, there is lots of different solutions, and when we build a server like this, we don't just build a server and toss it over the fence to the customer and say "figure it out." No, we've done validated design guides. With Google, with some of the leading vendors in the space to make sure that everything works as we say it would. And so it's all of those integrations, those partnerships, all the way through our systems integrators, to really understand a customer's AI and ML environment and can fine tune it for the environment. >> So is that really where a lot of the innovation comes from? Doing that hard work to say, "yes, it's going to be a solution that's going to work in this environment. Here's what you have to do to ensure best practice," etc.? Is that right? >> So I think some of our blueprints or validated designs is basically enabling the IT team to scale. Scale their stores, scale their CPU, scale their GPU, and scale their network. But do it in a way so that we work with partners like Hortonworks or Cloudera. So that they're able to take advantage of the data lake. And adding in the GPU so they're able to do the deep learning with Tensorflow, with Pytorch, or whatever curated deep learning framework the data scientists need to be able to get value out of those multiple data sources. These are the kind of solutions that we're putting together, making sure our customers are able to get to that business outcome sooner and faster, not just a-- >> Right, so there's innovation at all altitudes. 
There's the hardware, there's the integrations, there's the management. So it's innovation. >> So not to go too much into the weeds, but I'm curious. As you introduce these alternate processing units, what is the relationship between traditional CPUs and these GPUs? Are you managing them differently, kind of communicating somehow, or are they sort of fenced off architecturally? I wonder if you could describe that. >> We actually want it to be integrated, because by having it separated and fenced off, well, that's an IT infrastructure silo. You're not going to have the same security policy or the storage mechanisms. We want it to be unified so it's easier on IT teams to support the data scientists. So therefore, the latest software is able to manage both CPUs and GPUs, as well as having a new file system. Those are the solutions that we're putting forth, so that our IT folks can scale, our data scientists can succeed. >> So IT's managing a logical block. >> That's right. And even for things like inventory management, or going back and adding patches in the event of some security event, it's so much better to have one integrated system rather than silos of management, which we see in the industry. >> So the hard news is basically UCS for AI and ML workloads? >> That's right. This is our first server custom built from the ground up to support these deep learning, machine learning workloads. We partnered with Nvidia, with Google. We announced earlier this week, and the phone is ringing constantly. >> I don't want to say god box. I just said it. (laughs) This is basically the power tool for deep learning. >> Absolutely. >> That's how you guys see it. Well, great. Thanks for coming out. Appreciate it, good to see you guys at Cisco. Again, deep learning dedicated technology around the box, not just the box itself. Ecosystem, Nvidia, good call. Those guys really get the hot GPUs out there. Saw those guys last night, great success they're having. They're a key partner with you guys. 
>> Absolutely. >> Who else is partnering, real quick before we end the segment? >> We've been partnering on the software side; we partner with folks like Anaconda, with their Anaconda Enterprise, which data scientists love to use as their Python data science framework. We're working with Google, with their Kubeflow, which is an open source project integrating Tensorflow on top of Kubernetes. And of course we've been working with folks like Cloudera as well as Hortonworks to access the data lake from a big data perspective. >> Yeah, I know you guys didn't get a lot of credit. Google Cloud, we were certainly amplifying it. You guys were co-developing the Google Cloud servers with Google. I know they were announcing it, and you guys had Chuck on stage there with Diane Greene, so it was pretty positive. Good integration with Google can make a >> Absolutely. >> Thanks for coming on theCUBE, thanks, we appreciate the commentary. Cisco here on theCUBE. We're in New York City for theCUBE NYC. This is where the world of data is converging in with IT infrastructure, developers, operators, all running analytics for future business. We'll be back with more coverage, after this short break. (upbeat digital music)

Published Date : Sep 12 2018

Kickoff | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE covering theCUBE New York City 2018. Brought to you by SiliconANGLE Media and its ecosystem partners. (techy music) >> Hello, everyone, welcome to this CUBE special presentation here in New York City for CUBENYC. I'm John Furrier with Dave Vellante. This is our ninth year covering the big data industry, starting with Hadoop World and evolved over the years. This is our ninth year, Dave. We've been covering Hadoop World, Hadoop Summit, Strata Conference, Strata Hadoop. Now it's called Strata Data, I don't know what Strata O'Reilly's going to call it next. As you all know, theCUBE has been present for the creation of the Hadoop big data ecosystem. We're here for our ninth year, certainly a lot's changed. AI's the center of the conversation, and certainly we've seen some horses come in, some haven't come in, and trends have emerged, some gone away, your thoughts. Nine years covering big data. >> Well, John, I remember fondly, vividly, the call that I got. I was in Dallas at a Storage Networking World show and you called and said, "Hey, we're doing Hadoop World, get over there," and of course, Hadoop, big data, was the new, hot thing. I told everybody, "I'm leaving." Most of the people said, "What's Hadoop?" Right, so we came, we started covering, it was people like Jeff Hammerbacher, Amr Awadallah, Doug Cutting, who invented Hadoop, Mike Olson, you know, head of Cloudera at the time, and people like Abhi Mehta, who at the time was at B of A, and some of the things we learned then that were profound-- >> Yeah. >> As much as Hadoop is sort of on the back burner now and people really aren't talking about it, some of the things that are profound about Hadoop, really, were the idea, the notion of bringing five megabytes of code to a petabyte of data, for example, or the notion of no schema on write. You know, put it into the database and then figure it out. >> Unstructured data. >> Right. >> Object storage. 
>> And so, that created a state of innovation, of funding. We were talking last night about, you know, many, many years ago at this event this time of the year, concurrent with Strata you would have VCs all over the place. There really aren't a lot of VCs here this year, not a lot of VC parties-- >> Mm-hm. >> As there used to be, so that somewhat waned, but some of the things that we talked about back then, we said that big money in big data is going to be made by the practitioners, not by the vendors, and that's proved true. I mean... >> Yeah. >> The big three Hadoop distro vendors, Cloudera, Hortonworks, and MapR, you know, Cloudera's $2.5 billion valuation, you know, not bad, but it's not a $30, $40 billion value company. The other thing we said is there will be no Red Hat of big data. You said, "Well, the only Red Hat of big data might be Red Hat," and so, (chuckles) that's basically proved true. >> Yeah. >> And so, I think if we look back we always talked about Hadoop and big data being a reduction, the ROI was a reduction on investment. >> Yeah. >> It was a way to have a cheaper data warehouse, and that's essentially-- >> Well, what did we get right and wrong? I mean, let's look at some of the trends. I mean, first of all, I think we got pretty much everything right, as you know. We tend to make the calls pretty accurately with theCUBE. Got a lot of data, we look, we have the analytics in our own system, plus we have the research team digging in, so you know, we pretty much get, do a good job. I think one thing that we predicted was that Hadoop certainly would change the game, and that did. We also predicted that there wouldn't be a Red Hat for Hadoop, that was a prediction. The other prediction was that we said Hadoop won't kill data warehouses, it didn't, and then data lakes came along. You know my position on data lakes. >> Yeah. >> I've always hated the term. 
I always liked data ocean because I think it was much more fluidity of the data, so I think we got that one right and data lakes still doesn't look like it's going to be panning out well. I mean, most people that deploy data lakes, it's really either not a core thing or as part of something else and it's turning into a data swamp, so I think the data lake piece is not panning out the way it, people thought it would be. I think one thing we did get right, also, is that data would be the center of the value proposition, and it continues and remains to be, and I think we're seeing that now, and we said data's the development kit back in 2010 when we said data's going to be part of programming. >> Some of the other things, our early data, and we went out and we talked to a lot of practitioners who are the, it was hard to find in the early days. They were just a select few, I mean, other than inside of Google and Yahoo! But what they told us is that things like SQL and the enterprise data warehouse were key components on their big data strategy, so to your point, you know, it wasn't going to kill the EDW, but it was going to surround it. The other thing we called was cloud. Four years ago our data showed clearly that much of this work, the modeling, the big data wrangling, et cetera, was being done in the cloud, and Cloudera, Hortonworks, and MapR, none of them at the time really had a cloud strategy. Today that's all they're talking about is cloud and hybrid cloud. >> Well, it's interesting, I think it was like four years ago, I think, Dave, when we actually were riffing on the notion of, you know, Cloudera's name. It's called Cloudera, you know. If you spell it out, in Cloudera we're in a cloud era, and I think we were very aggressive at that point. I think Amr Awadallah even made a comment on Twitter. He was like, "I don't understand "where you guys are coming from." 
We were actually saying at the time that Cloudera should actually leverage more cloud at that time, and they didn't. They stayed on their IPO track and they had to because they had everything bet on Impala and this data model that they had being the business model, and then they went public, but I think clearly cloud is now part of Cloudera's story, and I think that's a good call, and it's not too late for them. It never was too late, but you know, Cloudera has executed. I mean, if you look at what's happened with Cloudera, they were the only game in town. When we started theCUBE we were in their office, as most people know in this industry, we were there with Cloudera when they had like 17 employees. I thought Cloudera was going to run the table, but then what happened was Hortonworks came out of Yahoo! That, I think, changed the game, and I think that competitive battle between Hortonworks and Cloudera, in my opinion, changed the industry, because if Hortonworks did not come out of Yahoo!, Cloudera would've had an uncontested run. I think the landscape of the ecosystem would look completely different had Hortonworks not competed, because you think about it, Dave, they had that competitive battle for years. The Hortonworks-Cloudera battle, and I think it changed the industry. I think it could've been a different outcome. If Hortonworks wasn't there, I think Cloudera probably would've taken Hadoop and made it so much more, and I think they would've gotten more done. >> Yeah, and I think the other point we have to make here is complexity really hurt the Hadoop ecosystem, and it was just bespoke, new projects coming out all the time, and you had Cloudera, Hortonworks, and maybe to a lesser extent MapR, doing a lot of the heavy lifting, particularly, you know, Hortonworks and Cloudera. 
They had to invest a lot of their R&D in making these systems work and integrating them, and you know, complexity just really broke the back of the Hadoop ecosystem, and so then Spark came in, everybody said, "Oh, Spark's going to basically replace Hadoop." You know, yes and no, the people who got Hadoop right, you know, embraced it and they still use it. Spark definitely simplified things, but now the conversation has turned to AI, John. So, I got to ask you, I'm going to use your line on you in kind of the ask-me-anything segment here. AI, is it same wine, new bottle, or is it really substantively different in your opinion? >> I think it's substantively different. I don't think it's the same wine in a new bottle. I'll tell you... Well, it's kind of, it's like the bad wine... (laughs) Is going to be kind of blended in with the good wine, which is now AI. If you look at this industry, the big data industry, if you look at what O'Reilly did with this conference. I think O'Reilly really has not done a good job with the conference of big data. I think they blew it, I think that they made it a, you know, monetization, closed system when the big data business could've been all about AI in a much deeper way. I think AI is subordinate to cloud, and you mentioned cloud earlier. If you look at all the action within the AI segment, Diane Greene talking about it at Google Next, Amazon, AI is a software layer substrate that will be underpinned by the cloud. 
Cloud will drive more action, you need more compute, that drives more data, more data drives the machine learning, machine learning drives the AI, so I think AI is always going to be dependent upon cloud ends or some sort of high compute resource base, and all the cloud analytics are feeding into these AI models, so I think cloud takes over AI, no doubt, and I think this whole ecosystem of big data gets subsumed under either an AWS, VMworld, Google, and Microsoft Cloud show, and then also I think specialization around data science is going to go off on its own. So, I think you're going to see the breakup of the big data industry as we know it today. Strata Hadoop, Strata Data Conference, that thing's going to crumble into multiple, fractured ecosystems. >> It's already starting to be forked. I think the other thing I want to say about Hadoop is that it actually brought such great awareness to the notion of data, putting data at the core of your company, data and data value, the ability to understand how data at least contributes to the monetization of your company. AI would not be possible without the data. Right, and we've talked about this before. You call it the innovation sandwich. The innovation sandwich, last decade, last three decades, has been Moore's law. The innovation sandwich going forward is data, machine intelligence applied to that data, and cloud for scale, and that's the sandwich of innovation over the next 10 to 20 years. >> Yeah, and I think data is everywhere, so this idea of being a categorical industry segment is a little bit off, I mean, although I know data warehouse is kind of its own category and you're seeing that, but I don't think it's like a Magic Quadrant anymore. Every quadrant has data. >> Mm-hm. >> So, I think data's fundamental, and I think that's why it's going to become a layer within a control plane of either cloud or some other system, I think. I think that's pretty clear, there's no, like, one. 
You can't buy big data, you can't buy AI. I think you can have AI, you know, things like TensorFlow, but it's going to be a completely... Every layer of the stack is going to be impacted by AI and data. >> And I think the big players are going to infuse their applications and their databases with machine intelligence. You're going to see this, you're certainly, you know, seeing it with IBM, the sort of Watson heavy lift. Clearly Google, Amazon, you know, Facebook, Alibaba, and Microsoft, they're infusing AI throughout their entire set of cloud services and applications and infrastructure, and I think that's good news for the practitioners. People aren't... Most companies aren't going to build their own AI, they're going to buy AI, and that's how they close the gap between the sort of data haves and the data have-nots, and again, I want to emphasize that the fundamental difference, to me anyway, is having data at the core. If you look at the top five companies in terms of market value, US companies, Facebook maybe not so much anymore because of the fake news, though Facebook will be back with its two billion users, but Apple, Google, Facebook, Amazon, who am I... And Microsoft, those five have put data at the core and they're the most valuable companies in the stock market from a market cap standpoint, why? Because it's a recognition that that intangible value of the data is actually quite valuable, and even though banks and financial institutions are data companies, their data lives in silos. So, these five have put data at the center, surrounded it with human expertise, as opposed to having humans at the center and having data all over the place. So, how do they, how do these companies close the gap? How do the companies in the flyover states close the gap? 
The way they close the gap, in my view, is they buy technologies that have AI infused in it, and I think the last thing I'll say is I see cloud as the substrate, and AI, and blockchain and other services, as the automation layer on top of it. I think that's going to be the big tailwind for innovation over the next decade. >> Yeah, and obviously the theme of machine learning drives a lot of the conversations here, and that's essentially never going to go away. Machine learning is the core of AI, and I would argue that AI truly doesn't even exist yet. It's machine learning really driving the value, but to put a validation on the fact that cloud is going to be driving AI business is some of the terms in popular conversations we're hearing here in New York around this event and topic, CUBENYC and Strata Conference, is you're hearing Kubernetes and blockchain, and you know, these automation, AI operation kind of conversations. That's an IT conversation, (chuckles) so you know, that's interesting. You've got IT, really, with storage. You've got to store the data, so you can't not talk about workloads and how the data moves with workloads, so you're starting to see data and workloads kind of be tossed in the same conversation, that's a cloud conversation. That is all about multi-cloud. That's why you're seeing Kubernetes, a term I never thought I would be saying at a big data show, but Kubernetes is going to be key for moving workloads around, of which there's data involved. (chuckles) Instrumenting the workloads, data inside the workloads, data driving data. This is where AI and machine learning's going to play, so again, cloud subsumes AI, that's the story, and I think that's going to be the big trend. >> Well, and I think you're right, now. 
I mean, that's why you're hearing the messaging of hybrid cloud from the big distro vendors, and the other thing is you're hearing from a lot of the NoSQL database guys, they're bringing ACID compliance, they're bringing enterprise-grade capability, so you're seeing the world is hybrid. You're seeing those two worlds come together, so... >> The playing field is getting leveled out there. It's all about enterprise, B2B, AI, cloud, and data. That's theCUBE bringing you the data here. New York City, CUBENYC, that's the hashtag. Stay with us for more coverage live in New York after this short break. (techy music)

Published Date : Sep 12 2018

SUMMARY :

Brought to you by SiliconANGLE Media for the creation at the Hadoop big data ecosystem. and some of the things we learned then some of the things that are profound about Hadoop, We were talking last night about, you know, but some of the things that we talked about back then, You said, "Well, the only Red Hat of big data might be being a reduction, the ROI was a reduction I mean, first of all, I think we got and I think we're seeing that now, and the enterprise data warehouse were key components and I think we were very aggressive at that point. Yeah, and I think the other point and all the cloud analytics are and cloud for scale, and that's the sandwich Yeah, and I think data is everywhere, and I think that's why it's going to become I think that's going to be the big tailwind and I think that's going to be the big trend. and the other thing is you're hearing New York City, CUBENYC, that's the hashtag.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Apple | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Diane Greene | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Alibaba | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Jeff Hammerbacher | PERSON | 0.99+
$30 | QUANTITY | 0.99+
New York | LOCATION | 0.99+
2010 | DATE | 0.99+
IBM | ORGANIZATION | 0.99+
Doug Cutting | PERSON | 0.99+
Mike Olson | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Dallas | LOCATION | 0.99+
O'Reilly | ORGANIZATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
Cloudera | ORGANIZATION | 0.99+
five | QUANTITY | 0.99+
AWS | ORGANIZATION | 0.99+
Abi Mehda | PERSON | 0.99+
John Furrier | PERSON | 0.99+
New York City | LOCATION | 0.99+
$2.5 billion | QUANTITY | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
MapR | ORGANIZATION | 0.99+
Amr Awadallah | PERSON | 0.99+
$40 billion | QUANTITY | 0.99+
17 employees | QUANTITY | 0.99+
VMworld | ORGANIZATION | 0.99+
Today | DATE | 0.99+
Impala | ORGANIZATION | 0.99+
Nine years | QUANTITY | 0.99+
four years ago | DATE | 0.98+
last night | DATE | 0.98+
last decade | DATE | 0.98+
Strata Data Conference | EVENT | 0.98+
Strata Conference | EVENT | 0.98+
Hadoop Summit | EVENT | 0.98+
ninth year | QUANTITY | 0.98+
Four years ago | DATE | 0.98+
two worlds | QUANTITY | 0.97+
five companies | QUANTITY | 0.97+
today | DATE | 0.97+
Strata Hadoop | EVENT | 0.97+
Hadoop World | EVENT | 0.96+
CUBE | ORGANIZATION | 0.96+
Google Next | ORGANIZATION | 0.95+
Twitter | ORGANIZATION | 0.95+
this year | DATE | 0.95+
Spark | ORGANIZATION | 0.95+
US | LOCATION | 0.94+
CUBENYC | EVENT | 0.94+
Strata O'Reilly | ORGANIZATION | 0.93+
next decade | DATE | 0.93+

Bala Chandrasekaran, Dell EMC | Dell EMC: Get Ready For AI


 

(techno music) >> Hey welcome back everybody, Jeff Frick here with theCUBE. We're in Austin, Texas at the Dell EMC HPC and AI Innovation Lab. As you can see behind me, there's racks and racks and racks of gear, where they build all types of vessel configurations around specific applications, whether it's Oracle or S.A.P., and more recently a lot more around artificial intelligence, whether it's machine learning, deep learning, so it's a really cool place to be. We're excited to be here. And our next guest is Bala Chandrasekaran. He is in the technical staff as a systems engineer. Bala, welcome! >> Thank you. >> So how do you like playing with all these toys all day long? >> Oh I love it! >> I mean you guys have literally everything in there. A lot more than just Dell EMC gear, but you've got switches and networking gear-- >> Right. >> Everything. >> And not just the gear, it's also all the software components, it's the deep learning libraries, deep learning models, so a whole bunch of things that we can get to play around with. >> Now that's interesting 'cause it's harder to see the software, right? >> Exactly right. >> The software's pumping through all these machines but you guys do all types of really, optimization and configuration, correct? >> Yes, we try to make it easy for the end customer. And the project that I'm working on, machine learning for Hadoop, we try to make things easy for the data scientists. >> Right, so we got all the Hadoop shows, Hadoop World, Hadoop Summit, Strata, Big Data NYC, Silicon Valley, and the knock on Hadoop is always it's too hard, there aren't enough engineers, I can't get enough people to do it myself. It's a cool open source project, but it's not that easy to do. You guys are really helping people solve that problem. >> Yes and what you're saying is true for the infrastructure guys. Now imagine a data scientist, right? So Hadoop cluster accessing it, securing it, is going to be really tough for them. 
And they shouldn't be worried about it. Right? They should be focused on data science. So those are some of the things that we try to do for them. >> So what are some of the tips and tricks as you build these systems that throw people off all the time that are relatively simple things to fix? And then what is some of the hard stuff where you guys have really applied your expertise to get over those challenges? >> Let me give you a small example. So this is a new project, AI, and we hired a data scientist. So I walked the data scientist through the lab. He looked at all the clusters and he pulled me aside and said, "Hey, you're not going to ask me to work on these things, right? I have no idea how to do these things." So that kind of gives you a sense of what a data scientist should focus on and what they shouldn't focus on. So some of the things that we do, and some of the things that are probably difficult for them, is all the libraries that are needed to run their project, the conflicts between libraries, the dependencies between them. So one of the things that we do is deliver this pre-configured engine that you can readily download into our product and run. So data scientists don't have to worry about which library to use. >> Right. >> They have to worry about the models and accuracy and whatever data science needs to be done, rather than focusing on the infrastructure. >> So you not only package the hardware and the systems, but you've packaged the software distribution and all the kind of surrounding components of that as well. >> Exactly right. Right. >> So when you have the data scientists here talking about the Hadoop cluster, if they didn't want to talk about the hardware and the software, what were you helping them with? How did you engage with the customers here at the lab? >> So the example that I gave is for the data scientist that we newly hired for our team so we had to set up environments for them. 
So that was the example, but the same thing applies for a customer as well. So again, to help them in solving the problem, we tried to package some of the things as part of our product and deliver it to them so it's easy for them to deploy and get started on things. >> Now the other piece that's included and again is not in this room is the services -- >> Right. >> And the support, so you guys have a full team of professional services. Once you configure and figure out what the optimum solution is for them then you got a team that can actually go deploy it at their actual site. >> So we have packaged things even for our services. So the services would go to the customer site. They would apply the solution and download and deploy our packages and be able to demonstrate how easy it is. Think of them as tutorials, if you like. So here are the tutorials. Here's how you run various models. So here's how easy it is for you to get started. So that's what they would train the customer on. So there's not just the deployment piece of it but just packaging things for them so they can show customers how to get started quickly, how everything works and kind of give a green check mark if you will. >> So what are some of your favorite applications that people are using these things for? Do you get involved in the applications stack on the customer side? What are some of the fun use cases that people use your technology to solve? >> So for the application, my project is about machine learning on Hadoop, packaging Cloudera's CDSW, that's Cloudera Data Science Workbench, as part of the product. So that allows data scientists to access the Hadoop cluster while abstracting the complexities of the cluster. So they can access the cluster. They can access the data. They can have security without worrying about all the intricacies of the cluster. In addition to that they can create different projects, have different libraries in different projects. 
So they don't have to conflict with each other and also they can add users to it. They can work collaboratively. So basically, it's used to help data scientists and software developers do their job and not worry about the infrastructure. >> Right. >> They should not be. >> Right great. Well Bala it's a pretty exciting place to work. I'm sure you're having a ball. >> Yes I am thank you. >> All right. Well thanks for taking a few minutes with us and really enjoyed the conversation. >> I appreciate it thank you. >> All right he's Bala. I'm Jeff. You're watching theCUBE from Austin, Texas at the Dell EMC High Performance Computing and Artificial Intelligence Labs. Thanks for watching. (techno music)

Published Date : Aug 7 2018

SUMMARY :

He is in the technical staff as a systems engineer. I mean you guys have literally everything in there. And not just the gear, And the project that I'm working on, but it's not that easy to do. So those are some of the things that we try to do for them. So some of the things that we do, They have to worry about the models and accuracy and all the kind of surrounding components of that as well. Right. So the example that I gave is for the data scientist And the support so you guys So the services would go to the customer side. So for the application my project is about Well Bala it's pretty exciting place to work. All right. at the Dell EMC High Performance Computing

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff Frick | PERSON | 0.99+
Bala Chandrasekaran | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Jeff | PERSON | 0.99+
Bala | PERSON | 0.99+
Austin, Texas | LOCATION | 0.99+
AI Innovation Lab | ORGANIZATION | 0.99+
one | QUANTITY | 0.98+
Dell EMC High Performance Computing | ORGANIZATION | 0.98+
Dell EMC | ORGANIZATION | 0.98+
Cloudera | ORGANIZATION | 0.97+
Dell EMC HPC | ORGANIZATION | 0.96+
Hadoop | TITLE | 0.95+
S.A.P. | ORGANIZATION | 0.94+
Artificial Intelligence Labs | ORGANIZATION | 0.87+
NYC | LOCATION | 0.85+
theCUBE | ORGANIZATION | 0.83+
Silicone Valley | LOCATION | 0.79+
Hadoop Summit | EVENT | 0.78+
Big Data | EVENT | 0.72+
Strata | EVENT | 0.58+
Hadoop World | EVENT | 0.44+
Hadoop | ORGANIZATION | 0.41+

Michael Bennett, Dell EMC | Dell EMC: Get Ready For AI


 

(energetic electronic music) >> Hey, welcome back everybody. Jeff Frick here with The Cube. We're in a very special place. We're in Austin, Texas at the Dell EMC HPC and AI Innovation Lab. High performance computing, artificial intelligence. This is really where it all happens. Where the engineers at Dell EMC are putting together these ready-made solutions for the customers. They got every type of application stack in here, and we're really excited to have our next guest. He's right in the middle of it, he's Michael Bennett, Senior Principal Engineer for Dell EMC. Mike, great to see you. >> Great to see you too. >> So you're working on one particular flavor of the AI solutions, and that's really machine learning with Hadoop. So tell us a little bit about that. >> Sure yeah, the product that I work on is called the Ready Solution for AI Machine Learning with Hadoop, and that product is a Cloudera Hadoop distribution on top of our Dell PowerEdge servers. And we've partnered with Intel, who has released a deep learning library, called BigDL, to bring both the traditional machine learning capabilities as well as deep learning capabilities to the product. The product also adds a data science workbench that's released by Cloudera. And this tool allows the customer's data scientists to collaborate together, provides them secure access to the Hadoop cluster, and we think all-around makes a great product to allow customers to gain the power of machine learning and deep learning in their environment, while also kind of reducing some of those overhead complexities that IT often faces with managing multiple environments, providing secure access, things like that. >> Right, 'cause the big knock always on Hadoop is that it's just hard. It's hard to put in, there aren't enough people, there aren't enough experts. So you guys are really offering a pre-bundled solution that's ready to go? >> Correct, yeah. 
We've got seven or eight different environments going in the lab at any time to validate different hardware permutations that we may offer of the product. As well, we've been doing this since 2009, so there's a lot of institutional knowledge here at Dell to draw on when building and validating these Hadoop products. Our Dell services team has also been going out installing and setting these up, and our consulting services has been helping customers fit the Hadoop infrastructure into their IT model. >> Right, so is there one basic configuration that you guys have? Or have you found there's two or three different standard-use cases that call for two or three different kinds of standardized solutions? >> We find that most customers are preferring the R740xd series. This platform can hold 12 3.5-inch form-factor drives in the front, along with four in the mid-plane, while still providing four SSDs in the back. So customers get a lot of versatility with this. It's also won several Hadoop benchmarking awards. >> And do you find, when you're talking to customers or you're putting this together, that they've tried themselves and they've tried to kind of stitch together and cobble together the open-source proprietary stuff all the way down to network cards and all this other stuff to actually make the solution come together? And it's just really hard, right? >> Yeah, right exactly. What we hear over and over from our product management team is that in their interactions with customers, customers come back saying it's just too hard. They get something that's stable and they come back and they don't know why it's no longer working. They have customized environments that each developer wants for their big data analytics jobs. Things like that. So yeah, overall we're hearing that customers are finding it very complex. >> Right, so we hear time and time again that same thing. And even though we've been going to Hadoop Summit and Hadoop World and Strata, since 2010. 
The momentum seems to be a little slower in terms of the hype, but now we're really moving into heavy-duty real time production and that's what you guys are enabling with this ready-made solution. >> So with this product, yeah, we focused on enabling Apache Spark on the Hadoop environment. And that Apache Spark distributed computing has really changed the game as far as what it allows customers to do with their analytics jobs. No longer are we writing things to disk, but multiple transformations are being performed in memory, and that's also a big part of what enables the BigDL library that Intel released for the platform to train these deep-learning models. >> Right, 'cause Spark enables the real-time analytics, right? Now you've got streaming data coming into this thing, versus the batch which was kind of the classic play of Hadoop. >> Right and not only do you have streaming data coming in, but Spark also enables you to load your data in memory and perform multiple operations on it. And draw insights that maybe you couldn't before with traditional MapReduce jobs. >> Right, right. So what gets you excited to come to work every day? You've been playing with these big machines. You're in the middle of nerd nirvana I think-- >> Yeah exactly. >> With all of the servers and spinning disks. What gets you up in the morning? What are you excited about, as you see AI get more pervasive within the customers and the solutions that you guys are enabling? >> You know, for me, what's always exciting is trying new things. We've got this huge lab environment with all kinds of lab equipment. So if you want to test a new iteration, let's say tiered HDFS storage with SSDs and traditional hard drives, throw it together in a couple of hours and see what the results are. If we wanted to add new PCIe devices like FPGAs for the inference portion of the deep-learning development we can put those in our servers and try them out. 
So I enjoy that, on top of the validated, thoroughly-worked-through solutions that we offer customers, we can also experiment, play around, and work towards that next generation of technology. >> Right, 'cause any combination of hardware that you basically have at your disposal to try together and test and see what happens? >> Right, exactly. And this is my first time actually working at an OEM, and so I was surprised, not only do we have access to anything that you can see out in the market, but we often receive test and development equipment from partners and vendors, that we can work with and collaborate with to ensure that once the product reaches market it has the features that customers need. >> Right, what's the one thing that trips people up the most? Just some simple little switch configuration that you think is like a minor piece of something, that always seems to get in the way? >> Right, or switches in general. I think that people focus on the application because the switch is so abstracted from what the developer or even somebody troubleshooting the system sees, that oftentimes it's some misconfiguration or some typo that was entered during the switch configuration process that throws customers off or has somebody scratching their head, wondering why they're not getting the kind of performance that they thought. >> Right, well that's why we need more automation, right? That's what you guys are working on. >> Right yeah exactly. >> Keep the fat-finger typos out of the config settings. >> Right, consistent, reproducible. None of that, "I did it yesterday and it worked, I don't know what changed." >> Right, alright Mike. Well thanks for taking a few minutes out of your day, and don't have too much fun playing with all this gear. >> Awesome, thanks for having me. >> Alright, he's Mike Bennett and I'm Jeff Frick. You're watching The Cube, from Austin Texas at the Dell EMC High Performance Computing and AI Labs. Thanks for watching. (energetic electronic music)

Published Date : Aug 7 2018

SUMMARY :

at the Dell EMC HPC and AI Innovation Lab. of the AI solutions, and that's really that IT often faces with managing multiple environments, Right, cause the big knock always on Hadoop going in the lab at any time to validate in the front, along with four in the mid-plane, is that their interactions with customers, and that's what you guys are enabling has really changed the game as far as what it allows Right, cause the Sparks enables And draw insights that maybe you couldn't before You're in the middle of nerd nirvana I think-- that you guys are enabling? for the inference portion the deep-learning development that you can see out in the market, the kind of performance that they thought. That's what you guys are working on. Right, consistent reproducible. and don't have too much fun playing with all this gear. at the Dell EMC High Performance Computing and AI Labs.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jeff Frick | PERSON | 0.99+
Michael Bennett | PERSON | 0.99+
two | QUANTITY | 0.99+
Mike Bennett | PERSON | 0.99+
Dell | ORGANIZATION | 0.99+
seven | QUANTITY | 0.99+
Mike | PERSON | 0.99+
Dell EMC | ORGANIZATION | 0.99+
The Cube | TITLE | 0.99+
yesterday | DATE | 0.99+
2010 | DATE | 0.99+
Austin, Texas | LOCATION | 0.98+
both | QUANTITY | 0.98+
Austin Texas | LOCATION | 0.98+
Spark | TITLE | 0.98+
2009 | DATE | 0.98+
R7-40XC | COMMERCIAL_ITEM | 0.98+
Intel | ORGANIZATION | 0.98+
each developer | QUANTITY | 0.98+
AI Innovation Lab | ORGANIZATION | 0.97+
Hadoop | TITLE | 0.97+
first time | QUANTITY | 0.96+
Dell EMC High Performance Computing | ORGANIZATION | 0.96+
four | QUANTITY | 0.95+
one | QUANTITY | 0.94+
Apache | ORGANIZATION | 0.94+
one thing | QUANTITY | 0.93+
The Cube | ORGANIZATION | 0.92+
12 3 1/2" | QUANTITY | 0.92+
Dell EMC HPC | ORGANIZATION | 0.9+
three different standard-use cases | QUANTITY | 0.9+
eight different environments | QUANTITY | 0.89+
three different | QUANTITY | 0.88+
Stratus | ORGANIZATION | 0.83+
Hadoop World | ORGANIZATION | 0.79+
one basic configuration | QUANTITY | 0.76+
AI Labs | ORGANIZATION | 0.74+
four SSDs | QUANTITY | 0.73+
Cloudera | TITLE | 0.71+
Hadoop Summit | EVENT | 0.69+
hours | QUANTITY | 0.67+
Hadoop benchmarking awards | TITLE | 0.67+
Sparks | COMMERCIAL_ITEM | 0.48+
Hadoop | COMMERCIAL_ITEM | 0.34+

James Markarian, SnapLogic | SnapLogic Innovation Day 2018


 

>> Announcer: From San Mateo, California, it's theCUBE! Covering SnapLogic, Innovation Day, 2018. Brought to you by SnapLogic. >> Hey welcome back everybody, Jeff Frick here with theCUBE. We are in San Mateo, at what they call the crossroads, it's 92 and 101. If you're coming by and probably sitting in traffic, look up and you'll see SnapLogic. It's their new offices. We're really excited to be here for Innovation Day. We're excited to have the CTO, James Markarian. James, great to see you. I guess we last talked a couple years ago in New York City. >> Yeah that's right, and why was I there? It was like a big data show. >> That's right. >> And here we are two years later talking about big data. >> Big data, big data is fading a little bit, because now big data is really an engine, that's powering this new thing that's so exciting, which is all about analytics, and machine learning, and we're going to eventually stop saying artificial intelligence and say augmented intelligence, 'cause there's really nothing artificial about it. >> Yeah and we might stop saying big data and just talk about data because it's becoming so ubiquitous. >> Jeff: Right. >> I know that big data, it's not necessarily going away, but how we're thinking about handling it has kind of evolved over time, especially in the last couple of years. >> Right. >> That's what we're kind of seeing from our customers. >> 'Cause it's kind of an ingredient now, right? It's no longer this new shiny object now. It's just part of the infrastructure that helps you get everything else done. >> Yeah, and I think when you think about it, from like, an enterprise point of view, that shift is going from experimentation to operationalizing. 
I think that in experimentation, there's one set of things you're looking for: proving out the overall value, regardless maybe of cost and uptime and other things. As you operationalize, you start thinking about other considerations that enterprise IT obviously has to think about. >> Right, so if you think back to Hadoop Summit and Hadoop World, when we were first cutting our teeth, in 2010 or around that time frame, one of the big discussions that always came up, and that was before the rise of public cloud, which has really taken off over the last several years, was this ongoing debate: do you move the data to the compute, or do you move the compute to the data? There was always this monster data gravity issue, which was almost insurmountable, and many would say, oh, you're never going to get all your data into the cloud. It's just way too hard and way too expensive. But now Amazon has Snowball, and when Snowball isn't big enough, they actually have a diesel truck that'll come and help you move your data. Amazon rolled that thing across the stage a couple of years ago. The data gravity thing seems to matter less, and if you think of a world with infinite compute, infinite storage, and infinite networking, all asymptotically approaching zero cost, that's not necessarily good news for some vendors out there, but it's a world that we're eventually getting to, and it changes the way that you organize all this stuff. >> Yeah, I think so, and so much has changed. I was fortunate to be one of the early speakers, at the Hadoop Worlds and everything, and I was adamantly proclaiming the destiny of Hadoop as bright and shiny, and there's this question about what really happened. I think that there are a few different variables that shifted at the same time. One, of course, is that this glut of computing in the cloud happened, and there are so many variables moving at once.
It's like, how much time do you have, Jeff? >> Ask them to get a couple more drinks for us. >> You're seeing our lovely new headquarters here, and one of the things is that there is no big data center. We have a little closet with some of the servers we keep around, but mostly, everything we do is on Amazon. You're even seeing things like commercial real estate changing, because I don't need all the cooling and the power and the space for my data center that I once had. >> Jeff: Right, right. >> I've become a lot more space efficient than I used to be, and so the cloud is really changing everything. On the data side, you mentioned this interesting philosophical shift, going from, I couldn't possibly do it in the cloud, to, why in the world would we not do things in the cloud? Maybe the one stalwart in there is some fears about security. Obviously there have been a lot of breaches. I think that there's still a lot of introspection everyone needs to do about, are my on-premises systems actually more secure than some of these cloud providers? It's really not clear that we know the answer to that. In fact, we suspect that some of the cloud providers are actually more secure, because they are professionals about it and they have the best practices. >> And a whole lot of money. >> The other thing that happened that you didn't mention, that's approaching infinity though we're not quite there yet, is interconnect speeds. It used to be the case that I have a bunch of mainframes and I have a Teradata system and I have a high-speed interconnect that puts the two together. Now, with fiber networks and just in general, you can run a super high-speed WAN, especially if you don't care quite as much about latency. So if 500-millisecond latency is still okay with you, >> Great. >> you can do a heck of a lot and move a lot to the cloud.
In fact, it's so good that we went from worrying, could I do this in the cloud at all, to, well, why wouldn't I do some things in Amazon and some things in Microsoft and some things in Google? Even if it meant replicating my data across all these environments. The backdrop for some of that is, we had a lot of customers, and I was thinking that people would approach it this way: they would install on-premises Hadoop, whether it's Apache or Cloudera or the other vendors, and say, I'm going to hire a bunch of folks to be the administrators, retire Teradata, and put all my ETL jobs on there, etc. It turned out to be a great theory, and the practice is real for some folks, but it turned out to mean moving a lot of things onto shifting sands, because Hadoop was evolving at the time. A lot of customers were putting a lot of pressure on it, operational pressure. Again, moving from the experimentation phase over to the operational phase. >> Jeff: Right, right. >> When you don't have the uptime guarantee, and I can't just hire somebody off the street to administer this, it has to be a very sharp, knowledgeable person who's very expensive, people start saying, what am I really getting from this, and can I just dump it all in S3 and apply a bunch of technology there and let Amazon worry about keeping this thing up and running? People start to say, I used to reject that idea and now it's sounding like a very smart idea. >> It's so funny, we talk about people, process, and tech all the time, right? But they call them tech shows, they don't call them people and process shows. >> Right. >> At least not the ones we go to, but time and time again, I remember talking to some people about the Hadoop situation, and there are just no Hadoop people. There's technology all day long; there just aren't enough people with the skills to actually implement it. It's probably changed now, but I remember that was such a big problem. It's funny you talk about security and cloud security.
You know, at AWS, on Tuesday night of re:Invent, they have a special, kind of technical keynote, and James Hamilton would speak. I just remember one talk he gave just on their cabling across the ocean, and the amount of resources that he can bring to bear, relative to any individual company, is so different, never mind a mid-tier company or a small company. I mean, they can bring so many more resources, so much more expertise and knowledge. >> Yeah, the economies of scale, they're just there. >> They're just crazy. >> That's right, and that's why you sort of assume that the cloud eventually eats everything. >> Right, right. >> So there's no reason to believe this won't be one of those cases. >> So you guys are getting Extreme. So what is SnapLogic Extreme? >> Well, SnapLogic Extreme is kind of a response to this trend of data moving from on premises to the cloud, and there are some interesting dynamics to that movement. First of all, you need to get data into the cloud, and we've been doing that for years. Connect to everything, dump it in S3, ADLS, etc. No problem. The thing we're seeing with cloud computing is, there's another interesting shift. Not only is it kind of like mess for less, and let Amazon manage all this, and I probably refer to Amazon more than other vendors would appreciate. >> Right, right. They're the leaders, so let's call a spade a spade. >> Yeah. >> Certainly Google and Microsoft are out there as well, so those are the top three, and we've acknowledged that. >> One of the interesting things that you couldn't really adequately achieve on premises is the burstiness of your compute. I run at a steady state where I need, you know, 10 servers or 100 servers, but every once in a while, I need 1,000 or 10,000 servers to apply to something. So what's the on-premises model?
Rack and stack 10,000 machines, and it's like waiting for the Great Pumpkin, waiting for that workload to come that I've been waiting months and months for, and maybe it never comes, but I've been paying for it. I paid for a software license for the thing that I need to run there. I'm paying for the cabling and the racking and everything, and for the person administering it, making sure the disks are all operating in case it gets used. Now, all of a sudden, you take Amazon, and they're saying, hey, pay us for what you're using. You can use reserved pricing and pay a lower rate for the things you actually care about on a consistent basis, but then I'm going to allow you to spike, and I'll just run the meter. So this has caused software vendors like us to look at the way we charge and the way that we deploy our resources and say, hey, that's a very good model. We want to follow that, and so we introduced SnapLogic Extreme, which has a few different components. Basically, it enables us to operate in these elastic environments and shift our thinking on pricing so that we don't think about node-based or, god forbid, core-based pricing, and instead say, hey, basically pay us for what you do with your data and don't worry about how many servers it's running on. Let SnapLogic worry about spinning up and spinning down these machines, because a lot of these workloads are data integration or application workloads that we know lots about. >> Right. >> So first of all, we manage these ephemeral, what we call ephemeral or elastic, clusters. Second of all, the way that we distribute our workload is, currently, by generating Spark code. We use the same graphical environment that you use for everything, but instead of running on our engines, we spit out Spark code at the end that takes advantage of the massive scale-out potential of these ephemeral environments. >> Right.
>> We've also built this in such a way that it's Spark today, but it could be native or some other engine like Flink or other things that come up. We really don't care what the back-end engine actually is, as long as it can run certain types of data-oriented jobs. It's actually lots of things in one. We combine our data acquisition and distribution capability with this massive elastic scale-out capability. >> Yeah, it's unbelievable how you can spin that up, and then of course, most people forget you need to spin it down after the event. >> James: Yeah, that's right. >> We talked to a great vendor who said, you know, my customer spends no money with me on the weekend, zero. >> James: Right. >> And I'm thrilled, because they're not using me. When they do use me, then they're buying stuff. I think what's really interesting is how that also changes your relationship with your customer. If you have a recurring revenue model, you have to continue to deliver value. You have to stay close to your customer. You have to stay engaged, because it's not a one-time pop and then you send them the 15% or 20% maintenance bill. It's really this ongoing relationship, and they're actually gaining value from your products each and every time they use them. It's a very different way. >> Yeah, that's right. I think it creates better relationships because you feel like what we do is proportionate to what they do and vice versa, so it has this fundamental fairness about it, if you will. >> Right, it's a good relationship, but I want to go down another path. Before we turned the cameras on, we talked a little bit about the race, always, between the need for compute and the compute. It used to be personified best by Microsoft and Intel: Intel would come out with a new chip, and then the Microsoft OS would eat up all the extra capacity, and then they'd come up with a new chip, and it was an ongoing thing.
You made an interesting comment that, especially in the cloud world where the scale of these things is much, much bigger, we're in a world now where the compute and the storage have kind of outpaced the applications, if you will, and there's an opportunity for the applications to catch up. Oh, by the way, we have this cool new thing called machine learning and augmented intelligence. I wonder, is that what's going to fill or kind of rebalance the consumption pattern? >> Yeah, it seems that way, and I always think about compute and software spiraling around each other like a helix. At one point, one is leading the other, and then one eventually surpasses the other, and then you need innovation on the other side. I think for a while, if you turn the clock way back to when the Pentium was introduced, everyone was like, how are we ever going to use all of the compute power? >> Windows 95, whoo! >> You know, the power of the Pentium. Do I really need to run my spreadsheets 100% faster? There's no business value whatsoever in transacting faster, or in general user interfaces or graphical user interfaces or rendering web pages. Then you start seeing this new glut, often led by researchers first: software applications coming up that use all of this power, because in academia you can start saying, what if I did have infinite compute? What would I do differently? You see things like VR and advanced gaming come up on the consumer side. Then I think the real answer on the business side is AI and ML. The general trend I start thinking of is something I used to talk about back in the old days, which is some version of having machines work for us instead of us working for machines.
The only way we're ever going to get there is by having higher and higher intelligence on the application side, so that it intuits more based on what it's seen before and what it knows about you, etc., in terms of the task that needs to get done. Then there's this whole new breed of person that you need in order to wield all that power, because, like Hadoop, it's not just natural. You don't just have people floating around saying, hey, you know, I'm going to be an Oozie expert or a YARN expert. You don't run into people every day who are like, oh, yeah, I know neural nets well. I'm a gradient descent expert, or whatever your model is. It's really going to drive lots of changes, I think. >> Right, well, hopefully it does, and especially, like we were talking about earlier, within core curriculums at schools and stuff. We were at Grace Hopper, and Brenda Wilkerson, the new head of the Anita Borg organization, was at the Chicago public school district, and they're actually starting to make CS a requirement, along with biology and physics and chemistry and some of these other things. >> Right. >> So we do have a huge dearth of that, but I want to just close out on one last concept before I let you go, and you guys are way on top of this. Greg talked about what you just talked about, which is making the computers work for us versus the other way around. That's the democratization of the power. We heard a lot about the democratization of big data and the tools, and now you guys are talking about the democratization of the integration, especially when you have a bunch of cloud-based applications that everybody has access to and maybe needs to stitch together a different way. But when you look at this whole concept of democratization of that power, how do you see that playing out over the next several years? >> Yeah, that's a very big- >> Sorry I didn't bring you a couple of beers before I brought that up.
>> Oh no, I got you covered. So it's a very big, interesting question, because, first of all, it's one of these things where, god knows, we can't predict with a lot of accuracy how exactly that's going to look, because we're sort of juxtaposing two things. One is, part of the initial move to the cloud was the failure to properly democratize data inside the enterprise; for whatever reason, we didn't do it. Now we have the compute resources and the central, web-based access to everything. Great. Now we also have Cambridge Analytica and Facebook, and people really thinking about data privacy and the fact that we want ubiquitous, safe access. I think we know how to make things ubiquitous. The question is, do we know how to make it safe and fair, so that the right people are using the right data in the right way? It's a little bit like, you know, there are all these cautionary tales out there, like, beware of AI and robotics and everything, and nobody really thinks about the danger of the data that's there. It's a much more immediate problem, and yet it's sort of like the silent killer until some scandal comes up. We start thinking about these different ways we can tackle it. Obviously there are great solutions for tokenization and encryption and everything at the data level, but even if you have the access to it, the question is, how do you control that wildfire that could happen as soon as the horse leaves the barn? Maybe not in its current form, but when you look at things like blockchain, there have been a lot of predictions about how blockchain can be used around data. With this privacy and this curation and tracking of who has the data, who has access to it, and whether we can control it, I think you are looking at even more centralized and guarded access to this private data. >> Great, interesting times. >> Yeah, yeah, Jeff, for sure. >> Alright, James, well, thanks for taking a couple of minutes with us. I really enjoyed the conversation.
>> Yeah, it's always great. Thanks for having me, Jeff. >> He's James, I'm Jeff, and you're watching theCUBE. We're at the SnapLogic headquarters in San Mateo, California, and thanks for watching. (electronic music)

Published Date : May 21 2018

SUMMARY :

James Markarian, CTO of SnapLogic, talks with Jeff Frick at the company's new San Mateo headquarters about how big data has moved from experimentation to operations, why on-premises Hadoop gave way for many customers to cloud storage and elastic compute, and how SnapLogic Extreme generates Spark code to run on ephemeral clusters with usage-based pricing. The conversation closes with AI and machine learning as the next consumers of compute, and with the open question of how to make democratized data access safe and fair.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James | PERSON | 0.99+
Jeff | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Jeff Frick | PERSON | 0.99+
James Markarian | PERSON | 0.99+
James Hamilton | PERSON | 0.99+
Greg | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
100 servers | QUANTITY | 0.99+
15% | QUANTITY | 0.99+
20% | QUANTITY | 0.99+
San Mateo | LOCATION | 0.99+
2010 | DATE | 0.99+
AWS | ORGANIZATION | 0.99+
10 servers | QUANTITY | 0.99+
New York City | LOCATION | 0.99+
1,000 | QUANTITY | 0.99+
10,000 machines | QUANTITY | 0.99+
Brenda Wilkerson | PERSON | 0.99+
Facebook | ORGANIZATION | 0.99+
Spark | TITLE | 0.99+
10,000 servers | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
Intel | ORGANIZATION | 0.99+
SnapLogic | ORGANIZATION | 0.99+
Tuesday night | DATE | 0.99+
San Mateo, California | LOCATION | 0.99+
Windows 95 | TITLE | 0.99+
One | QUANTITY | 0.99+
500 millisecond | QUANTITY | 0.99+
two years later | DATE | 0.98+
two things | QUANTITY | 0.98+
Snaplogic | ORGANIZATION | 0.98+
one time | QUANTITY | 0.97+
two | QUANTITY | 0.97+
one | QUANTITY | 0.97+
Innovation Day | EVENT | 0.97+
Second | QUANTITY | 0.96+
Cambridge Analytica | ORGANIZATION | 0.96+
Chicago | LOCATION | 0.96+
S3 | TITLE | 0.95+
Flank | ORGANIZATION | 0.95+
First | QUANTITY | 0.94+
theCUBE | ORGANIZATION | 0.94+
today | DATE | 0.93+
Grace Hopper | PERSON | 0.93+
first | QUANTITY | 0.93+
SnapLogic Innovation Day 2018 | EVENT | 0.92+
one point | QUANTITY | 0.92+
Pentium | COMMERCIAL_ITEM | 0.92+
last couple of years | DATE | 0.9+
one last concept | QUANTITY | 0.9+
one talk | QUANTITY | 0.88+
one set | QUANTITY | 0.88+
zero | QUANTITY | 0.87+
Snaplogic Extreme | ORGANIZATION | 0.85+
Anita Borg | ORGANIZATION | 0.84+
couple years ago | DATE | 0.82+
couple of years ago | DATE | 0.81+

James Markarian, SnapLogic | SnapLogic Innovation Day 2018


 

>> Announcer: From San Mateo, California, it's theCUBE! Covering SnapLogic, Innovation Day, 2018. Brought to you by SnapLogic. >> Hey welcome back everybody, Jeff Frick here with theCUBE. We are in San Mateo, at what they call the crossroads, it's 92 and 101. If you're coming by and probably sitting in a traffic, look up and you'll see SnapLogic. It's their new offices. We're really excited to be here for Innovation Day. We're excited to have this CTO, James Markarian. James, great to see you and I guess, we we last talked was a couple years ago in New York City. >> Yeah that's right, and why was I there? It was like a big data show. >> That's right. >> And we we are two years later talking about big data. >> Big data, big data is fading a little bit, because now big data is really an engine, that's powering this new thing that's so exciting, which is all about analytics, and machine learning, and we're going to eventually stop saying artificial intelligence and say augmented intelligence, 'cause there's really nothing artificial about it. >> Yeah and we might stop saying big data and just talk about data because it's becoming so ubiquitous. >> Jeff: Right. >> I know that big data, it's not necessarily going away but it's sort of how we're thinking about handling it is, like kind of evolved over time, especially in the last couple of years. >> Right. >> That's what we're kind of seeing from our customers. >> 'Cause there's kind of an ingredient now, right? It's no longer this new shiny object now. It's just part of the infrastructure that helps you get everything else done. >> Yeah, and I think when you think about it, from like, an enterprise point of view, that that shift is going from experimentation to operationalizing. 
I think that the things you look for in experimentation, there's like, one set of things here looking for proving out the overall value, regardless maybe of cost and uptime and other things and as you operationalize you start thinking about other considerations that obviously Enterprise IT has to think about. >> Right, so if you think back to like, Hadoop Summit and Hadoop World who were first cracking their teeth, like in 2010 or around that time frame, one of the big discussions that always comes up and that was before kind of the rise of public cloud, you know which has really taken off over the last several years, there's this kind of ongoing debate between, do you move the data to the compute or do you move the compute to the data? There was always like, this monster data gravity issue which was almost insurmountable and many would say, oh, you're never going to get all your data into the cloud. It's just way too hard and way too expensive. But, now Amazon has Snowball and Snowball isn't big enough. They actually had a diesel truck that'll come and help you come move your data. Amazon rolled that thing across the stage a couple of years ago. The data gravity thing seems to be less and if you think of a world with infinite compute, infinite stored, infinite networking asyndetically approaching zero, not necessarily good news for some vendors out there but that's a world that we're eventually getting to that changes the way that you organize all this stuff. >> Yeah, I think so and so much has changed. I was fortunate to be one of the early speakers, like I used to do Worlds and everything, and I was adamantly proclaiming you know, the destiny of Hadoop as bright and shiny and there's this question about what really happened. I think that there's a kind of a few different variables that kind of shifted at the same time. One, is of course, this like glut of computing in the cloud happened and there are so many variables moving at once. 
It's like, How much time do you have Jeff? >> Ask them to get a couple more drinks for us. >> Seeing our lovely new headquarters here and one of the things is that there is no big data center. We have a little closet with some of the servers we keep around but mostly, everything we do is on Amazon. You're even looking at things like, commercial real estate is changing because I don't need all the cooling and the power and the space for my data center that I once had. >> Jeff: Right, right. >> I become a lot more space efficient than I used to be and so the cloud is really kind of changing everything. On the data side, you mention this like, interesting philosophical shift, going from I couldn't possibly do it in the cloud to why in the world would we not do things in the cloud. Maybe the one stall word in there being some fears about security. Obviously there's been a lot of breaches. I think that there's still a lot of introspection everyone needs to do about, are my on premise systems actually more secure than some of these cloud providers? It's really not clear that we know the answer to that. In fact, we suspect that some of the cloud providers are actually more secure because they are professionals about it and they have the best practice. >> And a whole lot of money. >> The other thing that happened that you didn't mention, that's approaching infinity and we're not quite there yet, is interconnect speeds. So it used to be the case that I have a bunch of mainframes and I have a tier rating system and I have a high speed interconnect that puts the two together. Now with fiber networks and just in general, you can run super high speed, like WAN. Especially if you don't care quite as much about latency. So if 500 millisecond latency is still okay with you. >> Great. >> You can do a heck of a lot and move a lot to the cloud. 
In fact, it's so good, that we went from worrying, could I do this in the cloud at all to well, why wouldn't I do somethings in Amazon and some things in Microsoft and some things in Google? Even if it meant replicating my data across all these environments. The backdrop for some of that is, we had a lot of customers and I was thinking that people would approach it this way, they would install on premise Hadoop, whether it's like Apache or Cloud Air or the other vendors and I would hire a bunch of folks that are the administrators and retire terra data and I'm going to put all my ETL jobs on there, etc. It turned out to be a great theory and the practice is real for some folks but it turned out to be moving a lot of things to kind of shifting sands because Hadoop was evolving at the time. A lot of customers were putting a lot of pressure on it, operational pressure. Again, moving from experimentation phase over to like, operational phase. >> Jeff: Right, right. >> When you don't have the uptime guarantee and I can't just hire somebody off the street to administer this, it has to be a very sharp, knowledgeable person that's very expensive, people start saying, what am I really getting from this and can I just dump it all in S3 and apply a bunch of technology there and let Amazon worry about keeping this thing up and running? People start to say, I used to reject that idea and now it's sounding like a very smart idea. >> It's so funny we talk about people processing tech all the time, right? But they call them tech shows, they don't call them people in process shows. >> Right. >> At least not the ones we go to but time and time again I remember talking to some people about the Hadoop situation and there's just like, no Hadoop people. Sometimes technology all day long. There just aren't enough people with the skills to actually implement it. It's probably changed now but I remember that was such a big problem. It's funny you talk about security and cloud security. 
You know, at AWS, on Tuesday night of Reinvent, they have a special, kind of a technical keynote speak and like, James Hamilton would go. In the amount of resources, and I just remember one talk he gave just on their cabling across the ocean, and the amount of resources that he can bring to bear, relative to any individual company, is so different; much less a mid-tier company or a small company. I mean, you can bring so much more resources, expertise and knowledge. >> Yeah, the economy is a scale, their just there. >> They're just crazy. >> That's right and that why you know, you sort of assume that the cloud sort of, eventually eats everything. >> Right, right. >> So there's no reason to believe this won't be one of those cases. >> So you guys are getting Extreme. So what is Snaplogic Extreme? >> Well, Snaplogic Extreme is kind of like a response to this trend of data moving from on premise to the cloud and there are some interesting dynamics of that movement. First of all, you need to get data into the cloud, first of all and we've been doing that for years. Connect to everything, dump it in S3, ADLS, etc. No problem. The thing we're seeing with cloud computing is like, there's another interesting shift. Not only is it kind of like mess for less, and let Amazon manage all this, and I probably refer to Amazon more than other vendors would appreciate. >> Right, right. They're the leaders so let's call a spade a spade. >> Yeah. >> Certainly Google and Microsoft are out there as well so those are the top three and we've acknowledged that. >> One of the interesting things about it is that you couldn't really adequately achieve on premises is the burstiness of your compute. I run at a steady state where I need, you know, 10 servers or a 100 servers, but every once in a while, I need like, 1,000 or 10,000 servers to apply to something. So what's the on premise model? 
Rack and stack, 10,000 machines, and it's like waiting for the great pumpkin, waiting for that workload to come that I've been waiting months and months for and maybe it never comes but I've been paying for it. I paid for a software license for the thing that I need to run there. I'm paying for the cabling and the racking and everything and the person administering. Make sure the disks are all operating in the case where it gets used. Now, all of a sudden, we are taking Amazon and they're saying, hey, pay us for what you're using. You can use reserved pricing and pay a lower rate for the things you might actually care about on a consistent basis but then I'm going to allow you to spike, and I'll just run the meter. So this has caused software vendors like us, to look at the way we charge and the way that we deploy our resources and say, hey, that's a very good model. We want to follow that and so we introduced Snaplogic Extreme, which has a few different components. Basically, it enables us to operate in these elastic environments, shift our thinking in pricing so that we don't think about like, node based or god forbid, core based pricing and say like, hey, basically pay us for what you do with your data and don't worry about how many servers it's running on. Let Snaplogic worry about spinning up and spinning down these machines because a lot of these workloads are data integration or application workloads that we know lots about. >> Right. >> So first of all, we manage these ephemeral, what we call ephemeral or elastic clusters. Second of all, the way that we distribute our workload is by generating Spark code currently. We use the same graphic environment that you use for everything but instead of running on our engines, we kind of spit out Spark code on the end that takes advantage of the massive scale out potential for these ephemeral environments. >> Right. 
>> We've also kind of built this in such a way that it's Spark today but it could be like, Native or some other engine like Flink or other things that come up. We really don't care like what the back end engine actually is as long as it can run certain types of data oriented jobs. It's actually like lots of things in one. We combine our data acquisition and distribution capability with this like, massive elastic scale out capability. >> Yeah, it's unbelievable how you can spin that up and then of course, most people forget you need to spin it down after the event. >> James: Yeah, that's right. >> We talked to a great vendor who talked about, you know, my customer spends no money with me on the weekend, zero. >> James: Right. >> And I'm thrilled because they're not using me. When they do use me, then they're buying stuff. I think what's really interesting is how that changes, also, your relationship with your customer. If you have a recurring revenue model, you have to continue to deliver value. You have to stay close to your customer. You have to stay engaged because it's not a one time pop and then you send them the 15% or 20% maintenance bill. It's really this ongoing relationship and they're actually gaining value from your products each and every time they use that. It's a very different way. >> Yeah, that's right. I think it creates better relationships because you feel like, what we do is in proportion to what they do and vice versa, so it has this fundamental fairness about it, if you will. >> Right, it's a good relationship but I want to go down another path. Before we turned the cameras on, we talked a little bit about the race always between the need for compute and the compute. It used to be personified best with Microsoft and Intel. Intel would come out with a new chip and then Microsoft OS would eat up all the extra capacity and then they'd come up with a new chip and it was an ongoing thing.
You made an interesting comment that, especially in the cloud world where the scale of these things is much, much bigger, that we're in a world now where the compute and the storage have kind of, outpaced the applications, if you will, and there's an opportunity for the application to catch up. Oh by the way, we have this cool new thing called machine learning and augmented intelligence. I wonder if you could, is that what's going to fill or kind of rebalance the consumption pattern? >> Yeah, it seems that way and I always think about kind of like, compute and software spiraling around each other like a helix. >> Like at one point, one is leading the other and they sort of just, one eventually surpasses the other and then you need innovation on the other side. I think for a while, like if you turn the clock way back to like, when the Pentium was introduced and everyone was like, how are we ever going to use all of the compute power. >> Windows 95, whoo! >> You know, power of like the Pentium. Do I really need to run my spreadsheets 100% faster? There's no business value whatsoever in transacting faster, or like general user interface or like graphical user interfaces or rendering web pages. Then you start seeing this new glut, often led by like researchers first. Like, software applications coming up that use all of this power because in academia you can start saying, what if I did have infinite compute? What would I do differently? You see things, you know like VR and advanced gaming, come up on the consumer side. Then I think the real answer on the business side is AI and ML. The general trend I start thinking of is something I used to talk about, back in the old days, which is conversion of like, having machines work for us instead of us working for machines.
The only way we're ever going to get there is by having higher and higher intelligence on the application side so that it kind of intuits more based on what it's seen before and what it knows about you, etc., in terms of the task that needs to get done. Then there's this whole new breed of person that you need in order to wield all that power because like Hadoop, it's not just natural. You don't just have people floating around like, hey, you know, I'm going to be an Oozie expert or a YARN expert. You don't run into people every day that are like, oh, yeah, I know neural nets well. I'm a gradient descent expert or whatever your model is. It's really going to drive like, lots of changes I think. >> Right, well hopefully it does and especially like we were talking about earlier, you know, within core curriculums at schools and stuff. We were at Grace Hopper and Brenda Wilkerson, the new head of the Anita Borg organization, was at this Chicago public school district and they're actually starting to make CS a requirement, along with biology and physics and chemistry and some of these other things. >> Right. >> So we do have a huge, a huge dearth of that but I want to just close out on one last concept before I let you go and you guys are way on top of this. Greg talked about what you just talked about, which is making the computers work for us versus the other way around. That's where the democratization of the power that we heard a lot about the democratization of big data and the tools and now you guys are talking about the democratization of the integration, especially when you have a bunch of cloud based applications that everybody has access to and maybe, needs to stitch together a different way. But when you look at this whole concept of democratization of that power, how do you see that kind of playing out over the next several years? >> Yeah, that's a very big- >> Sorry I didn't bring you a couple of beers before I brought that up.
>> Oh no, I got you covered. So it's a very big, interesting question because I think that you know, first of all, it's one of these, god knows, we can't predict with a lot of accuracy how exactly that's going to look because we're sort of juxtaposing two things. One is, part of the initial move to the cloud was the failure to properly democratize data inside the enterprise, for whatever reason, and we didn't do it. Now we have the computer resources and the central, kind of web based access to everything. Great. Now we have Cambridge Analytica and like, Facebook and people really thinking about data privacy and the fact that we want ubiquitous safe access. I think we know how to make things ubiquitous. The question is, do we know how to make it safe and fair so that the right people are using the right data and the right way? It's a little bit like, you know, there's all these cautionary tales out there like, beware of AI and robotics and everything and nobody really thinks about the danger of the data that's there. It's a much more immediate problem and yet it's sort of like the silent killer until some scandal comes up. We start thinking about these different ways we can tackle it. Obviously there's great solutions for tokenization and encryption and everything at the data level but even if you have the access to it, the question is, how do you control that wildfire that could happen as soon as the horse leaves the barn. Maybe not in its current form, but when you look at things like Blockchain, there's been a lot of predictions about how Blockchain can be used around like, data. I think that this privacy and this curation and tracking of who has the data, who has access to it and can we control it, I think you are looking at even more like, centralized and guarded access to this private data. >> Great, interesting times. >> Yeah, yeah Jeff, for sure. >> Alright James, well thanks for taking a couple of minutes with us. I really enjoyed the conversation.
>> Yeah, it's always great. Thanks for having me Jeff. >> He's James, I'm Jeff, and you're watching theCUBE. We're at the SnapLogic headquarters in San Mateo, California and thanks for watching. (electronic music)

Published Date : May 19 2018


Kickoff Day One | Big Data SV 2018


 

>> Speaker: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley. Brought to you by SiliconANGLE Media and its ecosystem partners. (soothing electronic music) >> Good morning everybody, and welcome to Big Data SV. My name is Dave Vellante, and this is our 10th big data event, we started in New York City, we've done five now, and this'll be our fifth in Silicon Valley, we've done five in New York City. And SiliconANGLE and Wikibon started covering the Big Data space in 2010, we did our first Hadoop World, which was actually the second Hadoop World in New York City. In 2011, we put out the industry's first big data report, and it caught the industry by fire, it was the hot topic. The concept of Hadoop was profound in that the idea was to take five megabytes of code and bring it to a petabyte of data, metaphorically if you will. Because moving data around was so problematic, and that concept really took hold. We asked questions at the time. Who will be the Red Hat of big data? Is this going to be a winner-take-all market? Will this trend, this big data trend, solve the problems that decision support, and business intelligence couldn't solve? We're going to talk about that today, and throughout the week. We've just released Wikibon's big data market study, and big data market shares, and key findings, I'm here with Peter Burris, who heads up the Wikibon research organization, and George Gilbert, who leads our big data research, gentlemen, welcome to theCUBE. >> Hi Dave. >> Good to see you guys. >> Good to be here. >> So, we have this open source marketplace, it's been plagued by complexity, competition, the cloud really changed things. Peter, you've been studying this for a while, you just dropped that awesome report on Wikibon.com, what did you find? What were the key trends that you saw in that report? Lay it out for us. >> Well the most important trend is that users are starting to drive what happens in the big data universe.
For many years, it was the individuals that were primarily responsible for creating a lot of these open source tools, and in the process of creating these open source tools, they solved each other's problems, as opposed to solving user problems. Users then found themselves, or in a process found themselves, building out clusters, deploying Hadoop, really focusing a lot on the infrastructure, which had its pluses and minuses. But what we see happening in the marketplace today really is an emphasis on bifurcation, in the big data space, where we're seeing a continuing focus on the infrastructure elements, and we'll spend a fair amount of time talking about what that means from a hardware database and related technology standpoint, and then, a much more focused, based on user and enterprise experience, of how to turn this into applications that actually have a consequential impact on the business on machine learning, AI, how the pipelines work, how the personnel work, integrating business change and the way business thinks about the role that data's going to play, and that bifurcation is going to carry forward over the next few years, as we gain more experience, and the entire industry is going to go through a process of restructuring itself to serve both sides of those needs. >> Great, so George, I want to ask you, so this is not a winner-take-all market, there is no Red Hat of big data, it certainly is not Cloudera, you know, Hortonworks kind of threw a wrench maybe into some of those plans, and tryin' to play the long game with the pure open source play. The return on investment of big data oftentimes turned out to be a reduction in the denominator, a reduction of investment, if you will. Lowering spending relative to traditional data warehouses. I ask you, you've been following this business for a long time, did the big data promise fail to live up to expectations? >> (laughs) There are multiple layers to that question, and to the answer. 
I would say that let's offload some data warehousing, processing, was the application that IT could attack to justify their experimentation with big data technologies, which remain notoriously complicated to provision and to manage on-prem. But as Peter was saying, to get sort of more value out of this investment, we're sort of now bumping up against the complexity of all the data science pipelines, whereas before we were bumping up against the complexity of administering these Hadoop clusters, so now we've got the data there, it's kind of hard to manage, but now we have to sort of learn how to apply that using much more sophisticated techniques. It's interesting that you say denominator shrinks, because the cost of operation as you move to the cloud, there are many more options, and they're managed much better, so that cost comes down as people have more cloud options. The last point I would make is I do think packaged applications, whether they're from the big guys, or a lot of vertically focused, or even semi-custom apps from folks like IBM, or Accenture, those are going to be what drives mainstream deployment, to reach hundreds of millions of users of this technology. >> So I would just observe that, in my view, this whole big data trend wasn't a failure, we observed early on that the folks that were going to make the most money in big data were the practitioners, not the vendors. So we made a correct call there. In many respects I look at this as, you know when you paint, you got to prep. I feel like the last eight years has been the preparatory phases, you know, scraping, and getting things ready, getting your house in order, and now Peter, we're setting up for the digital business era, and the digital business era is about data, it's about applying machine intelligence, it's certainly taking advantage of cloud economics. Do you buy that premise? That we're now in a position to actually, many companies anyway, or some companies, to effect digital transformation?
>> Well, the whole concept of digital transformation starts with the idea of data, and our observation, here at theCUBE and Wikibon, ultimately, is that the difference between a business and a digital business is, a digital business uses data as an asset, and that has enormous implications, on operations, how you engage customers, how you institutionalize work, what your relationships are with technology companies, et cetera. But that core concept of using your data differently, and creating value, is absolutely essential, to this notion of big data and all the various things that we're talking about, because big data is the process by which you create business value out of data, that's ultimately what we're trying to do with all this stuff. So, to George's point, if we think about where we've been, and where we're going, in many respects, fundamentally, we're just kind of following almost a normal adoption process. So if we go back 10 years, to Yahoo, Google, and some of the tech companies that initiated a lot of this motion, they had very specific types of problems that they wanted to solve, they had enormous volumes of data that they wanted to use to solve their problem, and they created technology to do so. Where we kind of get hung up is in the diffusion out of those relatively, certainly very challenging, and very rich set of problems, that Facebook, and Yahoo, and everybody else had, as they try to diffuse that technology into other industries, we got caught up in the bumps. We had more failures, and we didn't get the returns we wanted.
So, now what's happening is a lot of that domain expertise is coming back in, we're startin' to say, "Now we know "how to solve the problem, we have an approach "to how we're going to solve the problem," and the technology's being snapped into place to solve problems, as opposed to technology being snapped into place, or solve business problems, as opposed to technology being snapped into place to solve the technology problems of big data. >> So we're here talking to Peter Burris and George Gilbert, two analysts at Wikibon, we're here at the Forager, in San Jose, it's at 420 1st Street, and theCUBE has a week long, 1/2 a week long anyway, set of activities going on, we've got an event going on this evening, I think it starts at six o'clock, so come by, we got a breakfast briefing tomorrow, where the Wikibon analysts are laying out their recent market studies, we just dropped two market studies on Wikibon, one is the overall market size, and the other goes into market shares. I want to touch on those briefly. We're lookin' at about a 35 billion dollar market, growing to 100 billion over the next 10 years. As we observed early on, open source software had an effect where, most businesses, most industries start off, software's a big component of it, because of open source, the software revenues were muted in this business, but they're really starting to pick up now, it was a heavily services-oriented business, and still is, about 40%, right? And then software comprises about 30%, and hardware about 29%. You guys see that changing over time, correct? 
>> Well yeah, and in many respects, again, this is following almost a natural evolution, that's made more interesting by the fact that these are very complex problems, and new types of business problems, but, certainly George has done a lot of research on this, ultimately, what every company that operates in this space should be thinking about is, how is the industry, in aggregate, going to get to 100, to 200 million users in the next decade. Where a user is not someone who's playing with the data, or looking at Tableau, but a user is fundamentally someone who's using an application, or making a decision that's informed by data, that's made possible by these tools. And that's not something that's going to happen at a very, very low, hardware, cluster, database, level. It's going to happen elsewhere, and one of the big trends we see is, that there's going to be a lot of new packaged applications entering into the marketplace, that consume these tools, and make them viable for business to actually use. >> Well George, in 2012, Mike Olson declared it the year of the big data applications, that never happened. The action in software has been around database and software infrastructure, but what do you see in terms of the evolution of that software business? >> Well, continuing on the theme of the bifurcation, it was interesting to hear Peter talk about how the infrastructure that the big tech companies, and internet companies developed as a byproduct of building their own services, that stuff didn't work for mainstream, it didn't even work for most of the sophisticated enterprises, on the infrastructure side, what we're doing now is, we're seeing a convergence, where we're putting those pieces together in a way where they fit easily together enough so admins, mere admin, mortal admins and developers can work with them-- >> With cloud being the ultimate convergence. >> Yes, yes. And I would also say then it's the applications will really take it mainstream. 
Because even when we fit the platform stuff together, it's not going to be enough to go mainstream. >> Okay, and we got to wrap, but I just wanted to touch on some of the market share stuff that you guys just produced, and we'll be presenting this data tomorrow morning, Thursday morning here at the Forager, it's 420 1st Street, in San Jose. Not surprisingly, IBM came out as the leader, because of the large services component, they got about 8% of that-- >> Well, they play in all parts. >> They play in all, but services they dominate. So IBM, Splunk, actually, who never used the term big data during their ascendancy, they didn't tie into that meme, but they are a big data company-- >> And an example of a packaged application company leading a-- >> Both-- >> Absolutely. >> Both, the platform and app. >> And apps, right. Dell, Oracle, and now if you look at this, that's the overall, if you look at the software top 10, Splunk comes out on top, then Oracle, then IBM, and we'll be getting into that tomorrow morning at the breakfast, Peter Burris, George Gilbert, thanks so much for setting this up, thanks for watching, we've got wall-to-wall coverage here, this is day one, Big Data SV. From San Jose, you're watching theCUBE. We'll be right back. (soothing electronic music)

Published Date : Mar 7 2018


Nenshad Bardoliwalla & Stephanie McReynolds | BigData NYC 2017


 

>> Live from midtown Manhattan, it's theCUBE covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat techno music) >> Welcome back, everyone. Live here in New York, Day Three coverage, winding down for three days of wall to wall coverage theCUBE covering Big Data NYC in conjunction with Strata Data, formerly Strata Hadoop and Hadoop World, all part of the Big Data ecosystem. Our next guest is Nenshad Bardoliwalla Co-Founder and Chief Product Officer of Paxata, hot startup in the space. A lot of kudos. Of course, they launched on theCUBE in 2013 three years ago when we started theCUBE as a separate event from O'Reilly. So, great to see the success. And Stephanie McReynolds, you've been on multiple times, VP of Marketing at Alation. Welcome back, good to see you guys. >> Thank you. >> Happy to be here. >> So, winding down, so great kind of wrap-up segment here in addition to the partnership that you guys have. So, let's first talk about before we get to the wrap-up of the show and kind of bring together the week here and kind of summarize everything. Tell about your partnership you guys have. Paxata, you guys have been doing extremely well. Congratulations. Prakash was talking on theCUBE. Great success. You guys worked hard for it. I'm happy for you. But partnering is everything. Ecosystem is everything. Alation, their collaboration with data. That's their ethos. They're very user-centric. >> Nenshad: Yes. >> From the founders. Seemed like a good fit. What's the deal? >> It's a very natural fit between the two companies. When we started down the path of building new information management capabilities it became very clear that the market had strong need for both finding data, right? What do I actually have? I need an inventory, especially if my data's in Amazon S3, my data is in Azure Blob storage, my data is on-premise in HDFS, my data is in databases, it's all over the place. And I need to be able to find it.
And then once I find it, I want to be able to prepare it. And so, one of the things that really drove this partnership was the very common interests that both companies have. And number one, pushing user experience. I love the Alation product. It's very easy to use, it's very intuitive, really it's a delightful thing to work with. And at the same time they also share our interests in working in these hybrid multicloud environments. So, what we've done and what we announced here at Strata is actually this bi-directional integration between the products. You can start in Alation and find a data set that you want to work with, see what collaboration or notes or business metadata people have created and then say, I want to go see this in Paxata. And in a single click you can then actually open it up in Paxata and profile that data. Vice versa you can also be in Paxata and prepare data, and then with a single click push it back, and then everybody who works with Alation actually now has knowledge of where that data is. So, it's a really nice synergy. >> So, you pushed the user data back to Alation, cause that's what they care a lot about, the cataloging and making the user-centric view work. So, you provide, it's almost a flow back and forth. It's a handshake if you will to data. Am I getting that right? >> Yeah, I mean, the idea's to keep the analyst or the user of that data, data scientist, even in some cases a business user, keep them in the flow of their work as much as possible. But give them the advantage of understanding what others in the organization have done with that data prior and allow them to transform it, and then share that knowledge back with the rest of the community that might be working with that data. >> John: So, give me an example. I like your Excel spreadsheet concept cause that's obvious. People know what Excel spreadsheet is so. So, it's Excel-like. That's an easy TAM to go after. All Microsoft users might not get that Azure thing. 
But this one, just take me through a use case. >> So, I've got a good example. >> Okay, take me through. >> It's very common in a data lake for your data to be compressed. And when data's compressed, to a user it looks like a black box. So, if the data is compressed in Avro or Parquet or it's even like JSON format. A business user has no idea what's in that file. >> John: Yeah. >> So, what we do is we find the file for them. It may have some comments on that file of how that data's been used in past projects that we infer from looking at how others have used that data in Alation. >> John: So, you put metadata around it. >> We put a whole bunch of metadata around it. It might be comments that people have made. It might be >> Annotations, yeah. >> actual observations, annotations. And the great thing that we can do with Paxata is open that Avro file or Parquet file, open it up so that you can actually see the data elements themselves. So, all of a sudden, the business user has access without having to use a command line utility or understand anything about compression, and how you open that file up-- >> John: So, as Paxata's spitting out these nuggets of value back to you, you're kind of understanding it, translating it to the user. And they get to do their thing, you get to do your thing, right? >> It's making an Avro or a Parquet file as easy to use as Excel, basically. Which is great, right? >> It's awesome. >> Now, you've enabled >> a whole new class of people who can use that. >> Well, and people just >> Get turned off when it's anything like jargon, or like, "What is that? I'm afraid it's phishing. Click on that and oh!" >> Well, the scary thing is that in a data lake environment, in a lot of cases people don't even label the files with extensions. They're just files. (Stephanie laughs) So, what started-- >> It's like getting your pictures like DS, JPEG. It's like what? >> Exactly. >> Right.
>> So, you're talking about unlabeled-- >> If you looked on your laptop, and if you didn't have JPEG or DOC or PPT. Okay, I don't know what this file is. Well, what you have in the data lake environment is that you have thousands of these files that people don't really know what they are. And so, with Alation we have the ability to get all the value around the curation of the metadata, and how people are using that data. But then somebody says, "Okay, but I understand that this file exists. What's in it?" And then with Click to Profile from Alation you're immediately taken into Paxata. And now you're actually looking at what's in that file. So, you can very quickly go from this looks interesting to let me understand what's inside of it. And that's very powerful. >> Talk about Alation. Cause I had the CEO on, also their lead investor Greg Sands from Costanoa Ventures. They're a pretty amazing team but it's kind of out there. No offense, it's kind of a compliment actually. (Stephanie laughs) >> They got a symbolic >> Stephanie: Keep going. systems Stanford guy, who's like super-smart. >> Nenshad: Yeah. >> They're on something that's really unique but it's almost too simple to be. Like, wait a minute! Google for the data, it's an awesome opportunity. How do you describe Alation to people who say, "Hey, what's this Alation thing?" >> Yeah, so I think that the best way to describe it is it's the browser for all of the distributed data in the enterprise. Sorry, so it's both the catalog, and the browser that sits on top of it. It sounds very simple. Conceptually it's very simple but they have a lot of richness in what they're able to do behind the scenes in terms of introspecting what type of work people are doing with data, and then taking that knowledge and actually surfacing it to the end user.
So, for example, they have very powerful scenarios where they can watch what people are doing in different data sources, and then based on that information actually bubble up how queries are being used or the different patterns that people are doing to consume data with. So, what we find really exciting is that this is something that is very complex under the covers. Which Paxata is as well, being built upon Spark. But they have put in the hard engineering work so that it looks simple to the end user. And that's the exact same thing that we've tried to do. >> And that's the hard problem. Okay, Stephanie back ... That was a great example by the way. Can't wait to have our little analyst breakdown of the event. But back to Alation for you. So, how do you talk about, you've been VP of Marketing of Alation. But you've been around the block. You know B2B, tech, big data. So, you've seen a bunch of different, you've worked at Trifacta, you worked at other companies, and you've seen a lot of waves of innovation come. What's different about Alation that people might not know about? How do you describe the difference? Because it sounds easy, "Oh, it's a browser! It's a catalog!" But it's really hard. Is it the tech that's the secret? Is it the approach? How do you describe the value of Alation? >> I think what's interesting about Alation is that we're solving a problem that since the dawn of the data warehouse has not been solved. And that is how to help end users really find and understand the data that they need to do their jobs. A lot of our customers talk about this-- >> John: Hold on. Repeat that. Cause that's like a key thing. What problem hasn't been solved since the data warehouse? >> To be able to actually find and fully understand, understand to the point of trust the data that you want to use for your analysis. And so, in the world of-- >> John: That sounds so simple. >> Stephanie: In the world of data warehousing-- >> John: Why is it so hard? 
>> Well, because in the world of data warehousing business people were told what data they should use. Someone in IT decided how to model the data, came up with a KPI calculation, and told you as a business person, you as a CEO, this is how you're going to monitor your business. >> John: Yeah. >> What business person >> Wants to be told that by an IT guy, right? >> Well, it was bounded by IT. >> Right. >> Expression and discovery >> Should be unbounded. Machine learning can take care of a lot of bounded stuff. I get that. But like, when you start to get into the discovery side of it, it should be free. >> Well, no offense to the IT team, but they were doing their best to try to figure out how to make this technology work. >> Well, just look at the cost of goods sold for storage. I mean, how many EMC drives? Expensive! IT was not cheap. >> Right. >> Not even 10, 15, 20 years ago. >> So, now when we have more self-service access to data, and we can have more exploratory analysis. What data science really introduced and Hadoop introduced was this ability on-demand to be able to create these structures, you have this more iterative world of how you can discover and explore datasets to come to an insight. The only challenge is, without simplifying that process, a business person is still lost, right? >> John: Yeah. >> Still lost in the data. >> So, we simply call that a catalog. But a catalog is much more-- >> Index, catalog, anthology, there's other words for it, right? >> Yeah, but I think it's interesting because like a concept of a catalog as an inventory has been around forever in this space. But the concept of a catalog that learns from others' behavior with that data, this concept of Behavior I/O that Aaron talked about earlier today. The fact that behavior of how people query data as an input and that input then informs a recommendation as an output is very powerful. And that's where all the machine learning and A.I. comes to work. 
It's hidden underneath that concept of Behavior I/O but the real innovation that drives this rich catalog is how can we make active recommendations to a business person who doesn't have to understand the technology but they know how to apply that data to making a decision. >> Yeah, that's key. Behavior and textual information has always been the two flywheels in analysis whether you're talking search engine or data in general. And I think what I like about the trends here at Big Data NYC this weekend. We've certainly been seeing it at the hundreds of CUBE events we've gone to over the past 12 months and more is that people are using data differently. Not only say differently, there's baselining, foundational things you got to do. But the real innovators have a twist on it that give them an advantage. They see how they can use data. And the trend is collective intelligence of the customer seems to be big. You guys are doing it. You're seeing patterns. You're automating the data. So, it seems to be this flywheel of some data, get some collective data. What's your thoughts and reactions? Are people getting it? Is this by people doing it by accident on purpose kind of thing? Did people just fall on their head? Or you see, "Oh, I just backed into this?" >> I think that the companies that have emerged as the leaders in the last 15 or 20 years, Google being a great example, Amazon being a great example. These are companies whose entire business models were based on data. They've generated out-sized returns. They are the leaders on the stock market. And I think that many companies have awoken to the fact that data is a monetizable asset to be turned into information either for analysis, to be turned into information for generating new products that can then be resold on the market. 
The leading edge companies have figured that out, and are adopting technologies like Alation, like Paxata, to get a competitive advantage in the business processes where they know they can make a difference inside of the enterprise. So, I don't think it's a fluke at all. I think that most of these companies are being forced to go down that path because they have been shown the way in terms of the digital giants that are currently ruling the enterprise tech world. >> All right, what's your thoughts on the week this week so far on the big trends? What are obvious, obviously A.I., don't need to talk about A.I., but what were the big things that came out of it? And what surprised you that didn't come out from a trends standpoint buzz here at Strata Data and Big Data NYC? What were the big themes that you saw emerge and didn't emerge, what was the surprise? Any surprises? >> Basically, we're seeing in general the maturation of the market finally. People are finally realizing that, hey, it's not just about cool technology. It's not about what distribution or package. It's about can you actually drive return on investment? Can you actually drive insights and results from the stack? And so, even the technologists that we were talking with today throughout the course of the show are starting to talk about it's that last mile of making the humans more intelligent about navigating this data, where all the breakthroughs are going to happen. Even in places like IoT, where you think about a lot of automation, and you think about a lot of capability to use deep learning to maybe make some decisions. There's still a lot of human training that goes into that decision-making process and having agency at the edge. And so I think this acknowledgement that there should be balance between human input and what the technology can do is a nice breakthrough that's going to help us get to the next level. >> What's missing? 
What do you see that people missed that is super-important, that wasn't talked much about? Is there anything that jumps out at you? I'll let you think about it. Nenshad, you have something now. >> Yeah, I would say I completely agree with what Stephanie said, which is we are seeing the market mature. >> John: Yeah. >> And there is a compelling force to now justify business value for all the investments people have made. The science experiment phase of the big data world is over. People now have to show a return on that investment. I think that being said though, this is my sort of way of being a little more provocative. I still think there's way too much emphasis on data science and not enough emphasis on the average business analyst who's doing work in the Fortune 500. >> It should be kind of the same thing. I mean, with data science you're just more of an advanced analyst maybe. >> Right. But the idea that every person who works with data is suddenly going to understand different types of machine learning models, and what's the right way to do hyperparameter tuning, and other words that I could throw at you to show that I'm smart. (laughter) >> You guys have a vision with the Excel thing. I could see how you see that perspective because you see a future. I just think we're not there yet because I think the data scientists are still handcuffed and hamstrung by the fact that they're doing too much provisioning work, right? >> Yeah. >> To your point about >> surfacing the insights, it's like the data scientists, "Oh, you own it now!" They become the sysadmin, if you will, for their department. And it's like it's not their job. >> Well, we need to get them out of data preparation, right? >> Yeah, get out of that. >> You shouldn't be a data scientist-- >> Right now, you have two values. You've got the user interface value, which I love, but you guys do the automation. So, I think we're getting there. 
I see where you're coming from, but still those data scientists have to set the tone for the generation, right? So, it's kind of like you got to get those guys productive. >> And it's not a ... Please go ahead. >> I mean, it's somewhat interesting if you look at can the data scientist start to collaborate a little bit more with the common business person? You start to think about it as a little bit of scientific inquiry process. >> John: Yeah. >> Right? >> If you can have more innovators around the table in a common place to discuss what are the insights in this data, and people are bringing business perspective together with machine learning perspective, or the knowledge of the higher algorithms, then maybe you can bring those next leaps forward. >> Great insight. If you want my observations, I use the crazy analogy. Here's my crazy analogy. For years it's been about the engine, Model T, the car, the horse and buggy, you know? Now, "We got an engine in the car!" And they got wheels, it's got a chassis. And so, it's about the apparatus of the car. And then it evolved to, "Hey, this thing actually drives. It's transportation." You can actually go from A to B faster than the other guys, and people still think there's a horse and buggy market out there. So, they got to go to that. But now people are crashing. Now, there's an art to driving the car. >> Right. >> So, whether you're a sports car or whatever, this is where the value piece I think hits home is that, people are driving the data now. They're driving the value proposition. So, I think that, to me, the big surprise here is how people aren't getting into the hype cycle. They like the hype in terms of lead gen, and A.I., but they're too busy for the hype. It's like, drive the value. This is not just B.S. either, outcomes. It's like, "I'm busy. I got security. I got app development." >> And I think they're getting smarter about how they're valuing data. 
We're starting to see some economic models, and some ways of putting actual numbers on what impact is this data having today. We do a lot of usage analysis with our customers, and looking at they have a goal to distribute data across more of the organization, and really get people using it in a self-service manner. And from that, you're able to calculate what actually is the impact. We're not just storing this for insurance policy reasons. >> Yeah, yeah. >> And this cheap-- >> John: It's not some POC. Don't do a POC. All right, so we're going to end the day and the segment on you guys having the last word. I want to phrase it this way. Share an anecdotal story you've heard from a customer, or a prospective customer, that looked at your product, not the joint product but your products each, that blew you away, and that would be a good thing to leave people with. What was the coolest or nicest thing you've heard someone say about Alation and Paxata? >> For me, the coolest thing they said, "This was a social network for nerds. I finally feel like I've found my home." (laughter) >> Data nerds, okay. >> Data nerds. So, if you're a data nerd, you want to network, Alation is the place you want to be. >> So, there is like profiles? And like, you guys have a profile for everybody who comes in? >> Yeah, so the interesting thing is part of our automation, when we go and we index the data sources we also index the people that are accessing those sources. So, you kind of have a leaderboard now of data users, that contract one another in system. >> John: Ooh. >> And at eBay the leader was this guy, Caleb, who was their data scientist. And Caleb was famous because everyone in the organization would ask Caleb to prepare data for them. And Caleb was like well known if you were around eBay for a while. >> John: Yeah, he was the master of the domain. >> And then when we turned on, you know, we were indexing tables on Teradata as well as their Hadoop implementation. 
And all of a sudden, there are table structures that are Caleb underscore cussed. Caleb underscore revenue. Caleb underscore ... We're like, "Wow!" Caleb drove a lot of Teradata revenue. (Laughs) >> Awesome. >> Paxata, what was the coolest thing someone said about you in terms of being the nicest or coolest most relevant thing? >> So, something that a prospect said earlier this week is that, "I've been hearing in our personal lives about self-driving cars. But seeing your product and where you're going with it I see the path towards self-driving data." And that's really what we need to aspire towards. It's not about spending hours doing prep. It's not about spending hours doing manual inventories. It's about getting to the point that you can automate the usage to get to the outcomes that people are looking for. >> So, I'm looking forward to self-driving information. Nenshad, thanks so much. Stephanie from Alation. Thanks so much. Congratulations both on your success. And great to see you guys partnering. Big, big community here. And just the beginning. We see the big waves coming, so thanks for sharing perspective. >> Thank you very much. >> And your color commentary on our wrap up segment here for Big Data NYC. This is theCUBE live from New York, wrapping up great three days of coverage here in Manhattan. I'm John Furrier. Thanks for watching. See you next time. (upbeat techno music)

Published Date : Oct 3 2017

ENTITIES

Entity | Category | Confidence
Stephanie | PERSON | 0.99+
Stephanie McReynolds | PERSON | 0.99+
Greg Sands | PERSON | 0.99+
John | PERSON | 0.99+
Caleb | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Nenshad | PERSON | 0.99+
New York | LOCATION | 0.99+
Prakash | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Aaron | PERSON | 0.99+
Silicon Angle Media | ORGANIZATION | 0.99+
2013 | DATE | 0.99+
thousands | QUANTITY | 0.99+
Costanoa Ventures | ORGANIZATION | 0.99+
Manhattan | LOCATION | 0.99+
two companies | QUANTITY | 0.99+
both companies | QUANTITY | 0.99+
Excel | TITLE | 0.99+
Trifacta | ORGANIZATION | 0.99+
Google | ORGANIZATION | 0.99+
Strata Data | ORGANIZATION | 0.99+
Alation | ORGANIZATION | 0.99+
Paxata | ORGANIZATION | 0.99+
Nenshad Bardoliwalla | PERSON | 0.99+
eBay | ORGANIZATION | 0.99+
three days | QUANTITY | 0.99+
Microsoft | ORGANIZATION | 0.99+
two values | QUANTITY | 0.99+
NYC | LOCATION | 0.99+
hundreds | QUANTITY | 0.99+
Big Data | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
one | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Strata Hadoop | ORGANIZATION | 0.99+
Hadoop World | ORGANIZATION | 0.99+
earlier this week | DATE | 0.98+
Paxata | PERSON | 0.98+
today | DATE | 0.98+
Day Three | QUANTITY | 0.98+
Parquet | TITLE | 0.96+
three years ago | DATE | 0.96+

Josh Rogers, Syncsort | Big Data NYC 2017


 

>> Announcer: Live from Midtown Manhattan it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Welcome back everyone, live here in New York City. This is theCUBE's coverage of our fifth annual event that we put on ourselves in conjunction with Strata Hadoop, now called Strata Data. It's theCUBE and we're covering the scene here at Hadoop World going back to 2010, eight years of coverage. I'm John Furrier, co-host of theCUBE. Usually Dave Vellante is here but he's down covering the Splunk Conference, and who was there yesterday was none other than Josh Rogers, my next guest, the CEO of Syncsort. You were with Dave Vellante yesterday and live on theCUBE in Washington, DC for the Splunk .conf, kind of a Big Data conference but it's a proprietary, branded event for themselves. This is more of an industry event here at Big Data NYC that we put on. Welcome back, glad you flew up on the Concorde, the private jet. >> Early morning but it was fine. >> No, good to see you. CEO of Syncsort, you guys have been busy. The folks watching in theCUBE community know that you've been on many times. For the folks that are learning more about theCUBE every day, you guys had an interesting transformation as a company. Take a minute to talk about where you've come from and where you are today. Certainly a ton of corporate development activity on your end, as you guys are seeing the opportunities, you're moving on them. Take a minute to explain. >> So, you know it's been a great journey so far and there's a lot more work to do, but you know Syncsort is one of the first software companies, right. Founded in the late 60's, today it has an unparalleled franchise in the mainframe space. But over the last 10 years or so we branched out into open systems and delivered high performance data integration solutions. 
About 4 years ago we really started to invest in the Big Data space. We had a DNA around performance and scale, and we felt like that would be relevant in the Big Data space. We delivered a Hadoop focused product, and today we focus that product around helping customers ingest mainframe data assets into their Hadoop clusters along with other types of data. But a specific focus there. That has led us into understanding a bigger market space that we call Big Iron to Big Data. And what we see in the marketplace is that customers are adapting. >> Just before you get in there, I love that term, Big Iron Big Data, you know I love Big Iron. Used to be a term for the mainframe, for the younger generation out there. But you're really talking about you guys have leveraged experience with the installed base activity at scale, call it batched, molded, single threaded, whatever you want to call it. But as you got into the game of Big Data you then saw other opportunities, did I get that right? You got into the game with some Hadoop, then you realize, whoa, I can do some large scale. What was that opportunity? >> The opportunity is that, you know, large enterprise is absolutely investing heavily in the next generation of analytic technologies in a new stack. Hadoop is a part of that, Spark is a part of that. And they're rapidly adopting these new infrastructures to drive deeper analytics, to answer bigger questions and improve their business in multiple dimensions. The opportunity we saw was that, you know, the ability for those enterprises to be able to integrate this new kind of architecture with the legacy architectures, so, the old architectures that were powering key applications and being key producers of data, was a challenge. There were multiple technology challenges, there's cultural challenges. And we had this kind of expertise on both sides of the house and we found that to be unique in the marketplace. 
So we put a lot of effort into understanding, defining what are the challenges in that Big Iron to Big Data space that helped customers maximize their value out of these investments in next generation architectures. And we define the problem two ways, or two components. One is that people are generating more and more data, more and more touch points, and driving more and more transactions with their customers. And that's generating increased load on the compute environments, and they want to figure out how do I run that, you know, if I have a mainframe, how to run as efficiently as possible, contain my costs, maximize availability and uptime. At the same time I've got all this new data that I can start to analyze, but I got to get it from the area that it's produced into this next generation system. And there's a lot of challenges there. So we started to isolate, you know, what are the specific use cases that present customers challenges, and deliver very differentiated solutions. The overarching kind of message around positioning is around solving the Big Iron to Big Data challenge. >> You guys had done some acquisitions and been successful. I want to talk a little bit about the ones that you like right now that happened in the past year or two years. I think you've done five in the past two years. A couple key notable ones that set you up, kind of give you pole position for some of these big markets, and then after we talk I want to talk about your ecosystem opportunity. But some of the acquisitions and what's working for you? What's been the big deals? >> So the largest one we did in 2016 was a company called Trillium, leader in the data quality space. Long time leader in the data quality space, and the opportunity we saw with Trillium was to complement our data movement integration capabilities. A natural complement, but to focus very specifically on how to drive value in this next generation architecture. Particularly in things like Hadoop. 
What I'd like to be able to do is apply best in class data quality routines directly in that environment. And so, from our experience in delivering these Big Data solutions in the past, we knew that we could take a lot of technology and create really powerful solutions that leverage the native kind of capabilities of Hadoop but added on a layer of proven technology for best in class data quality. Probably the biggest news of the last few weeks has been that we were acquired by a new private equity partner called Centerbridge Partners. In that acquisition they actually acquired Syncsort and they acquired a company called Vision Solutions. And we've combined those organizations. >> John: When did that happen? >> The deal was announced July, early July, and it closed in the middle of August. And Vision Solutions is a really interesting company. They're the leader in high availability for the IBM i market. IBM i was originally called AS/400, it's had a couple of different names, and a dominant kind of market position. What we liked about that business was A. That market position, four thousand customers, generally large enterprise. And also, you know, market leading capability around data replication in real time. >> And we saw IBM. >> Migration data, disaster recovery kind of thing? >> It's DR, it's high availability, it's migrations, it's also change data capture actually. And leveraging all common technology elements there. But it also represents a market leading franchise in IBM i, which is in many ways very similar to the mainframe. Run optimized for transactional systems, hard to kind of get at. >> Sounds like you're reconstructing the mainframe in the cloud. >> It's not so much that, it's the recognition that those compute systems still run the world. They still run all the transactions. >> Well, some say the cloud is a software mainframe. >> I think over time you'll see that, we don't see that in our business today. 
There is a cloud aspect to our business; it's not to move these transactional applications running on those platforms into the cloud yet. Although I suspect that happens at some point. But our point, our interest was more, these are the systems that are producing the world's data. And it's hard to get. >> They're big, big power sources for data, they're not going anywhere. >> So we've got the expertise to source that data into these next generation systems. And that's a tricky problem for a lot of customers, and not something. >> That's a problem they have. And you guys basically cornered the market on that. >> So think about Big Iron and Big Data as these two components, being able to source data and make it productive using these next generation analytics systems, and also be able to run those existing systems as, you know, efficiently as possible. >> All right, so how do you talk to customers, and I've asked this question before so I'll just ask again: oh, Syncsort, now you got Vision, you guys are just a bunch of old mainframe guys. What do you know about cloud native? A lot of the hipsters and the young guns out there might not know about some of the things you're doing on the cutting edge, because even though you have the power base of these old big systems, which are just throwing off massive amounts of data that aren't going anywhere, you still are integrated into some cutting edge. Talk about that, that narrative, and how you. >> So I mean the folks that we target. >> I used cloud only as an example. Shiny, cool, new toys. >> Organizations we target, our customers and prospects, and generally we serve large enterprise. You know, large complex global enterprises. They are making significant investments in Hadoop and Splunk and these next generation environments. We approach them and say we believe to get full value out of your investments in these next generation technologies, it would be helpful if you had your most critical data assets available. 
And that's hard, and we can help you do that. And we can help you do that in a number of ways that you won't be able to find anywhere else. That includes features in our products, it includes experts on the ground. And what we're seeing is there's a huge demand because, you know, Hadoop is really kind of, you can see it in the Cloudera and Hortonworks results and the scale of revenue. This is, you know, a real foundational component of data management at this point. Enterprises are embracing it. If they can't solve that integration challenge between the systems that produce all the data and, you know, where they want to analyze the data, there's a big value gap. And we think we're uniquely positioned to be able to do that, one, because we've got the technical expertise, two, they're all our customers at this point, we have six thousand customers. >> You guys have executed very well. I just got to say you guys are just slowly taking territory down, and you got a great strategy: get into a business, you don't overplay your hand or get over your skis, whatever you want to call it. And you figure it out and see if it's a fit. If it is, grab it, if not, you move on. So also you guys have relationships, so we're talking about your ecosystem. What is your ecosystem and what is your partner strategy? >> I'll talk a little bit about the overall strategy and I'll talk about how partners fit into that. Our strategy is to identify specific use cases that are common and challenging in our customer set, that fall within this Big Iron to Big Data umbrella. It's then to deliver a solution that is highly differentiated. Now, the third piece of that is to partner very closely with, you know, the emerging platform vendors in the Big Data space. And the reason for that is we're solving an integration challenge for them. Like Cloudera, like Hortonworks, like Splunk. We launched a relationship with Collibra in the middle of the year. We just announced our relationship. 
>> Yeah, for them the benefit is they don't do the heavy lifting, you've got that covered. >> We can solve a lot of pain points they have getting their platforms set up. >> That's hard to replicate on their end, it's not like they're going to go build it. >> Cloudera and Hortonworks, they don't have mainframe skills. They don't understand how to go access >> Classic partnering example. >> But the other piece is we do real engineering work with these partnerships. So we build, we write code to integrate and add value to platforms. >> It's not a Barney deal, it's not an optical deal. >> Absolutely. >> Andy Jassy was critical at VMworld of some of the deals that have been done in the industry, referring to his deal. That seems to be back in vogue, thank God, that people are going to say they're going to do a deal and they back it with actually following through. What about other partnerships, how else, how are you looking at partnering? So, pretty much, where it fits in your business, are people coming to you, are you going to them? >> We certainly have people coming to us. The key thing, the number one driver is customers. You know, as we understand use cases, as customers introduce us to new challenges that they are facing, we will not just look at how do we solve it, but also what are the other platforms that we're integrating with, and if we believe we can add unique value to that partner we'll approach that partner. >> Let's talk customers, give me some customer use cases that you're working on right now, that you think are notable, worth highlighting. >> Sure, so we do a lot in the financial services space. You know we have a number of customers >> Where there's mainframes. >> Where there's a lot of mainframes, but it's not just in financial services. Here's an interesting one, it was an insurance company and they were looking at how to transition their mainframe archive strategy. 
So they have regulations around how long they have to keep data, they had been using traditional mainframe archive technology, very expensive on an annual basis and also inflexible. They didn't have access to. >> And performance too. At the end of the day don't forget performance. >> They want performance, this was more of an archive use case and what they really wanted was an ability to both access the data and also lower the cost of storing the data for the required time from a regulation perspective. And so they made the decision that they wanted to store it in the cloud, they wanted to store it in S3. There's a complicated data movement there, there's a complicated data translation process there, and you need to understand the mainframe and you need to understand AWS and S3 and all those components, and we had all those pieces and all that expertise and were able to solve that. So we're doing that with a few different customers now. But that's just an example of, you know, there's a great ROI, there's a lot more business flexibility, and then there's a modernization aspect to it that's very attractive. >> Well, great to hear from you today. I'm glad you made it up here. Again, you were in DC yesterday, thanks for coming in, checking out two shows. You're certainly pounding the pavement, as they say in New York, to quote a New Yorker phrase. What's new for you guys, what's coming out? More acquisitions happening? What's the outlook for Syncsort? >> So we're always active on the M&A front. We certainly have a pipeline of activities and there's a lot of different, you know, interesting spaces, adjacencies that we're exploring right now. There's nothing that I can really talk about there >> Can you talk about the categories you're looking at? >> Sure, you know, things around metadata management, things around real-time data movement, cloud opportunities. There's some interesting opportunities in the artificial intelligence, machine learning space. 
Those are all >> Deep learning. >> Deep learning, those are all interesting spaces for us to think about. Security is another space that's interesting. So we're pretty active in a lot of adjacencies >> Classic adjacent markets that you're looking at. So you take one step at a time, slow. >> But then we try to innovate on, you know, after the catch, so we did three announcements this week. Transaction tracing for Ironstream and a kind of refresh of data quality for Hadoop approach. So we'll continue to innovate on the organic setup as well. >> Final question, the whole private equity thing. So that's done, so they put a big bag of money in there and brought the two companies together. Are there structural changes, management changes? You're the Syncsort CEO, is there a new co name? >> The combined companies will operate under the Syncsort name, I'll serve as the CEO. >> Syncsort is the remaining name and you guys now have another company under it. >> Yes, that's right. >> And cash they put in, probably a boatload of cash for corporate development. >> The announced deal value was $1.2 billion, a little over $1.2 billion. >> So you get a checkbook and you're looking to buy companies? >> We're going to continue. As I said yesterday to Dave, you know, I like to believe that we proved the hypothesis, and we're in about the second inning. Can't wait to keep playing the game. >> It's interesting, just real quick while I got you in here, we got a break coming up for the guys. A private equity move is a good move in these transitional markets, you and I have talked about this in the past off-camera. It's a great thing to do if you're public and you're not really knocking it out of the park. Kill the 90 day shot clock, go private, there seems to be a lot of movement there. Retool and then re-emerge stronger. >> We've never been public, but I will say, the Centerbridge team has been terrific. 
A lot of resources there, and certainly we do talk, we're still very quarterly focused, but I think we've got a great partner and look forward to continuing. >> The waves are coming, the big waves are coming so get your big surfboard out, as we say in California. Josh, thanks for spending the time. Josh Rogers, CEO of Syncsort, here on theCUBE. More live coverage in New York after this break. Stay with us for our day two of three days of coverage of Big Data NYC 2017. Our event that we hold every year here in conjunction with Hadoop World right around the corner. I'm John Furrier, we'll be right back.

Published Date : Oct 2 2017

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
John | PERSON | 0.99+
New York | LOCATION | 0.99+
Josh Rogers | PERSON | 0.99+
2016 | DATE | 0.99+
California | LOCATION | 0.99+
$1.2 billion | QUANTITY | 0.99+
Syncsort | ORGANIZATION | 0.99+
July | DATE | 0.99+
John Furrier | PERSON | 0.99+
Josh | PERSON | 0.99+
two companies | QUANTITY | 0.99+
Centerbridge Partners | ORGANIZATION | 0.99+
New York City | LOCATION | 0.99+
90 day | QUANTITY | 0.99+
Washington, DC | LOCATION | 0.99+
yesterday | DATE | 0.99+
2010 | DATE | 0.99+
Centerbridge | ORGANIZATION | 0.99+
three days | QUANTITY | 0.99+
Vision Solutions | ORGANIZATION | 0.99+
Cloudera | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
five | QUANTITY | 0.99+
DC | LOCATION | 0.99+
Big Iron | ORGANIZATION | 0.99+
third piece | QUANTITY | 0.99+
Calibra | ORGANIZATION | 0.99+
Hadoop World | ORGANIZATION | 0.99+
one | QUANTITY | 0.99+
One | QUANTITY | 0.99+
two ways | QUANTITY | 0.99+
two components | QUANTITY | 0.99+
early July | DATE | 0.99+
Trillium | ORGANIZATION | 0.99+
Hadoop | TITLE | 0.99+
both sides | QUANTITY | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
late 60's | DATE | 0.98+
today | DATE | 0.98+
middle of August | DATE | 0.98+
M&A | ORGANIZATION | 0.98+
this week | DATE | 0.98+
AWS | ORGANIZATION | 0.98+
six thousand customers | QUANTITY | 0.98+
NYC | LOCATION | 0.98+
Midtown Manhattan | LOCATION | 0.98+
one step | QUANTITY | 0.98+
Splunk | ORGANIZATION | 0.98+
four thousand customers | QUANTITY | 0.98+
eight years | QUANTITY | 0.98+
vision solutions | ORGANIZATION | 0.98+
over $1.2 billion | QUANTITY | 0.97+
both | QUANTITY | 0.97+
Barney | ORGANIZATION | 0.97+
S3 | TITLE | 0.97+
Ironstream | ORGANIZATION | 0.97+
Splunk Conference | EVENT | 0.97+
About 4 years ago | DATE | 0.96+
two | QUANTITY | 0.96+
past year | DATE | 0.96+
three announcements | QUANTITY | 0.96+
Concord | LOCATION | 0.95+
theCUBE | ORGANIZATION | 0.95+

Rob Thomas, IBM | Big Data NYC 2017


 

>> Voiceover: Live from midtown Manhattan, it's theCUBE! Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. >> Okay, welcome back everyone, live in New York City, this is theCUBE's coverage of, our eighth year doing Hadoop World, now evolved into Strata Hadoop, now called Strata Data. It's had many incarnations, but O'Reilly Media is running their event in conjunction with Cloudera, mainly an O'Reilly Media show. We do our own show called Big Data NYC here with our community, with theCUBE bringing you the best interviews, the best people, entrepreneurs, thought leaders, experts, to get the data and try to project the future and help users find the value in data. My next guest is Rob Thomas, who is the General Manager of IBM Analytics, theCUBE alumni, been on multiple times, successfully executing in the San Francisco Bay area. Great to see you again. >> Yeah John, great to see you, thanks for having me. >> You know, IBM has really been interesting through its own transformation, and a lot of people will throw IBM in that category, but you guys have been transforming, okay, and the scoreboard has yet to show, in my mind, what's truly happening. Because if you still look at this industry, we're only eight years into what Hadoop evolved into now as a large data set, but the analytics game just seems to be getting started. With the cloud now coming over the top, you're starting to see a lot of cloud conversations in the air. Certainly there's a lot of AI washing, you know, AI this, but it's machine learning and deep learning at the heart of it as innovation, but a lot more work on the analytics side is coming. You guys are at the center of that. What's the update? What's your view of this analytics market? >> Most enterprises struggle with complexity. That's the number one problem when it comes to analytics. It's not imagination, it's not willpower, in many cases, it's not even investment, it's just complexity. 
We are trying to make data really simple to use and the way I would describe it is we're moving from a world of products to platforms. Today, if you want to go solve a data governance problem you're typically integrating 10, 15 different products. And the burden then is on the client. So, we're trying to make analytics a platform game. And my view is an enterprise has to have three platforms if they're serious about analytics. They need a data management platform for managing all types of data, public, private cloud. They need unified governance, so governance of all types of data, and they need a data science platform for machine learning. If a client has those three platforms, they will be successful with data. And what I see now is really mixed. We've got 10 products that do that, five products that do this, but it has to be integrated in a platform. >> You as IBM, or the customer has these tools? >> Yeah, when I go see clients that's what I see is data... >> John: Disparate data log. >> Yeah, they have disparate tools and so we are unifying what we deliver from a product perspective to this platform concept. >> You guys announced an integrated analytic system, got to see my notes here, I want to get into that in a second, but it's interesting you bring up the word platform because, you know, platforms have always been kind of reserved for the big supplier, but you're talking about customers having a platform, not a supplier delivering a platform per se, 'cause this is where the integration thing becomes interesting. We were joking yesterday on theCUBE here, just kind of ad hoc, conceptually, like the world has turned into a tool shed. I mean everyone has a tool shed or knows someone that has a tool shed where you have the tools in the back and they're rusty. And so, this brings up the tool conversation, there's too many tools out there that try to be platforms. >> Rob: Yes. >> And if you have too many tools, you're not really doing the platform game right. 
And complexity also turns into, you bought a hammer and it turned into a lawn mower. Right, so a lot of these companies have been groping and trying to iterate what their tool was into something else it wasn't built for. So, as the industry evolves, that's natural Darwinism if you will, they will fall to the wayside. So talk about that dynamic because you still need tooling >> Rob: Yes. >> But the tool will be a function of the work, as Peter Burris would say, so talk about how does a customer really get that platform out there without sacrificing the tooling that they may have bought or want to get rid of. >> Well, so think about, in the enterprise today, what the data architecture looks like: I've got this box that has this software on it, to use your terms, has these types of tools on it, and it's isolated, and if you want a different set of tooling, okay, move that data to this other box where we have the other tooling. So, it's very isolated in terms of how platforms have evolved or technology platforms today. When I talk about an integrated platform, we are big contributors to Kubernetes. We're making that foundational in terms of what we're doing on Private Cloud and Public Cloud. If you move to that model, suddenly what was a bunch of disparate tools are now microservices against a common architecture. And so it totally changes the nature of the data platform in an enterprise. It's a much more fluid data layer. The term I use sometimes is you have data as a service now, available to all your employees. That's totally different than I want to do this project, so step one, make room in the data center, step two, bring in a server. It's a much more flexible approach so that's what I mean when I say platform. >> So operationalizing it is a lot easier than just going down the linear path of provisioning. 
All right, so let's bring up the complexity issue because integrated and unified are two different concepts that kind of mean the same thing depending on how you look at it. When you look at the data integration problem, you've got all this complexity around governance, it's a lot of moving parts of data. How does a customer actually execute without compromising the integrity of the policies that they need to have in place? So in other words, what are the baby steps that someone can take, that customers take with what you guys are doing with them, how do they get into the game, how do they take steps towards the outcome? They might not have the big money to push it all at once, they might want to take a risk management approach. >> I think there's a clear recipe for doing this right and we have experience of doing it well and doing it not so well, so over time we've gotten, I'd say, a pretty good perspective on that. My view is very simple, data governance has to start with a catalog. And the analogy I use is, you have to do for data what libraries do for books. And think about a library, the first thing you do with books, card catalog. You know where, you basically itemize everything, you know exactly where it sits. If you've got multiple copies of the same book, you can distinguish between which one is which. As books get older they go to archives, to microfilm or something like that. That's what you have to do with your data. >> On the front end. >> On the front end. And it starts with a catalog. And the reason I say that is, I see some organizations that start with, hey, let's go start ETL, I'll create a new warehouse, create a new Hadoop environment. That might be the right thing to do but without having a basis of what you have, which is the catalog, that's where I think clients need to start. 
>> Well, I would just add one more level of complexity just to kind of reinforce, first of all I agree with you, but here's another example that would reinforce this step. Let's just say you write some machine learning and some algorithms and a new policy from the government comes down. Hey, you know, we're dealing with Bitcoin differently or whatever, some GDPR kind of thing happens where someone gets hacked and a new law comes out. How do you inject that policy? You got to rewrite the code, so I'm thinking that if you do this right, you don't have to do a lot of rewriting of applications; the library or the catalog will handle it. Is that right, am I getting that right? >> That's right 'cause then you have a baseline is what I would describe it as. It's codified in the form of a data model or in the form of an ontology for how you're looking at unstructured data. You have a baseline so then as changes come, you can easily adjust to those changes. Where I see clients struggle is if you don't have that baseline then you're constantly trying to change things on the fly and that makes it really hard to get to this... >> Well, really hard, expensive, they have to rewrite apps. >> Exactly. >> Rewrite algorithms and machine learning things that were built probably by people that maybe left the company, who knows, right? So the consequences are pretty grave, I mean, pretty big. >> Yes. >> Okay, so let's go back to something that you said yesterday. You were on theCUBE yesterday with Hortonworks CEO, Rob Bearden, and you were commenting about AI or AI washing. You said, quote, "You can't have AI without IA." A play on letters there, sequence of letters, which was really an interesting comment, we kind of referenced it pretty much all day yesterday. Information architecture is the IA and AI is the artificial intelligence, basically saying if you don't have some sort of architecture, AI really can't work. 
Which really means models have to be understood, with the learning machine kind of approach. Expand more on that 'cause that was, I think, a fundamental thing that we're seeing at the show this week in New York: a model for the models. Who trains the machine learning? Machines got to learn somewhere too, so there's learning for the learning machines. This is a real complex data problem and a half. If you don't set up the architecture it may not work, explain. >> So, there's two big problems enterprises have today. One is trying to operationalize data science and machine learning at scale, the other one is getting to the cloud, but let's focus on the first one for a minute. The reason clients struggle to operationalize this at scale is because they start a data science project and they build a model for one discrete data set. Problem is that only applies to that data set, you can't pick it up and move it somewhere else, so this idea of data architecture, just to kind of follow through, whether it's the catalog or how you're managing your data across multiple clouds, becomes fundamental because ultimately you want to be able to provide machine learning across all your data, because machine learning is about predictions and it's hard to do really good predictions on a subset. But that pre-req is the need for an information architecture that comprehends the fact that you're going to build models and you want to train those models. As new data comes in, you want to keep the training process going. And that's the biggest challenge I see clients struggling with. So they'll have success with their first ML project but then the next one becomes progressively harder because now they're trying to use more data and they haven't prepared their architecture for that. >> Great point. Now, switching to data science. You spoke many times with us on theCUBE about data science, we know you're passionate about it, you guys are doing a lot of work on that. 
We've observed, and Jim Kobielus and I were talking yesterday, there's too much work still on the data science guys' plate. They're still doing a lot of what I call sys admin like work, not the right word, but like administrative building and wrangling. They're not doing enough data science, and there's enough proof points now to show that data science actually impacts business, whether it's military having data intelligence to execute something, to selling something at the right time, or even for work or play or consume, or we use, all the proof is out there. So why aren't we going faster, why aren't the data scientists more effective, what is it going to take for the data scientists to have a seamless environment that works for them? They're still doing a lot of wrangling and they're still getting down in the weeds. Is that just the role they have, or how does it get easier for them, that's the big catch? >> That's not the role. So they're a victim of their architecture to some extent and that's why they end up spending 80% of their time on data prep, data cleansing, that type of thing. Look, I think we solved that. That's why when we introduced the integrated analytic system this week, that whole idea was get rid of all the data prep that you need, because you land the data in one place, and machine learning and data science is built into that. So everything that the data scientist struggles with today goes away. We can federate to data on cloud, on any cloud, we can federate to data that's sitting inside Hortonworks so it looks like one system, but machine learning is built into it from the start. So we've eliminated the need for all of that data movement, for all that data wrangling 'cause we organized the data, we built the catalog, and we've made it really simple. And so if you go back to the point I made, so one issue is clients can't apply machine learning at scale, the other one is they're struggling to get to the cloud. 
I think we've nailed those problems 'cause now with a click of a button, you can scale this to part of the cloud. >> All right, so how does the customer get their hands on this? Sounds like it's a great tool, you're saying it's leading edge. We'll take a look at it, certainly I'll do a review on it with the team, but how do I get it, how do I get a hold of this? What do I do, download it, you guys supply it to me, is it some open source, how do your customers and potential customers engage with this product? >> However they want to, but I'll give you some examples. So, we have an analytic system built on Spark, you can bring the whole box into your data center and right away you're ready for data science. That's one way. Somebody like you, you're going to want to go get the containerized version, you go download it on the web and you'll be up and running instantly with a highly performing warehouse integrated with machine learning and data science built on Spark using Apache Jupyter. Any developer can go use that and get value out of it. You can also say I want to run it on my desktop. >> And that's free? >> Yes. >> Okay. >> There's a trial version out there. >> That's the open source, yeah, that's the free version. >> There's also a version on public cloud so if you don't want to download it, you want to run it outside your firewall, you can go run it on IBM cloud on the public cloud so... >> Just your cloud, Amazon? >> No, not today. >> John: Just IBM cloud, okay, I got it. >> So there's a variety of ways that you can go use this and I think what you'll find... >> But you have a freemium model that people can get started with, so they'll download it to their data center, is that also free too? >> Yeah, absolutely. >> Okay, so all the base stuff is free. >> We also have a desktop version too so you can download... >> What URL can people look at this? >> Go to datascience.ibm.com, that's the best place to start a data science journey. 
>> Okay, multi-cloud, Common Cloud is what people are calling it, you guys have a Common SQL engine. What is this product, how does it relate to the whole multi-cloud trend? Customers are looking for multiple clouds. >> Yeah, so Common SQL is the idea of integrating data wherever it is, whatever form it's in, ANSI SQL compliant, so what you would expect for a SQL query and the type of response you get back, you get that back with Common SQL no matter where the data is. Now when you start thinking multi-cloud you introduce a whole other bunch of factors. Network, latency, all those types of things, so what we talked about yesterday with the announcement of Hortonworks Dataplane, which is kind of extending the YARN environment across multi-clouds, that's something we can plug in to. So, I think, let's be honest, the multi-cloud world is still pretty early. >> John: Oh, really early. >> Our focus is delivery... >> I don't think it really exists actually. >> I think... >> It's multiple clouds but no one's actually moving workloads across all the clouds, I haven't found any. >> Yeah, I think it's hard for latency reasons today. We're trying to deliver an outstanding... >> But people are saying, I mean this is head room I got, but people are saying, I'd love to have a preferred future of multi-cloud even though they're kind of getting their own shops in order, retrenching, and re-platforming, but that's not a bad ask. I mean, I'm a user, if I don't like IBM's cloud or I got a better service, I want to be able to move around. If Amazon is too expensive I want to move to IBM, you've got product differentiation, I might want to be in your cloud. So again, this is the customer's mindset, right. If you have something really compelling on your cloud, do I have to go all in on IBM cloud to run my data? You shouldn't have to, right? >> I agree, yeah, I don't think any enterprise will go all in on one cloud. 
I think it's delusional for people to think that, so you're going to have this world. So the reason we built IBM Cloud Private on Kubernetes was we said, that can be a substrate if you will, that provides a level of standards across multiple cloud type environments. >> John: And it's got some traction too so it's a good bet there. >> Absolutely. >> Rob, final word, just talk about the personas who you now engage with from IBM's standpoint. I know you have a lot of great developer stuff going on, you've done some great work, you've got a free product out there, but you still got to make money, you got to provide value to IBM. Who are you selling to, what's the main thing, you've got multiple stakeholders, could you just clarify the stakeholders that you're serving in the marketplace? >> Yeah, I mean, the emerging stakeholder that we speak with more and more than we used to is chief marketing officers who have real budgets for data and data science and are trying to change how they're performing their job. That's a major stakeholder, CTOs, CIOs, any C level, >> Chief data officer. >> Chief data officer. You know, chief data officers, honestly, it's a mixed bag. In some organizations they're incredibly empowered and they're driving the strategy. Others, they're figureheads, and so you got to know how the organizations do it. >> A puppet for the CFO or something. >> Yeah, exactly. >> Our ops. >> A puppet? (chuckles) So, you got to you know. >> Well, they're not really driving it, they're not changing it. It's not like we're mandated to go do something, they're maybe governance police or something. >> Yeah, and in some cases that's true. In other cases, they drive the data architecture, the data strategy, and that's somebody that we can engage with right away and help them out so... >> Any events you got coming up? Things happening in the marketplace that people might want to participate in? 
I know you guys do a lot of stuff out in the open, events they can connect with IBM, things going on? >> So we do, so we're doing a big event here in New York on November first and second where we're rolling out a lot of our new data products and cloud products, so that's one coming up pretty soon. The biggest thing we've changed this year is there's such a craving from clients for education, so we've started doing what we're calling Analytics University where we actually go to clients and we'll spend a day or two days, go really deep on open languages, open source. That's become kind of a new focus for us. >> A lot of re-skilling going on too with the transformation, right? >> Rob: Yes, absolutely. >> All right, Rob Thomas here, General Manager of IBM Analytics, inside theCUBE. CUBE alumni, breaking it down, giving his perspective. He's got two books out there, The Data Revolution was the first one. >> Big Data Revolution. >> Big Data Revolution, and the new one is Every Company is a Tech Company. Love that title, which is true, check it out on Amazon. Rob Thomas, Big Data Revolution, first book, and then second book is Every Company is a Tech Company. It's theCUBE live from New York. More coverage after the short break. (theCUBE jingle) (calm soothing music)

Published Date : Oct 2 2017

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
Peter Burris | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
IBM | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Rob Bearden | PERSON | 0.99+
Rob Thomas | PERSON | 0.99+
O'Reilly Media | ORGANIZATION | 0.99+
80% | QUANTITY | 0.99+
10 | QUANTITY | 0.99+
New York | LOCATION | 0.99+
10 products | QUANTITY | 0.99+
O'Reilly | ORGANIZATION | 0.99+
two days | QUANTITY | 0.99+
first book | QUANTITY | 0.99+
two books | QUANTITY | 0.99+
a day | QUANTITY | 0.99+
Rob | PERSON | 0.99+
Today | DATE | 0.99+
yesterday | DATE | 0.99+
New York City | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
San Francisco Bay | LOCATION | 0.99+
five products | QUANTITY | 0.99+
second book | QUANTITY | 0.99+
IBM Analytics | ORGANIZATION | 0.99+
this week | DATE | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
first | QUANTITY | 0.99+
first one | QUANTITY | 0.99+
theCUBE | ORGANIZATION | 0.99+
eight years | QUANTITY | 0.99+
Spark | TITLE | 0.99+
SQL | TITLE | 0.99+
Common SQL | TITLE | 0.98+
datascience.ibm.com | OTHER | 0.98+
eighth year | QUANTITY | 0.98+
One | QUANTITY | 0.98+
one issue | QUANTITY | 0.97+
Hortonworks Dataplane | ORGANIZATION | 0.97+
three platforms | QUANTITY | 0.97+
Strata Hadoop | TITLE | 0.97+
today | DATE | 0.97+
The Data Revolution | TITLE | 0.97+
Cloudera | ORGANIZATION | 0.97+
second | QUANTITY | 0.96+
NYC | LOCATION | 0.96+
two big problems | QUANTITY | 0.96+
Analytics University | ORGANIZATION | 0.96+
step two | QUANTITY | 0.96+
one way | QUANTITY | 0.96+
November first | DATE | 0.96+
Big Data Revolution | TITLE | 0.95+
one | QUANTITY | 0.94+
Every Company is a Tech Company | TITLE | 0.94+
CUBE | ORGANIZATION | 0.93+
this year | DATE | 0.93+
two different concepts | QUANTITY | 0.92+
one system | QUANTITY | 0.92+
step one | QUANTITY | 0.92+