

Arun Murthy, Hortonworks | theCUBE NYC 2018


 

>> Live from New York, it's theCUBE, covering CubeNYC 2018, brought to you by SiliconANGLE Media and its ecosystem partners. >> Okay, welcome back everyone, here live in New York City for CubeNYC, formerly Big Data NYC, now called CubeNYC. The topic has moved beyond big data: it's about cloud, it's about data, it's also about potentially blockchain in the future. I'm John Furrier, with Dave Vellante. We're happy to have a special guest here, Arun Murthy. He's the cofounder and chief product officer of Hortonworks, been in the ecosystem from the beginning at Yahoo, already been on the Cube many times, but great to see you, thanks for coming in, >> My pleasure, >> appreciate it. >> thanks for having me. >> Super smart to have you on here, because a lot of people have been squinting through the noise of the marketplace. You guys have been on this DataPlane idea for a few years now. You guys actually launched Hadoop with Cloudera; they were first, you came after at Yahoo and became second, two big players. You evolved it quickly; you guys saw early on that this is bigger than Hadoop. And now all the conversations are on what you guys were talking about three years ago. Give us the update: what's the product update? How is hybrid a big part of that, what's the story? >> We started off being the Hadoop company, and Rob, our CEO, who was here on the Cube a couple of hours ago, calls it sort of phase one of the company, where we were a Hadoop company. Very quickly we realized we had to help enterprises manage the entire lifecycle of data, all the way from the edge to the data center, to the cloud, and in between, right. Which is why we did the acquisition of Onyara, which we've been talking about, and which became the basis of our Hortonworks DataFlow product.
And then as we went through that phase of the journey, it quickly became obvious to us that enterprises had to manage data and applications in a hybrid manner, right, which is both on-prem and public cloud, and increasingly the edge, which is really where we spend a lot of time these days, with IoT and everything from autonomous cars to video monitoring, all these aspects coming in. Which is why we went to the DataPlane architecture; it allows you to get to a consistent security and governance model. There's a lot of, I'll call it a lot of fight, about the cloud being insecure and so on. I don't think there's anything inherently insecure about the cloud. The issue that we see is a lack of skills. Enterprises know how to manage the data on-prem; they know how to do LDAP and groups, and Kerberos, and AAD, and what have you. They just don't have the skill sets yet to be able to do it in the public cloud, which leads to mistakes occasionally >> Um-hm. >> and data breaches and so on. So we recognized really early that part of DataPlane was to get that consistent security and governance model, so you don't have to worry about how you set up IAM roles on Amazon versus LDAP on-prem versus something else on Google. >> It's operating consistency. >> It's operating consistency, exactly. I've talked about this in the past. So getting to DataPlane was that journey, and what we announced this week here in New York takes that a step further: we've been able to allow enterprises to manage this hybrid architecture, on-prem and multiple public clouds, >> And the Edge. >> in a connected manner. The issue we saw early on, and it's something we've been working on for a long while, is that we had to connect the architectures. Hadoop, when it started, was more of an on-premise architecture, right, and I was there in 2005, 2006 when it started. Hadoop was built in the era of the world wide web; we had a gigabit of ethernet from the node up to the rack.
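Murthy's point about a consistent security and governance model can be sketched as a thin abstraction over the per-environment mechanisms he names (LDAP and Kerberos on-prem, IAM roles on AWS, AAD on Azure). The policy shape and the rendered rule strings below are hypothetical illustrations, not DataPlane's actual API.

```python
from dataclasses import dataclass

# One logical access policy, rendered per environment. A toy
# illustration of the "consistent security model" idea; the names
# and per-cloud rule formats are invented for this sketch.

@dataclass
class AccessPolicy:
    principal: str   # group or user
    dataset: str
    actions: tuple   # e.g. ("read", "write")

def render(policy, environment):
    """Translate a single logical policy into an environment-specific rule."""
    if environment == "on-prem":   # LDAP/Kerberos-style rule
        return f"ldap-group:{policy.principal} {','.join(policy.actions)} {policy.dataset}"
    if environment == "aws":       # IAM-role-style rule
        return f"iam-role/{policy.principal}: Allow {','.join(policy.actions)} on {policy.dataset}"
    if environment == "azure":     # AAD-style rule
        return f"aad:{policy.principal} -> {policy.dataset} [{','.join(policy.actions)}]"
    raise ValueError(f"unknown environment: {environment}")

policy = AccessPolicy("analysts", "s3://lake/sales", ("read",))
for env in ("on-prem", "aws", "azure"):
    print(render(policy, env))
```

The point of the sketch is that an operator writes one AccessPolicy and never touches the per-cloud syntax directly, which is the "operating consistency" the interview describes.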
From the rack on up, we had only eight gigs, so if you had a 2,000-node cluster you were dealing with eight gigs of connection. >> Bottleneck. >> Huge bottleneck. Fast forward to today: you have at least ten, if not one hundred gigabits, moving toward terabit architectures. And what's happening is that everything in that world gives you the opportunity to rethink the assumptions we had in Hadoop. And the good news is that when the cloud came along, it had already decoupled storage and compute architectures. As we've helped customers navigate the two worlds with DataPlane, it's been a reasonably successful journey, and I think we have an opportunity to provide identical, consistent architectures both on-prem and in the cloud. So it's almost like we took Hadoop and adapted it to the cloud; I think we can adapt the cloud architecture back on-prem too, to have consistent architectures. >> So talk about the cloud-native architecture. You have a post that just got published: "Cloud-native architecture for big data in the data center." That's hybrid; explain the hybrid model, how do you define that? >> Like I said, for us it's really important to have consistent architectures, consistent security, consistent governance, a consistent way to manage data, and a consistent way to develop and port applications. So portability for data is important, which is why having security and governance consistently is key. And then portability for the applications themselves is important, which is why we are so excited to be kind of first to embrace the containerization of the ecosystem. We've announced the Open Hybrid Architecture Initiative, which is about decoupling storage and compute and then leveraging containers for all the big data apps, for the entire ecosystem.
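The rack-uplink numbers Murthy cites are easy to sanity-check; the sketch below assumes a 1 TB cross-rack shuffle, which is an illustrative figure.

```python
# Back-of-the-envelope arithmetic behind the bottleneck described above.

def transfer_seconds(data_gb, link_gbps):
    """Seconds to move data_gb gigabytes over a link of link_gbps gigabits/s."""
    return data_gb * 8 / link_gbps

data_gb = 1000  # 1 TB moved across the rack boundary (illustrative)

old = transfer_seconds(data_gb, 8)     # ~2006: 8 Gb/s shared per rack
new = transfer_seconds(data_gb, 100)   # today: 100 Gb/s links

print(f"8 Gb/s uplink:  {old:.0f} s")   # slow enough that you ship compute to the data
print(f"100 Gb/s links: {new:.0f} s")   # fast enough that decoupled storage/compute is viable
```

The two orders of magnitude between those results are, in miniature, why Hadoop originally co-located compute with data and why the decoupled cloud architecture became practical later.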
And this is where we are really excited to be working with both IBM and Red Hat, especially Red Hat given their investments in Kubernetes and OpenShift. We see that, much like you have S3 and EC2 (S3 for storage, EC2 for compute), and the same thing with ADLS and Azure compute, you'll actually have next-gen HDFS and Kubernetes. >> So is this a massive architectural rewrite, or is it more sort of management around the core? >> Great question. So part of it is evolution of the architecture. Whether it's Spark or Kafka or any of these open source projects, we need to do some evolution in the architecture to make them work in the containerized world. So we are containerizing every one of the 28, 30 animals in the zoo, right. That's a lot of work, but we know how to do it; we've done it in the past. And to your point, it's not enough to just have the architecture; you need a consistent fabric to manage and operate it, which is really where DataPlane comes in again. That was really the point of DataPlane all along. This is a multi-year roadmap; when we sit down, we are thinking about what we'll do in '22 and '23. But we really have to execute on a multi-year roadmap. >> And DataPlane was a linchpin. >> Well, it was the sharp edge of the sword, the tip of the spear. The idea was always that we had to get DataPlane in to get that hybrid product out there, and then we could get to an intergenerational DataPlane which would work with the next generation of the big data ecosystem itself. >> Do you see Kubernetes, and things like Kubernetes, you've got Istio, a few service meshes up the stack, >> Absolutely. >> going to play a pretty instrumental role around orchestrating workloads and providing new stateless and stateful applications with data? So now you've got more data being generated there.
So this is a new dynamic; it sounds like that's a fit for what you guys are doing. >> Which is something we've seen for a while now. Containers are something we've tracked for a long time, and we're really excited to see Docker and Red Hat, all the work that they are doing with containers, get the security and so on right. It's the maturing of that ecosystem, and now the ability to build and port applications. And the really cool part for me is that we will definitely see Kubernetes and OpenShift on-prem, but even if you look at the cloud, the really nice part is that each of the cloud providers themselves provides a Kubernetes service, whether it's GKE on Google or Fargate on Amazon or AKS on Microsoft, so we will be able to take identical architectures and leverage them. When we containerize Hive, MapReduce, or Spark, we will be able to run them with Kubernetes on OpenShift, which is available on-prem but also in the public cloud, and on GKE and Fargate and AKS. >> What's interesting about the Red Hat relationship, and I think you guys are smart to do this, is that by partnering with Red Hat, customers can run their analytical workloads in the same production environment that Red Hat is in, but with a kind of differentiation, if you will. >> Exactly, with DataPlane. >> DataPlane is just a wonderful thing there. So again, good move there. Now, around the ecosystem: who else are you partnering with? What else do you see out there? Who is in your world that is important? >> You know, again, our friends at IBM, that we've had a long relationship with. We are doing a lot of work with IBM to integrate DataPlane and also ICP for Data, the IBM Cloud Private for Data, which brings along all of the IBM ecosystem, whether it's Db2 or IGC, the Information Governance Catalog, all of that kind of work back in this world. What we also believe this will give a fillip to is the whole continued standardization of security and governance.
So you guys remember the old ODPi; it caused a bit of a flutter a few years ago. (anxious laughing) >> We know how that turned out. >> What we did was, the old ODPi was based on the old distributions; now it's ODPi's turn to be more about metadata and governance. So we are collaborating with IBM on ODPi, more on metadata and governance, because again we see that as being very critical in this sort of multi-cloud, on-prem, edge world. >> Well, the narrative was always, why do you need it? But it's clear that these three companies have succeeded dramatically; when you look at the financials, there have been statements made about IBM's contribution of seven-figure deals to you guys. We had Red Hat on, and you guys are birds of a feather. >> [Murthy] Exactly. >> It certainly worked for you three, which presumably means it confers value to your customers. >> Which is really important. From a customer standpoint, something we really focus on is that the benefit of the bargain is that now they understand that some of their key vendor partners, that's us and IBM and Red Hat, have a shared roadmap, so now they can be much more sure about the fact that they can go to containers and Kubernetes and so on, because all of the tools and all the partners they depend on are working together. >> So they can place bets. >> So they can place bets, and the important thing is that they can place longer-term bets, not a quarterly bet. We hear about customers talking about building their next-gen data centers with Kubernetes in mind. >> They have to. >> They have to, right, and it's more than just bringing machines up, because in this world, the way you do networking with Kubernetes is different than the way you did it before. So now they have to place longer-term bets, and they can do this with the guarantee that the three of us will work together to deliver on the architecture.
>> Well, Arun, great to have you on the Cube, great to see you. Final question for you: you guys have a good long-term plan, which is very cool. Short term, customers are realizing the set-up phase is over; now they're in usage mode. So the data has got to deliver value, so there is real pressure for ROI. We would give people a bit of a pass earlier on, because you had to set up everything, set up the data lakes, get it all operationalized. But now, with AI and machine learning front and center, that's a signal that people want to start putting this to work. What have you seen customers gravitate to from the product side? Where are they going? Is it the streaming, is it the Kafka? What products are they gravitating to? >> Yeah, definitely. In my role, I look at these in terms of use cases, right. We are certainly seeing a continued push towards the real-time analytics space, which is why we placed a longer-term bet on HDF and Kafka and so on. What's been really heartening, kind of back to your sentiment, is that we are seeing a lot of push right now on security and governance, which is why, for GDPR, we introduced a bunch of capabilities in DataPlane with DSS, Data Steward Studio; James Cornelius wrote about this earlier in the year. We are seeing customers really push us for key aspects like GDPR. This is, for me, a reflection of the maturing of the ecosystem: it's no longer something on the side that you play with; the whole ecosystem is now more a system of record instead of a system of augmentation. That is really heartening, but it also brings a sharper focus and more responsibility onto our shoulders. >> Awesome, well, congratulations, you guys have a stock price at a 52-week high. Congratulations. >> Those things take care of themselves. >> Good products, and stock prices take care of themselves.
Okay, theCUBE coverage here in New York City. I'm John Furrier; stay with us for more live coverage of all things data happening here in New York City. We will be right back after this short break. (digital beat)

Published Date : Sep 12 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
Arun Murthy | PERSON | 0.99+
Rob | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
2005 | DATE | 0.99+
John Vellante | PERSON | 0.99+
John Furrier | PERSON | 0.99+
Redhat | ORGANIZATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
30 animals | QUANTITY | 0.99+
SiliconAngle Media | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
AKS | ORGANIZATION | 0.99+
New York City | LOCATION | 0.99+
second | QUANTITY | 0.99+
52-week | QUANTITY | 0.99+
James Cornelius | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
three | QUANTITY | 0.99+
YARN | ORGANIZATION | 0.99+
28 animals | QUANTITY | 0.99+
one hundred | QUANTITY | 0.99+
Fargate | ORGANIZATION | 0.99+
two worlds | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
2006 | DATE | 0.99+
Arun | PERSON | 0.99+
three companies | QUANTITY | 0.99+
one hundred gigabits | QUANTITY | 0.99+
eight gigs | QUANTITY | 0.99+
this week | DATE | 0.99+
two big players | QUANTITY | 0.99+
Hadoop | TITLE | 0.98+
first | QUANTITY | 0.98+
Spark | TITLE | 0.98+
GKE | ORGANIZATION | 0.98+
Kafka | TITLE | 0.98+
both | QUANTITY | 0.98+
Kubernetes | TITLE | 0.98+
each | QUANTITY | 0.97+
today | DATE | 0.97+
NYC | LOCATION | 0.97+
three years ago | DATE | 0.97+
Cloud | TITLE | 0.97+
Charlotte | LOCATION | 0.96+
seven figure | QUANTITY | 0.96+
DSS | ORGANIZATION | 0.96+
EC2 | TITLE | 0.95+
S3 | TITLE | 0.95+
Cube | COMMERCIAL_ITEM | 0.94+
Cube | ORGANIZATION | 0.92+
Murhty | PERSON | 0.88+
2000 | QUANTITY | 0.88+
few years ago | DATE | 0.87+
couple of hours ago | DATE | 0.87+
Ecosystem | ORGANIZATION | 0.86+
Ibm | PERSON | 0.85+

Tim Vincent & Steve Roberts, IBM | DataWorks Summit 2018


 

>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to day two of theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. We have two guests on this panel today: we have Tim Vincent, he is the VP of Cognitive Systems Software at IBM, and Steve Roberts, who is the Offering Manager for Big Data on IBM Power Systems. Thanks so much for coming on theCUBE. >> Oh, thank you very much. >> Thanks for having us. >> So we're now in this new era, this Cognitive Systems era. Can you set the scene for our viewers and tell us a little bit about what you do and why it's so important. >> Okay, I'll give a bit of background first, because James knows me from my previous role; I spent a lot of time in the data and analytics space. I was the CTO for Bob, running the analytics group, up 'til about a year and a half ago, and we spent a lot of time looking at what we needed to do from a data perspective and an AI perspective. And when Bob, Bob Picciano, who's my current boss, moved over to Cognitive Systems, Bob asked me to move over and really start helping to build out more of a software focus, more of an AI focus, and a workload focus on how we think of the Power brand. So we spent a lot of time on that. So when you talk about cognitive systems or AI, what we're really trying to do is think about how you actually couple a combination of software and hardware, co-optimizing the software space and the hardware space specific to what's needed for AI systems. Because the data processing, the algorithmic processing for AI is very, very different than what you would have for traditional data workloads.
So we're spending a lot of time thinking about how you actually co-optimize those systems so you can actually build a system that's really optimized for the demands of AI. >> And is this driven by customers, is this driven by just a trend that IBM is seeing? I mean, how are you, >> It's a combination of both. >> So a lot of this is, you know, there was a lot of thought put into this before I joined the team. There was a lot of good thinking from the Power brand, but it was really foresight on things like Moore's Law coming to the end of its lifecycle, right, and the ramifications of that. And at the same time, as you start getting into things like neural nets and the floating point operations that you need to drive a neural net, it was clear that we were hitting the boundaries. And then there are new technologies, such as what Nvidia produces with their GPUs, that are clearly advantageous. So there were a lot of trends coming together that the technical team saw, and at the same time we were seeing customers struggling with specific things: you know, how to actually build a model if the training time is going to be weeks and months, let alone hours. And one of the scenarios I like to think about, I'm probably showing my age a bit, but I went to a school called the University of Waterloo, and when I went to school, in my early years, they had a batch-based system for compilation and system runs. You'd sit in the lab at night and submit a compile job, and the compile job would say, okay, it's going to take three hours to compile the application, and you think of the productivity hit that has on you. And now you start thinking about, okay, you've got this new skill in data scientists, which is really, really hard to find, and they're very, very valuable. And you're giving them systems that take hours and weeks to do what they need to do.
And you know, so they're trying to drive these models and get a high degree of accuracy in their predictions, and they just can't do it. So there's foresight on the technology side and there's clear demand on the customer side as well. >> Before the cameras were rolling, you were talking about how the terms data scientist and app developer are used interchangeably, and that's just wrong. >> And actually, let's hear it, 'cause I hold this position and I agree with it. I think it's the right framework. Data science is a team sport, but application development is an even larger team sport in which data scientists and data engineers play a role. So, yeah, we want to hear your ideas on the broader application development ecosystem, where data scientists, data engineers, and so forth fall into that broader spectrum, and then how IBM is supporting that entire new paradigm of application development with your solution portfolio, including, you know, PowerAI on Power. >> So I think you used the words collaboration and team sport, and data science is a collaborative team sport. But you're 100% correct, there's also something that I think is missing to a great degree today, and it's probably limiting the actual value of AI in the industry, and that's how data scientists and application developers interact with each other. Because if you think about it, one of the models I like to think about is a consumer-producer model. Who consumes things and who produces things? Basically the data scientists are producing a specific thing, which is, you know, simply an AI model, >> Machine models, deep-learning models. >> Machine learning and deep learning, and the application developers are consuming those things and then producing something else, which is the application logic that is driving your business processes, in this view. So they've got to work together. But there's a lot of confusion about who does what.
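The consumer-producer model Vincent describes can be sketched as a narrow interface between the two roles: the data scientist hands over a model artifact with a small surface, and the application developer consumes it inside business logic without knowing how it was trained. The fraud-scoring names and the threshold stand-in for learned parameters below are hypothetical.

```python
# Producer side: the data scientist delivers a model artifact
# exposing only predict(); the training process is hidden.

class FraudModel:
    def __init__(self, threshold):
        self.threshold = threshold  # stand-in for learned parameters

    def predict(self, amount):
        return "fraud" if amount > self.threshold else "ok"

# Consumer side: the application developer wires the model into
# business logic without touching anything model-internal.

def handle_payment(model, amount):
    if model.predict(amount) == "fraud":
        return "hold for review"
    return "approve"

model = FraudModel(threshold=10_000)   # artifact handed over
print(handle_payment(model, 500))      # approve
print(handle_payment(model, 50_000))   # hold for review
```

Keeping the interface this narrow is what lets the two roles iterate independently, which is the collaboration problem the interview is pointing at.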
You see data scientists who can build application logic; you know, they exist, but that's not where the value is, the value they bring to the equation. And application developers who develop AI models, they exist, but it's not the most prevalent pattern. >> But you know it's kind of unbalanced, Tim, in the industry discussion of these role definitions. Quite often the traditional definition, or sculpting, of a data scientist is that they know statistical modeling, plus data management, plus coding, right? But you never hear the opposite, that coders somehow need to understand how to build statistical models and so forth. Do you think that the coders of the future will at least on some level need to be conversant with the practices of building, and tuning, or training the machine learning models, or no? >> I think that will absolutely happen. And I will actually take it a step further, because again the data scientist skill is hard for a lot of people to find >> Yeah. >> and as such is a very valuable skill. And one of the offerings that we're rolling out is something called PowerAI Vision, and it takes it up another level, above the application developer: how do you actually unlock the capabilities of AI for the business persona, the subject matter expert. So in the case of vision, how do you allow somebody to build a model without really knowing what a deep learning algorithm is, what kind of neural nets to use, how to do data preparation. So we built a tool set which is, you know, effectively an SME tool set, which allows you to tag and label images, and as you're tagging and labeling images it learns from that and helps automate the labeling of the images.
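The assisted-labeling loop described for PowerAI Vision can be approximated in miniature: a subject matter expert labels a few examples, and the tool proposes labels for the rest. The nearest-neighbor rule on 1-D feature values below is only an illustration of that loop, not the product's actual algorithm, which works on images with deep models.

```python
# Toy assisted labeling: propose a label for each unlabeled item
# by copying the label of its nearest human-labeled neighbor.

def propose_labels(labeled, unlabeled):
    """labeled: {feature_value: label}; returns proposed labels for unlabeled values."""
    proposals = {}
    for x in unlabeled:
        nearest = min(labeled, key=lambda seen: abs(seen - x))
        proposals[x] = labeled[nearest]
    return proposals

human_labels = {0.1: "cat", 0.9: "dog"}           # the SME labels a few items
proposals = propose_labels(human_labels, [0.2, 0.8, 0.55])
print(proposals)   # 0.2 -> cat, 0.8 -> dog, 0.55 -> dog (closer to 0.9)
```

The SME then only confirms or corrects the proposals, so each round of human input makes the next round of automatic labeling better, which is the feedback loop the interview describes.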
>> Is this distinct from Data Science Experience on the one hand, which is geared towards the data scientists, and I think Watson Analytics among your tools, which is geared towards the SME? Is this a third tool, or an overlap? >> Yeah, this is a third tool, which is really again one of the co-optimized capabilities that I talked about. It's a tool that we built out that is leveraging the combination of what we do in Power and the interconnect which we have with the GPUs, the NVLink interconnect, which gives us basically a 10X improvement in bandwidth between the CPU and GPU. That allows you to train your models much more quickly; we're seeing about a 4X improvement over competitive technologies that are also using GPUs. And if we're looking at machine learning algorithms, we've recently come out with some technology we call Snap ML, which allows you to push machine learning, >> Snap ML, >> yeah, it allows you to push machine learning algorithms down into the GPUs, and we're seeing about a 40 to 50X improvement over traditional processing. So it's coupling all these capabilities, but really allowing a business persona to do something specific, which is to build out AI models to do recognition on either images or videos. >> Is there a pre-existing library of models in the solution that they can tap into? >> Basically it has a, >> Are they pre-trained? >> No, they're not pre-trained models; that's one of the differences. It actually has a set of models, and it picks for you, >> Oh yes, okay. >> so this is why it helps the business persona: it's helping them with labeling the data, it's also helping select the best model, and it's doing things under the covers to optimize things like hyper-parameter tuning. But you know the end user doesn't have to know about all these things, right?
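The "under the covers" tuning described here can be sketched as a one-call fit() facade that searches hyper-parameters internally; the threshold classifier below is a hypothetical stand-in for the neural-net configurations a real tool would search.

```python
# Hiding hyper-parameter search behind a single fit() call, the way
# an SME-facing tool can. The "hyper-parameter" here is just a
# decision threshold; the end user never sees the search.

def fit(points):
    """points: list of (x, label) with label in {0, 1}.
    Returns (best_threshold, best_accuracy) from a grid search."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in [p[0] for p in points]:   # candidate grid from the data
        correct = sum(1 for x, label in points
                      if (x > threshold) == (label == 1))
        accuracy = correct / len(points)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy

data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
threshold, accuracy = fit(data)   # one call; the grid search is invisible
print(threshold, accuracy)        # 0.4 1.0
```

A real tool would search network depth, learning rate, and so on with far more sophisticated strategies, but the interface idea is the same: the business persona calls fit() and gets a tuned model back.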
So you're trying to lift, and it comes back to your point on application developers: it lowers the barrier for people to do these tasks. >> Even for professional data scientists, there may be a vast library of models, and they don't necessarily know which is the best fit for the particular task. Ideally the infrastructure should recommend and choose, under various circumstances, the models, and the algorithms, the libraries, whatever, for the task, great. >> One extra feature of PowerAI Enterprise is that it does include a way to do a quick visual inspection of a model's accuracy with a small data sample before you invest in scaling over a cluster or a large data set. So you can get a visual indicator as to whether the model is moving towards accuracy or you need to go and test an alternate model. >> So it's like a dashboard of Gini coefficients and all that stuff, okay. >> Exactly, it gives you a snapshot view. And the other thing I was going to mention: you guys talked about application development and data scientists, and of course a big message here at the conference is, you know, data science meets big data, and the work that Hortonworks is doing involving the notion of container support in YARN, GPU awareness in YARN, bringing Data Science Experience, which can include the PowerAI capability that Tim was talking about, as a workload tightly coupled with Hadoop. And this is where our Power servers are really built: not just as a monolithic building block that always has the same ratio of compute and storage, but fit-for-purpose servers that can address GPU-optimized workloads, providing the bandwidth enhancements that Tim talked about with the GPU, but also storage-dense servers that can now support two terabytes of memory, double the overall memory bandwidth on the box, and 44 cores that can support up to 176 threads for parallelization of Spark workloads, SQL workloads, distributed data science workloads.
So it's really about choosing the combination of servers that can mix this evolving workload need, because Hadoop isn't just MapReduce now; it's a multitude of workloads that you need to be able to mix and match, bringing various capabilities to the table for compute. And that's where Power8, now Power9, has really been built for this kind of combination of workloads, where you can add acceleration where it makes sense, add big data, smaller core, smaller memory where it makes sense, pick and choose. >> So Steve, at this show, at DataWorks 2018 here in San Jose, the prime announcement, the partnership announced between IBM and Hortonworks, was IHAH, which I believe is IBM Hosted Analytics with Hortonworks. What I want to know is, that solution, I mean it runs on top of HDP 3.0 and so forth, is there any tie-in from an offering management standpoint between that and PowerAI, so you can build models in the PowerAI environment and then deploy them out in conjunction with IHAH? Going forward, I just wanted to get a sense of whether those kinds of integrations are coming. >> Well, the same data science capability, Data Science Experience, whether you choose to run it in the public cloud or run it in a private cloud on-prem, it's the same data science package. You know, PowerAI has a set of optimized deep-learning libraries that can provide advantage on Power, which apply when you choose to run those deployments on our Power systems, alright, so we can provide additional value in terms of these optimized libraries and these memory bandwidth improvements. So really it depends upon the customer requirements and whether a Power foundation would make sense in some of those deployment models. I mean, for us, with Power9 we've recently announced a whole series of Linux Power9 servers. That's our latest family, including, as I mentioned, storage-dense servers; the one we're showcasing on the floor here today, along with GPU-rich servers.
We're releasing fresh reference architectures, really to support combinations of clustered models that can, as I mentioned, be fit for purpose for the workload, to bring data science and big data together in the right combination. And we're working towards cloud models as well that can support mixing Power in ICP with big data solutions. >> And before we wrap, one last thing. In the reference architecture you describe, I'm excited about the fact that you've commercialized distributed deep learning, for the growing number of instances where you're going to build containerized AI and distribute pieces of it across this multi-cloud; you need the underlying middleware fabric to allow all those pieces to play together in some larger application. So I've been following DDL, because your research lab has been posting information about it for quite a while, and I'm excited that you guys have finally commercialized it. IBM does a really good job of commercializing what comes out of the lab, like with Watson. >> Great, well, a good note to end on. Thanks so much for joining us. >> Oh, thank you. Thank you for the, >> Thank you. >> We will have more from theCUBE's live coverage of DataWorks coming up just after this. (bright electronic music)

Published Date : Jun 20 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Bob | PERSON | 0.99+
Steve Roberts | PERSON | 0.99+
Tim Vincent | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
James | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Bob Picciano | PERSON | 0.99+
Steve | PERSON | 0.99+
San Jose | LOCATION | 0.99+
100% | QUANTITY | 0.99+
44 cores | QUANTITY | 0.99+
two guests | QUANTITY | 0.99+
Tim | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
10X | QUANTITY | 0.99+
Nvidia | ORGANIZATION | 0.99+
San Jose, California | LOCATION | 0.99+
IBM Power Systems | ORGANIZATION | 0.99+
Cognitive Systems Software | ORGANIZATION | 0.99+
today | DATE | 0.99+
three hours | QUANTITY | 0.99+
one | QUANTITY | 0.99+
both | QUANTITY | 0.99+
Cognitive Systems | ORGANIZATION | 0.99+
University of Waterloo | ORGANIZATION | 0.98+
third tool | QUANTITY | 0.98+
DataWorks Summit 2018 | EVENT | 0.97+
50X | QUANTITY | 0.96+
PowerAI | TITLE | 0.96+
DataWorks 2018 | EVENT | 0.93+
theCUBE | ORGANIZATION | 0.93+
two terrabytes | QUANTITY | 0.93+
up to 176 threads | QUANTITY | 0.92+
40 | QUANTITY | 0.91+
about | DATE | 0.91+
Power9 | COMMERCIAL_ITEM | 0.89+
a year and a half ago | DATE | 0.89+
IHAH | ORGANIZATION | 0.88+
4X | QUANTITY | 0.88+
IHAH | TITLE | 0.86+
DataWorks | TITLE | 0.85+
Watson | ORGANIZATION | 0.84+
Linux Power9 | TITLE | 0.83+
Snap ML | OTHER | 0.78+
Power8 | COMMERCIAL_ITEM | 0.77+
Spark | TITLE | 0.76+
first | QUANTITY | 0.73+
PowerAI | ORGANIZATION | 0.73+
One extra | QUANTITY | 0.71+
DataWorks | ORGANIZATION | 0.7+
day two | QUANTITY | 0.69+
HDP 3.0 | TITLE | 0.68+
Watson Analytics | ORGANIZATION | 0.65+
Power | ORGANIZATION | 0.58+
NVLink | OTHER | 0.57+
YARN | ORGANIZATION | 0.55+
Hadoop | TITLE | 0.55+
theCUBE | EVENT | 0.53+
Moore | ORGANIZATION | 0.45+
Analytics | ORGANIZATION | 0.43+
Power9 | ORGANIZATION | 0.41+
Host | TITLE | 0.36+

Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter. He is the VP of Product Management at Attunity, and also Ali Bajwa, who is the principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company based in Boston, Massachusetts, what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures, so it's a modern approach to data integration. So as these core transactional systems are being updated, we're able to take those changes and move those changes where they're needed, when they're needed, for analytics, for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our-- >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks-- >> Yeah, thank you, it's a great announcement because it showcases the collaboration between Attunity and Hortonworks and it's all about taking the metadata that we capture in that integration process. So we're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way.
We use that metadata internally, and now we've built extensions to share that metadata into Atlas and to be able to extend that out through Atlas to higher-level data governance initiatives, so Data Steward Studio, into the DataPlane Services, so it's really important to be able to take the metadata that we have and to add to it the metadata that's from the other sources of information. >> Sure, for more of the transactional semantics of what Hortonworks has been describing, they've baked into HDP in your overall portfolios. Is that true? I mean, that supports those kinds of requirements. >> With HDP, what we're seeing is, you know, the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on, so it really gels well with what we've done here with Attunity, and then on the Atlas side with the integration on the governance side with GDPR and other sorts of regulations coming into play now, you know, those sorts of things are becoming more and more important, you know, specifically around the governance initiative. We actually have a talk just on Thursday morning where we're actually showcasing the integration as well. >> So can you talk a little bit more about that for those who aren't going to be there for Thursday? GDPR was really a big theme at the DataWorks Berlin event and now we're in this new era and it's not talked about too, too much, I mean we-- >> And global businesses, who have operations in the EU but also all over the world, are trying to be systematic and consistent about how they manage PII everywhere. So GDPR is an EU regulation, but really, in many ways it's having ripple effects across the world in terms of practices. >> Absolutely, and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in.
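Mechanically, sharing captured metadata with Atlas comes down to registering entities against its REST API (`/api/atlas/v2/entity`). A minimal sketch in Python of building such a payload — with the caveat that the `typeName` and attribute keys below are illustrative, not the exact types Attunity registers:

```python
import json

def make_atlas_entity(table_name, source_system, change_ops):
    # Shape of an Apache Atlas entity-create payload; a real integration
    # would POST this JSON to /api/atlas/v2/entity on the Atlas server.
    return {
        "entity": {
            "typeName": "hive_table",  # illustrative type
            "attributes": {
                "qualifiedName": table_name + "@prod",
                "name": table_name,
                "sourceSystem": source_system,           # lineage hint (assumed attribute)
                "changeOperations": sorted(change_ops),  # how rows were modified
            },
        }
    }

payload = make_atlas_entity("customers", "oracle_crm", {"INSERT", "UPDATE", "DELETE"})
print(json.dumps(payload, indent=2))
```

In a real deployment, the CDC pipeline would emit one such payload per captured table, so downstream consumers like Data Steward Studio see both the data and where it came from.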
So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing that, if I understand my customer data, where the customer data comes from, the lineage of that, then I'm able to apply the protections of the masking on top of that data. So it's really, the GDPR effect has, you know, created a broad-scale need for organizations to really get a handle on metadata, so the timing of our announcement just works real well. >> And one nice thing about this integration is that, you know, it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well. So you can tag data as PCI, PII, personal data, and that can get enforced through Ranger to say, hey, only certain admins can access certain types of data, and now all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for ACID merge capabilities, so the leading-edge work that Hortonworks has done in bringing ACID compliance capabilities into Hive was a really important one, so our change data capture capabilities are able to feed directly into that and be able to support those extensions. >> Yeah, we have a lot of, you know, really key customers together with Attunity, and you know, maybe as a result of that they are actually our ISV of the Year as well, which they probably showcase on their booth there. >> We're very proud of that.
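To make the tag-then-enforce idea concrete: once a column is classified PII, a masking policy decides what readers outside the permitted group actually see. A toy version of such a mask in Python — an illustration of the concept, not Ranger's actual masking engine:

```python
def mask_pii(value: str, keep: int = 4) -> str:
    # Redact all but the last `keep` characters, the way a column-masking
    # policy might protect data tagged PII/PCI (conceptual sketch only).
    if keep <= 0 or len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]

print(mask_pii("4111111111111111"))           # ************1111
print(mask_pii("jane.doe@example.com", keep=0))  # fully masked
```

The point of driving this from tags rather than per-column rules is exactly what's described above: classify once in Atlas, and every Ranger-protected engine inherits the policy.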
Yeah, no, it's a nice honor for us to get that distinction from Hortonworks and it's also a proof point of the collaboration that we have commercially. You know, our sales reps work hand in hand. When we go into a large organization, we both sell to very large organizations. These are big transformative initiatives for these organizations and they're looking for solutions, not technologies, so the fact that we can come in, we can show the proof points from other customers that are successfully using our joint solution, that's really, it's critical. >> And I think it helps that they're integrating with some of our key technologies because, you know, that's where our sales force and our customers really see, you know, that, as well as that's where we're putting in the investment and that's where these guys are also investing, so it really, you know, helps the story together. So with Hive, we're doing a lot of investment in making it closer and closer to a sort of real-time database, where you can combine historical insights as well as your, you know, real-time insights, with the new ACID merge capabilities where you can do the inserts, updates and deletes, and so that's exactly what Attunity's integrating with. With Atlas, we're doing a lot of investments there and that's exactly what these guys are integrating with. So I think our customers and prospects really see that and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was, as I'm moving data into my data lake, I need to be able to put some structure around this, I need to be able to handle continuously updating data from multiple sources, and that's what we introduced with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data, and using the ACID merge capabilities just made those updates much easier.
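The inserts, updates, and deletes flowing out of change data capture map directly onto Hive's ACID MERGE semantics. A toy model in Python of what replaying such a change stream onto a keyed table means — the semantics, not the product's implementation:

```python
def apply_changes(target, changes):
    # Replay CDC records (op, key, row) onto a keyed table:
    # "I"nsert and "U"pdate both upsert; "D"elete removes the key.
    for op, key, row in changes:
        if op == "D":
            target.pop(key, None)
        else:
            target[key] = row
    return target

table = {1: {"name": "Ada"}}
stream = [("U", 1, {"name": "Ada L."}),
          ("I", 2, {"name": "Grace"}),
          ("D", 1, None)]
print(apply_changes(table, stream))  # {2: {'name': 'Grace'}}
```

In Hive itself this replay is a single `MERGE INTO target USING changes ...` statement per batch, which is why the ACID work described above matters to a CDC vendor.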
The second piece was metadata. Business users need to have confidence in the data that they're using. Where did this come from? How was it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data? Because I mean, as we've talked about, data is the new oil, so there's a real excitement and there's a buzz around it, and yet there's also so many high-profile cases of breaches and security concerns, so what would you say, is it that customers, are they more excited or are they more trepidatious? How would you describe the CIO mindset right now? >> So I think security and governance have become top of mind, right? So in more and more of the surveys that we've taken with our customers, right, you know, more and more customers are more concerned about security, they're more concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is sort of one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger, now it's a question of, you know, how do we get our hooks into Atlas and all that kind of stuff, so yeah, definitely, as you mentioned, because of GDPR, because of all these kinds of issues that have happened, it's definitely become top of mind. >> And I would say the other side of that is there's real excitement as well about the possibilities. Now bringing together all of this data, AI, machine learning, real-time analytics and real-time visualization, there's analytic capabilities now that organizations have never had, so there's great excitement, but there's also trepidation. You know, how do we solve for both of those? And together, we're doing just that.
>> But as you mentioned, if you look at Europe, some of the European companies that are more hit by GDPR, they're actually excited that now they can, you know, really get to understand their data more and do better things with it as a result of, you know, the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity in a Hortonworks context to find patterns in that data in real time? >> So we enable data scientists to build those models. So we're not only bringing the data together, but again, part of the announcement last year is the way we structure that data in Hive: we provide a complete historic data store, so every single transaction that has happened, and we send those transactions as they happen, as a big append. So if you're a data scientist and want to understand the complete history of the transactions of a customer to be able to build those models, building those out in Hive and making those analytics-ready in Hive, that's what we do, so we're a key enabler to machine learning. >> Making analytics ready rather than doing the analytics in the stream, yeah. >> Absolutely.
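Because the store is a complete, append-only history, per-customer features fall out of a simple aggregation over every transaction ever captured. In miniature, with field names invented for the sketch:

```python
from collections import defaultdict

def customer_features(transactions):
    # Aggregate an append-only transaction log into per-customer features
    # (count and total spend) of the kind a model would train on.
    feats = defaultdict(lambda: {"n": 0, "total": 0.0})
    for tx in transactions:
        f = feats[tx["customer"]]
        f["n"] += 1
        f["total"] += tx["amount"]
    return dict(feats)

log = [{"customer": "c1", "amount": 10.0},
       {"customer": "c2", "amount": 5.0},
       {"customer": "c1", "amount": 2.5}]
print(customer_features(log))
# {'c1': {'n': 2, 'total': 12.5}, 'c2': {'n': 1, 'total': 5.0}}
```

At data-lake scale the same aggregation would run in Hive or Spark over the historic store rather than in memory.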
>> Yeah, the other side to that is that because they're integrated with Atlas, you know, now we have a new capability called DataPlane and Data Steward Studio, so the idea there is around multi-everything. So more and more customers have multiple clusters, whether it's on-prem or in the cloud, so now more and more customers are looking at how do I get a single pane of glass across all my data, whether it's on-prem, in the cloud, whether it's IoT, whether it's data at rest, right? So that's where DataPlane comes in, and with the Data Steward Studio, which is our second offering on top of DataPlane, they can kind of get that view across all their clusters. So as soon as, you know, the data lands from Attunity into Atlas, you can get a view into that as a part of Data Steward Studio, and one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that, hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data, and now the end administrator has the option of, you know, saying that okay, yeah, this is a credit card, I'll accept that tag, or they can reject that and pick one of their own. >> Will any of this going forward, the Attunity CDC change data capture capability, be containerized for deployment to the edges in HDP 3.0? I mean, 'cause it seems, I mean for Internet of Things, edge analytics and so forth, change data capture, is it absolutely necessary to make the entire, some call it the fog computing, cloud or whatever, to make it a completely transactional environment for all applications from micro endpoint to micro endpoint? Are there any plans to do that going forward?
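The "this looks like a credit card" suggestion described above is a classic profiling heuristic: a digit-pattern match plus a Luhn checksum gives a cheap signal before a human steward accepts or rejects the tag. A sketch of that check — the idea, not Data Steward Studio's actual model:

```python
import re

def looks_like_credit_card(value: str) -> bool:
    # Cheap profiler signal: 13-19 digits (spaces/dashes allowed)
    # that also pass the Luhn checksum.
    digits = re.sub(r"[ -]", "", value)
    if not re.fullmatch(r"\d{13,19}", digits):
        return False
    total, double = 0, False
    for ch in reversed(digits):
        n = int(ch)
        if double:
            n *= 2
            if n > 9:
                n -= 9
        total += n
        double = not double
    return total % 10 == 0

print(looks_like_credit_card("4111 1111 1111 1111"))  # True
print(looks_like_credit_card("1234 5678 9012 3456"))  # False
```

Running a check like this over a sample of each column yields the suggested-tag workflow: high hit rates become candidate PCI/PII classifications for the administrator to confirm.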
>> Yeah, so I think with HDP 3.0, as you mentioned, right, one of the key factors that was coming into play was around time to value, so with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus, we will have more from DataWorks in San Jose just after this. (techno music)

Published Date : Jun 19 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
James Kobielus | PERSON | 0.99+
Rebecca Knight | PERSON | 0.99+
Dan Potter | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Ali Bajwah | PERSON | 0.99+
Dan | PERSON | 0.99+
Ali Bajwa | PERSON | 0.99+
Ali | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Thursday morning | DATE | 0.99+
San Jose | LOCATION | 0.99+
Silicon Valley | LOCATION | 0.99+
last year | DATE | 0.99+
San Jose | LOCATION | 0.99+
Attunity | ORGANIZATION | 0.99+
Last year | DATE | 0.99+
One | QUANTITY | 0.99+
second piece | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
Atlas | ORGANIZATION | 0.99+
Thursday | DATE | 0.99+
both | QUANTITY | 0.99+
theCUBE | ORGANIZATION | 0.98+
Ranger | ORGANIZATION | 0.98+
second offering | QUANTITY | 0.98+
DataWorks | ORGANIZATION | 0.98+
Europe | LOCATION | 0.98+
Atlas | TITLE | 0.98+
Boston, Massachusetts | LOCATION | 0.98+
today | DATE | 0.97+
DataWorks Summit 2018 | EVENT | 0.96+
two main barriers | QUANTITY | 0.95+
DataPlane Services | ORGANIZATION | 0.95+
DataWorks Summit 2018 | EVENT | 0.94+
one | QUANTITY | 0.93+
San Jose, California | LOCATION | 0.93+
Docker | TITLE | 0.9+
single glass | QUANTITY | 0.87+
3.0 | OTHER | 0.85+
European | OTHER | 0.84+
Attunity | PERSON | 0.84+
Hive | LOCATION | 0.83+
HDP 3.0 | OTHER | 0.82+
one nice thing | QUANTITY | 0.82+
DataWorks Berlin | EVENT | 0.81+
EU | ORGANIZATION | 0.81+
first | QUANTITY | 0.8+
DataPlane | TITLE | 0.8+
EU | LOCATION | 0.78+
EDW | TITLE | 0.77+
Data Steward Studio | ORGANIZATION | 0.73+
Hive | ORGANIZATION | 0.73+
Data Steward Studio | TITLE | 0.69+
single transaction | QUANTITY | 0.68+
Ranger | TITLE | 0.66+
Studio | COMMERCIAL_ITEM | 0.63+
CDC | ORGANIZATION | 0.58+
DataPlane | ORGANIZATION | 0.55+
them | QUANTITY | 0.53+
HDP 3.0 | OTHER | 0.52+

Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018 in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments, so Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well, there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two or three days' time. This is because of the explosion of open-source in that space.
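That "recommendation engine in a couple of days" is not much of an exaggeration; a toy user-based recommender fits in a screenful of pure Python. An illustrative sketch only — a production system would reach for Spark MLlib or a similar library:

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two rating vectors.
    num = sum(a * b for a, b in zip(u, v))
    den = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def recommend(ratings, user):
    # Score each item the user hasn't rated by other users' ratings,
    # weighted by how similar those users are; return the best item index.
    scores = {}
    for other, r in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], r)
        for i, val in enumerate(r):
            if ratings[user][i] == 0 and val > 0:
                scores[i] = scores.get(i, 0.0) + sim * val
    return max(scores, key=scores.get) if scores else None

# rows: users, columns: items (0 = unrated)
ratings = {"alice": [5, 0, 4], "bob": [4, 2, 4], "carol": [1, 5, 0]}
print(recommend(ratings, "alice"))  # 1
```

Getting this same logic governed, deployed, and retrained across an enterprise is the hard part Piotr goes on to describe.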
You have thousands of libraries, from Python, from R, from Scala, you have access to Spark, all these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, data access, actual model deployments, which are not trivial when you have to expose this in a uniform fashion to various business units. Now, all this has to actually work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now, when a data scientist is going to deploy a model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one industry-leading big data platform from Hortonworks.
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product. The merger between the two, the ability to integrate them tightly together, gives your data scientists secure access to data, the ability to leverage the Spark that runs inside a Hortonworks cluster, the ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, the ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments, within DSX, you can actually work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you know you can pull it in, but truly what you want to do when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your actual HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, this partnership, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but there is not much materializing after such an announcement.
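One common mechanism for "pushing the analytics to the data" is submitting the job through a REST gateway such as Apache Livy, which lands it on the cluster's YARN resource manager. The payload below follows Livy's documented batch API; whether DSX uses Livy under the covers is an assumption of this sketch:

```python
import json

def livy_batch_payload(app_file, args, queue="default"):
    # JSON body for Apache Livy's POST /batches endpoint: run a Spark
    # application on the cluster, scheduled by YARN where the data lives.
    return json.dumps({
        "file": app_file,                      # job artifact already on HDFS
        "args": args,
        "conf": {"spark.yarn.queue": queue},   # land on a specific YARN queue
    })

print(livy_batch_payload("hdfs:///jobs/train_model.py", ["--date", "2018-06-19"]))
```

The notebook stays lightweight on the edge node while the actual training fans out across the HDP workers.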
This is not true in the case of DSX and HDP. We have had, just recently we have had a release of DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin, say, or Jupyter Notebooks. Now, how about TensorFlow? What are you going to do when actually, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, you can scale the platform horizontally and vertically, and train your deep learning workloads, and actually remove the sidecar out. So you can add it to the cluster and remove it at will. Now, DSX also actually not only satisfies the needs of your programmer data scientists, that actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, we have embedded, integrated, an SPSS Modeler, redesigned, rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely.
(laughs) >> But I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can, in this case, do machine learning, hopefully as well as professional, dedicated data scientists. >> Certainly. Now, what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization, from the executive, who actually gives you the business use case, to your data engineers, who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet; you have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and you're actually streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With the 1.2, again, I'm going to be coming back to this 1.2 DSX, with the most recent release we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis.
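Federation in miniature: the pattern just described — enriching a stream (think Kafka) with a legacy relational lookup before training — can be sketched in a few lines. A pure-Python stand-in for what Spark Structured Streaming would do at scale:

```python
def enrich_stream(events, customer_lookup):
    # Join streamed events against a static relational lookup,
    # yielding training-ready records (toy stand-in, invented field names).
    for ev in events:
        cust = customer_lookup.get(ev["customer_id"], {})
        yield {**ev, **cust}

events = [{"customer_id": 7, "reading": 3.2}]
lookup = {7: {"segment": "industrial"}}
print(list(enrich_stream(events, lookup)))
# [{'customer_id': 7, 'reading': 3.2, 'segment': 'industrial'}]
```

The value of doing this inside one platform is that the same federated view feeds both the predictive models and, as Piotr turns to next, the prescriptive ones.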
So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now, what's going to get really exciting in the next two months: DSX will actually bring in natural language processing and text analysis and sentiment analysis via Watson Explorer. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh, Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure, on Google Cloud. Definitely we can deploy in SoftLayer, and we're very good at that. However, in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how we have done it: we've embraced open source. This was a huge shift within IBM, to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, with Docker and the equivalent, as they became industry standards?
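Prescriptive analytics means the model's output is a decision, not a score. For the truck-fleet example, the decision might be an assignment of trucks to routes minimizing total cost. Decision Optimization attacks such problems with industrial solvers (CPLEX); this brute-force sketch only shows the shape of the problem and its answer:

```python
from itertools import permutations

def best_assignment(cost):
    # cost[t][r] = cost of truck t driving route r.
    # Try every one-to-one assignment and keep the cheapest (fine for toy
    # sizes; real solvers scale far beyond brute force).
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[t][p[t]] for t in range(n)))
    return {truck: route for truck, route in enumerate(best)}

cost = [[4, 1, 3],   # truck 0's cost per route
        [2, 0, 5],
        [3, 2, 2]]
print(best_assignment(cost))  # {0: 1, 1: 0, 2: 2}
```

The deployment story is the point of the integration: this optimization model ships and versions through DSX the same way an SPSS stream or a TensorFlow model does.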
Bring in RStudio, the Jupyter, the Zeppelin Notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them, and make the deployments of web services, applications, the models, and those are actually full releases. I'm not only talking about the model, I'm talking about the scripts that can go with that, the ability to actually pull the data in and allow the models to be re-trained, evaluated, and actually re-deployed without taking them down. Now, that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going, in 30 seconds or less, as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientists but pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a program manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks, for having us here.
We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning and business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)

Published Date : Apr 19 2018


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Piotr Mierzejewski | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
James | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Piotr | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
30 seconds | QUANTITY | 0.99+
Berlin | LOCATION | 0.99+
IWS | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
Spark | TITLE | 0.99+
two | QUANTITY | 0.99+
First | QUANTITY | 0.99+
Scala | TITLE | 0.99+
Berlin, Germany | LOCATION | 0.99+
350,000 employees | QUANTITY | 0.99+
DSX | ORGANIZATION | 0.99+
Mac | COMMERCIAL_ITEM | 0.99+
two things | QUANTITY | 0.99+
RStudio | TITLE | 0.99+
DSX | TITLE | 0.99+
DSX 1.2 | TITLE | 0.98+
both developers | QUANTITY | 0.98+
second | QUANTITY | 0.98+
GDPR | TITLE | 0.98+
Watson Explorer | TITLE | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
first line | QUANTITY | 0.98+
Dataworks Summit Europe 2018 | EVENT | 0.98+
SiliconANGLE Media | ORGANIZATION | 0.97+
end of June | DATE | 0.97+
TensorFlow | TITLE | 0.97+
thousands of libraries | QUANTITY | 0.96+
R | TITLE | 0.96+
Jupyter | ORGANIZATION | 0.96+
1.2.1 | OTHER | 0.96+
two excellent days | QUANTITY | 0.95+
Dataworks Summit | EVENT | 0.94+
Dataworks Summit EU 2018 | EVENT | 0.94+
SPSS | TITLE | 0.94+
one | QUANTITY | 0.94+
Azure | TITLE | 0.92+
one kind | QUANTITY | 0.92+
theCUBE | ORGANIZATION | 0.92+
HDP | ORGANIZATION | 0.91+

Vikram Murali, IBM | IBM Data Science For All


 

>> Narrator: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're at Data Science For All, IBM's two day event, and we'll be here all day long, wrapping up again with that panel discussion from four to five Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM. Vikram, thanks for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities: the Data Science Experience. So first off, if you would, share with our viewers a little bit about that. You know, the primary mission. You've had two fairly significant announcements, updates if you will, here over the past month or so, so share some information about that too if you would. >> Sure, so my team, we build the Data Science Experience, and our goal is to enable data scientists, in their path, to gain insights into data using data science techniques, machine learning, the latest and greatest open source especially, and to be able to collaborate with fellow data scientists, with data engineers, business analysts. And it's all about freedom: giving data scientists the freedom to pick the tool of their choice, and to program and code in the language of their choice. So that's the mission of Data Science Experience, when we started this. The two releases that you mentioned, that we had in the last 45 days: there was one in September and then there was one on October 30th. Both of these releases are very significant in the machine learning space especially. We now support the Scikit-Learn, XGBoost, and TensorFlow libraries in Data Science Experience. We have deep integration with Hortonworks Data Platform, which is a hallmark of our partnership with Hortonworks.
Something that we announced back in the summer, and this last release of Data Science Experience, two days back, specifically can do authentication via Knox with Hadoop. So now our Hadoop customers, our Hortonworks Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling together, you know, existing older products and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off with a design-first methodology. And what I mean by that is we are using IBM Design to lead the charge here along with product and development. And we are actually talking to customers, to data scientists, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them; Data Science Experience, for example, started off as a brand new product: a completely new slate with completely new code. Now, IBM has done data science and machine learning for a very long time. We have a lot of assets like SPSS Modeler and Stats, and Decision Optimization. And we are re-investing in those products, and we are investing in such a way, and doing product research in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration. How can data scientists leverage our existing products with open source, and how can we do collaboration? So it's not just re-skinning, but it's building from the ground up.
>> So this is really important because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money and smart people, you can make anything work. So the reason why this is important is you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. If you architected your system to be able to accommodate that... Future proof is the term everybody uses, so what have you done? How have you done that? I'm sure APIs are involved, but maybe you could add some color. >> Sure. So our Data Science Experience and machine learning... It is a microservices based architecture, so we are completely dockerized, and we use Kubernetes under the covers for container orchestration. And all these are tools that are used in The Valley, across different companies, and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are dockerizing them, and the microservices architecture actually helps us address issues that we have today as well as be open to taking newer methodologies and frameworks into consideration that may not exist today. So the microservices architecture, for example: TensorFlow is something that you brought up. We can just spin up a Docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works.
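The container-per-framework idea described above, where a new framework plugs in without the core being re-architected, boils down to a plugin-registry pattern behind a stable interface. A minimal pure-Python sketch of that pattern (illustrative only, not DSX's actual code; the class and framework names here are hypothetical):

```python
# Sketch of "attach a new framework without re-architecting the core":
# the platform codes against one stable interface, and each framework
# registers behind it, the way DSX attaches one Docker container per framework.

class FrameworkService:
    """Stable interface the platform codes against."""
    def train(self, data):
        raise NotImplementedError

REGISTRY = {}

def register(name):
    """Register a framework service under a name (think: docker run <image>)."""
    def wrap(cls):
        REGISTRY[name] = cls()
        return cls
    return wrap

@register("tensorflow")
class TensorFlowService(FrameworkService):
    def train(self, data):
        return f"tf model over {len(data)} rows"

# Adding XGBoost later is one new class; the core never changes.
@register("xgboost")
class XGBoostService(FrameworkService):
    def train(self, data):
        return f"xgb model over {len(data)} rows"

def platform_train(framework, data):
    # The core dispatches by name and knows nothing framework-specific.
    return REGISTRY[framework].train(data)

print(platform_train("xgboost", [1, 2, 3]))
```

The same shape is why an upgrade can target one microservice: each entry stands alone behind the shared interface.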
Same thing with other frameworks like XGBoost, and Keras, and Scikit-Learn; all these are frameworks and libraries that have come up in open source within the last, I would say, one to three years. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something came along, but now with the microservices architecture it is very easy for us to continue with those. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at a high level. I mean, I've been following Hortonworks since day one, when Yahoo kind of spun them out, and know those guys pretty well. And they always make a big deal out of, when they do partnerships, it's deep engineering integration. And so they're very proud of that, so I want to come on to test that a little bit. Can you share with our audience the kind of integrations you've done? What you've brought to the table? What Hortonworks brought to the table? >> Yes, so Data Science Experience today can work side by side with Hortonworks Data Platform, HDP. And we could have actually made that work about two, three months back, but, as part of our partnership that was announced back in June, we set up joint engineering teams. We have multiple touch points every day. We call it co-development, and they have put resources in. We have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate using Knox, as I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three are going to be deeper integration, so we are planning on making Data Science Experience an Ambari management pack. And so for a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately. It's going to be a management pack. You just spin it up.
And the third phase is going to be... We're going to be using YARN for resource management. YARN is very good at resource management. And for infrastructure as a service for data scientists, we can actually delegate that work to YARN. So, Hortonworks, they are putting resources into YARN, doubling down actually. And they are making changes to YARN where it will act as the resource manager not only for the Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged in with Hortonworks. >> YARN stands for Yet Another Resource Negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks is big on committers, and obviously a big committer to YARN. Probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys primarily are focused on the integration as well as some other IBM IP? >> That is true, as well as the Knox piece that I mentioned. We have multiple Knox committers on our side, and that helps us as well. Knox is part of the HDP package, and we need that Knox expertise on our side to work with Hortonworks developers, to make sure that we are contributing and making inroads into Data Science Experience. That way the integration becomes a lot easier. And from an IBM IP perspective... Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM Research has worked on a lot of these libraries. I'll give you a few examples: Brunel and PixieDust are something that our developers love. These are visualization libraries that were actually cooked up by IBM Research and then open sourced. And these are prepackaged into Data Science Experience, so there is IBM IP involved, and there are a lot of machine learning algorithms that we put in there. So that comes right out of the package.
>> And you guys, the development teams, are really both in The Valley? Is that right? Or are you really distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America, between The Valley and Toronto. The Hortonworks team, they are situated about eight miles from where we are in The Valley, so there's a lot of synergy. We work very closely with them, and that's what you see in the product. >> I mean, what impact does that have? You know, you hear today, "Oh, yeah. We're a virtual organization. We have people all over the world: Eastern Europe, Brazil." How much of an impact is that? To have people so physically proximate? >> I think it has major impact. I mean, IBM is a global organization, so we do have teams around the world, and we work very well. With the advent of IP telephony, and screen-shares, and so on, yes, it works. But it really helps being in the same timezone, especially working with a partner just eight or ten miles away. We have a lot of interaction with them, and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems. You talked about issues. You know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean, what's the barrier right now? >> The barrier, I think, for data scientists... The number one barrier continues to be data. There's a lot of data out there, a lot of data being generated, and the data is dirty. It's not clean. So the number one problem that data scientists have is how do I get to clean data, and how do I access data. There are so many data repositories, data lakes, and data swamps out there. Data scientists, they don't want to be in the business of figuring out how to access data.
They want to have instant access to data, and-- >> Well, if you would, let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data, so data scientists-- >> John: So unstructured versus structured? >> Unstructured versus structured. And if you look at all the social media feeds that are being generated, the amount of data that is being generated, it's all unstructured data. So we need to clean up the data, and the algorithms need structured data, or data in a particular format. And data scientists don't want to spend too much time cleaning up that data. And access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available. It's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big. You want to leave it where it is. Instead, push analytics down to where it is. And you can do that. We can connect to remote Spark. We can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing that I hear from data scientists is all the open source libraries. Every day there's a new one. It's a boon and a bane as well. The open source community is very vibrant, and there are a lot of data science competitions, machine learning competitions, that are helping move this community forward. And that's a good thing. The bad thing is data scientists like to work in silos on their laptops. How do you, from an enterprise perspective... How do you take that, and how do you move it? Scale it to an enterprise level? And that's where Data Science Experience comes in, because now we provide all the tools, the tools of your choice, open source or proprietary. You have it in here, and you can easily collaborate.
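The push-down idea described above, sending the analytics to the data rather than hauling the data out, looks like this in miniature. A toy Python sketch (the "cluster" here is just a local list standing in for a remote Spark cluster; this is not the actual remote-Spark mechanism):

```python
# Toy contrast between "pull the data out" and "push the analytics down".
# In DSX terms the push-down goes to a remote Spark cluster; here the
# "cluster" is a function holding the data, to show the shape of the idea.

CLUSTER_DATA = list(range(1_000_000))   # lives remotely; too big to move

def cluster_run(fn):
    """Push-down: ship the computation to the data, return only the result."""
    return fn(CLUSTER_DATA)

# Pull style would transfer a million rows over the wire;
# push style transfers one number back.
total = cluster_run(lambda rows: sum(r for r in rows if r % 2 == 0))
print(total)  # the single aggregate that crosses the wire
```

The design point is the same one made in the interview: the result set moves, the raw data does not.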
You can do all the work that you need with open source packages and libraries, bring your own, as well as collaborate with other data scientists in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and schema on read, right? We kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes. So, if you look at the machine learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. And then you go into creating a model, training, picking a classifier, and so on. So we have tools built into Data Science Experience, and we're working on tools coming down our roadmap, which will help data scientists do that themselves. I mean, they don't have to be really in-depth coders or developers to do that. Python is very powerful. You can do a lot of data wrangling in Python itself, so we are enabling data scientists to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams... We were talking about Hortonworks and you guys collaborating. What are they like? I mean, people picture IBM, you know, like this 100-plus-year-old company. What's the persona of the developers on your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... So we've had six releases this year in Data Science Experience, just for the on-premises side of the product, and the cloud side of the product is continuous delivery. We have releases coming out faster than we can code. And it's not just re-architecting it every time, but it's about adding features, giving features that our customers are asking for, and not making them wait for three months, six months, one year.
So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve. We are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had our initial release in April, and since then we've had about five revisions of the release, where we add a lot more features to our existing releases. A lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now, don't you? I mean, you know? (laughing) >> I know, we are setting expectations. >> You still have two months left in 2017. >> We do. >> These are not mainframe release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when a customer upgrades, they don't have to bring that entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it, and just upgrade that particular microservice. It's become very simple, so... >> Well, some of those microservices aren't so micro. >> Vikram: Yeah, not quite. So it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep: making sure that you componentize it in such a way that when you're doing an upgrade, it affects just one small piece of it, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you, to say the least, and I'm sure 2017-2018 is not going to slow down. So continued success. >> Vikram: Thank you. >> Wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with Data Science For All, here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.
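The "dirty data" problem Vikram keeps returning to, raw records that must be coerced into the structured form an algorithm expects, can be sketched with nothing but the Python standard library (the field names and cleaning rules below are hypothetical, just an illustration of the cleansing stage of the pipeline he describes):

```python
# Minimal data-wrangling sketch: raw, inconsistent records in;
# a clean, uniformly-typed table out. This is the "cleansing" stage
# of the pipeline (ingest -> cleanse -> model -> train).
import re

raw_records = [
    "alice , 34, NEW YORK",
    "bob,N/A,toronto",          # missing age
    "  carol,29 , berlin ",
]

def clean(record):
    name, age, city = [field.strip() for field in record.split(",")]
    return {
        "name": name.lower(),
        "age": int(age) if re.fullmatch(r"\d+", age) else None,  # N/A -> None
        "city": city.title(),
    }

table = [clean(r) for r in raw_records]
print(table[1])  # {'name': 'bob', 'age': None, 'city': 'Toronto'}
```

In practice this is where a library like pandas (or a DSX notebook) earns its keep; the sketch only shows why the step exists at all.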

Published Date : Nov 1 2017



Arun Murthy, Hortonworks | BigData NYC 2017


 

>> Host: Live from midtown Manhattan, it's theCUBE, covering BigData New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsors. (upbeat electronic music) >> Welcome back, everyone. We're here, live, on day two of our three days of coverage of BigData NYC. This is our event that we put on every year. It's our fifth year doing BigData NYC, in conjunction with Hadoop World, which evolved into Strata Conference, which evolved into Strata Hadoop, now called Strata Data. Probably next year it will be called Strata AI, but we're still theCUBE, we'll always be theCUBE, and this is our BigData NYC, our eighth year covering the BigData world since Hadoop World. And then as Hortonworks came on we started covering Hortonworks' data summit. >> Arun: DataWorks Summit. >> DataWorks Summit. Arun Murthy, my next guest, Co-Founder and Chief Product Officer of Hortonworks. Great to see you, looking good. >> Likewise, thank you. Thanks for having me. >> Boy, what a journey. Hadoop, years ago. >> 12 years now. >> I still remember, you guys came out of Yahoo, you guys put Hortonworks together, and then since, gone public, first to go public, then Cloudera just went public. So, the Hadoop World is pretty much out there, everyone knows where it's at, it's got a nice use case, but the whole world's moved around it. You guys have been really the first of the Hadoop players, before even Cloudera, on this notion of data in flight, or, I call it real-time data, but I think you guys call it data-in-motion. Batch, we all know what Batch does, a lot of things to do with Batch, you can optimize it, it's not going anywhere, it's going to grow. Real-time data-in-motion's a huge deal. Give us the update.
>> Absolutely. You know, we've obviously been in this space; personally, I've been in this for about 12 years now. So, we've had a lot of time to think about it. >> Host: Since you were 12? >> Yeah. (laughs) Almost. Probably look like it. So, back in 2014 and '15 when we, sort of, went public and we started looking around, the thesis always was, yes, Hadoop is important, we're going to help you manage lots and lots of data, but a lot of the stuff we've done since the beginning, starting with YARN and so on, was really to enable the use cases beyond the whole traditional transactions and analytics. And Rob, our CEO, calls it, his vision's always been, we've got to get into a pre-transactional world, if you will, rather than the post-transactional analytics and BI and so on. So that's where it started. And increasingly, the obvious next step was to say, look, enterprises want to be able to get insights from data, but increasingly they want to deal with it in real-time. You know, while you're in your shopping cart, they want to make sure you don't abandon your shopping cart. If you're at a retailer and you're in an aisle and you're about to walk away from a dress, you want to be able to do something about it. So, this notion of real-time is really important, because it helps the enterprise connect with the customer at the point of action, if you will, and provide value right away rather than having to try to do this post-transaction. So, it's been a really important journey. We went and bought this company called Onyara, which is a bunch of geeks like us who started off with the government, built this Apache NiFi thing, huge community. It's just, like, taking off at this point. It's been a fantastic thing to join hands and join the team and keep pushing on the whole streaming data story. >> There's a real, I don't mean to tangent, but I do, since you brought up community, I wanted to bring this up.
It's been the theme here this week. It's more and more obvious that the community role is becoming central, beyond open-source. We all know open-source, standing on the shoulders of those before us, you know. And the Linux Foundation showing code numbers going from $64 million to billions in value over the next five, ten years, exponential growth of new code coming in. So open-source we certainly know. But now community is translating to other things. You start to see blockchain, very community based. That's a whole new currency market that's changing the financial landscape, ICOs and what-not; that's just one data point. Businesses, marketing communities, you're starting to see data as a fundamental thing around communities. And certainly it's going to change the vendor landscape. So you guys, compared to Cloudera and others, have always been community driven. >> Yeah, our philosophy has been simple. You know, more eyes and more hands are better than fewer. And it's been one of the cornerstones of our founding thesis, if you will. And you saw how that's gone on over the course of the six years we've been around. Super-excited to have someone like IBM join hands; it happened at DataWorks Summit in San Jose. That announcement, again, is a reflection of the fact that we've been very, very community driven and very, very ecosystem driven. >> Communities are fundamentally built on trust and partnering. >> Arun: Exactly. >> Coding is pretty obvious, you code with your friends. You code with people who are good; they become your friends. There's an honor system among you. You're starting to see that in the corporate deals. So explain the dynamic there and some of the successes that you guys have had on the product side, where one plus one equals more than two. One plus one equals five or three. >> You know, IBM has been a great example.
They've decided to focus on their strengths, which is around Watson and machine learning, and for us to focus on our strengths around data management, infrastructure, cloud and so on. So this combination of DSX, which is their Data Science Experience, along with Hortonworks, is really powerful. We are seeing that over and over again. Just yesterday we announced the whole Dataplane thing; we were super excited about it. And now to get IBM to say, we'll bring in our technologies and our IP, big data, whether it's BigQuality or BigInsights or Big SQL, and the work has been phenomenal. >> Well, the Dataplane announcement, finally. People who know me know that I hate the term data lake. I always said it's always been a data ocean. So I get redemption, because now the data lakes... now it's admitting it's a horrible name, but just saying stitching together the data lakes, which is essentially a data ocean. Data lakes are out there and you can form these data lakes, or data sets, batch, whatever, but connecting them and integrating them is a huge issue, especially with security. >> And a lot of it is, it's also just pragmatism. We started off with this notion of data lake and said, hey, you've got too many silos inside the enterprise in one data center, you want to put them together. But then increasingly, as Hadoop has become more and more mainstream... I can't remember the last time I had to explain what Hadoop is to somebody. As it has become mainstream, a couple of things have happened. One is, we talked about streaming data. We see it all the time, especially with HDF. We have customers streaming data from autonomous cars. You have customers streaming from security cameras. You can put a small MiNiFi agent in a security camera or smart phone and stream it all the way back. Then you get into physics. You're up against the laws of physics. If you have a security camera in Japan, why would you want to move it all the way to California to process it?
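That physics argument can be put in miniature: process where the data is born and ship back only a summary. A hypothetical Python sketch of the edge-aggregation pattern (a real deployment would use an actual edge-agent flow such as NiFi/MiNiFi, not this):

```python
# Edge-vs-centralize in miniature: instead of shipping every raw reading
# across the ocean, the edge agent keeps the raw stream local and forwards
# only a small summary, the way a regional data center would.

def edge_summarize(readings, threshold):
    """Runs at the edge: reduce a raw stream to the few numbers HQ needs."""
    alerts = [r for r in readings if r > threshold]
    return {
        "count": len(readings),   # volume seen locally
        "alerts": len(alerts),    # events worth acting on
        "max": max(readings),
    }

# 10,000 raw sensor readings stay in Tokyo...
raw_stream = [i % 100 for i in range(10_000)]

# ...and only this small dict crosses the wire to California.
summary = edge_summarize(raw_stream, threshold=95)
print(summary)  # {'count': 10000, 'alerts': 400, 'max': 99}
```

Bandwidth and latency both fall out of the same choice: the heavy stream never leaves the region, only the decision-ready numbers do.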
You'd rather do it right there, right? So this notion of a regional data center becomes really important. >> And that talks to the Edge as well. >> Exactly, right. So you want to have something in Japan that collects from all of the security cameras in Tokyo, and you do the analysis there and push what you want back here, right. So that's physics. The other thing we are increasingly seeing is, with data sovereignty rules, especially things like GDPR, there are now regulatory reasons where data has to naturally stay in different regions. Customer data from Germany cannot move to France, or vice versa, right. >> Data governance is a huge issue, and this is the problem I have with data governance. I am really looking for a solution, so if you can illuminate this it would be great. So there is going to be an Equifax out there again. >> Arun: Oh, for sure. >> And the problem is, is that going to force some regulation change? So what we see, certainly what I see personally, is that you can almost predict that something else will happen that'll force some policy, regulation, or governance change. You don't want to screw up your data. You also don't want to rewrite your applications or rewrite your machine learning algorithms. So there's a lot of waste potential in not structuring the data properly. Can you comment on what's the preferred path? >> Absolutely, and that's why we've been working on things like Dataplane for almost a couple of years now. Which is to say, you have to have data and policies which make sense given a context. And the context is going to change by application, by usage, by compliance, by law. So, now to manage 20, 30, 50, a hundred data lakes... would it be better, not saying lakes, data ponds, >> Host: Any data. >> Any data, >> Any data pool, stream, river, ocean, whatever. (laughs) >> Jacuzzis. Data jacuzzis, right. So what you want is a holistic fabric. I like the term, you know, Forrester uses; they call it the fabric. >> Host: Data fabric.
>> Data fabric, right? You want a fabric over these so you can actually control and maintain governance and security centrally, but apply it with context. Last but not least, you want to do this whether it's on-prem or on the cloud, or multi-cloud. So we've been working with a bank. They were primarily based in Germany, but for GDPR they had to stand up something in France. They had French customers, and for a bunch of new regulatory reasons, they had to stand something up in France. So they had their own data center, and then they had a cloud provider, right, who I won't name. And they were great, things were working well. Now they want to expand a similar offering to customers in Asia. It turns out their favorite cloud vendor was not available in Asia, or not available in a time frame which made sense for the offering. So they had to go with cloud vendor two. So now, although each of the vendors will do their job in terms of giving you all the security and governance and so on, the fact that you have to manage it three ways, one for on-prem and one each for cloud vendors A and B, was really hard, too hard for them. So this notion of a fabric across these things, which is Dataplane... And that, by the way, is based on all the open source technologies we love, like Atlas and Ranger. By the way, that is also what IBM is betting on, and what the entire ecosystem... it seems like a no-brainer at this point. That was the kind of reason why we foresaw the need for something like a Dataplane, and obviously we couldn't be more excited to have something like that in the market today as a net new service that people can use. >> You get the catalogs, security controls, data integration. >> Arun: Exactly. >> Then you get the cloud, whatever, pick your cloud scenario, you can do that. Killer architecture, I liked it a lot. I guess the question I have for you personally is, what's driving the product decisions at Hortonworks?
And the second part of that question is, how does that change your ecosystem engagement? Because you guys have been very friendly in a partnering sense and also very good with the ecosystem. How are you guys deciding the product strategies? Does it bubble up from the community? Is there an ivory tower, let's go take that hill? >> It's both, because what typically happens is, obviously we've been in the community now for a long time. Working publicly now with well over 1,000 customers not only puts a lot of responsibility on our shoulders, but it's also very nice because it gives us a vantage point which is unique. That's number one. The second one is, being in the community, we also see that people are starting to solve the problems. So that's another input for us. So you have one, the enterprise side, where we see what the enterprises are facing, which is kind of where Dataplane came in; but we also saw in the community where people are starting to ask us about, hey, can you do multi-cluster Atlas? Or multi-cluster Ranger? Put two and two together and you see there is a real need. >> So you get some consensus. >> You get some consensus, and you also see that on the enterprise side. Last but not least is when we went to friends like IBM and said, hey, we're doing this. This is where we can position this, right. So we can actually bring in IGC, you can bring in BigQuality and all of these types, >> Host: So things had clicked with IBM? >> Exactly. >> Rob Thomas was thinking the same thing. Bring in the Power systems and the horsepower. >> Exactly, yep. We announced something, for example; we have been working with the Power guys and NVIDIA, for deep learning, right. That sort of stuff is what clicks if you're in the community long enough, if you have the vantage point of the enterprise long enough; it feels like the two of them click. And that's, frankly, my job. >> Great, and you've got obviously the landscape. The waves are coming in.
So I've got to ask you, the big waves are coming in, and you're seeing people starting to get hip to the couple of key things that they've got to get their hands on. They need to have the big surfboards, metaphorically speaking. They've got to have some good products, big emphasis on real value. Don't give me any hype, don't give me a head fake. You know, AI-wash, people can see right through that. Alright, that's clear. But AI's great. We all cheer for AI, but the reality is, everyone knows that's pretty much b.s., except that core machine learning is on the front edge of innovation. So that's cool, but value. (laughs) Hey, I've got to integrate and operationalize my data, so that's the big wave that's coming. Comment on the community piece, because enterprises now are realizing, as open source becomes the dominant source of value for them, they are now really going to the next level. It used to be just the emerging enterprises that knew open source; the guys who would volunteer might not go deeper into the community. But now more people in the enterprises are in open source communities, they are recruiting from open source communities, and that's impacting their business. What's your advice for someone who's been in the community of open source? Lessons you've learned, what is the best practice, from your standpoint, on philosophy, how to build into the community, how to build a community model. >> Yeah, I mean, at the end of the day, my best advice is to say, look, the community is defined by the people who contribute. So, you get a voice if you contribute. That's the fundamental truth. Which means you have to get your legal policies and so on to a point that you can actually start to let your employees contribute. That kicks off a flywheel, where you can then actually go recruit the best talent, because the best talent wants to stand out. GitHub is a resume now. It is not a Word doc.
If you don't allow them to build that resume, they're not going to come by, and it's just a fundamental truth. >> It's self governing, it's reality. >> It's reality, exactly. Right, and we see that over and over again. It's taken time, but as with these things, the flywheel has changed enough. >> A whole new generation's coming online. If you look at the young kids coming in now, it is an amazing environment. You've got TensorFlow, all this cool stuff happening. It's just amazing. >> You know, 20 years ago that wouldn't happen, because the Googles of the world wouldn't open source it. Now increasingly, >> The secret's out, open source works. >> Yeah, (laughs) shh. >> Tell everybody. You know, they know already, but... This is changing some of how H.R. works and how people collaborate, >> And the policies around it. The legal policies around contribution, so, >> Arun, great to see you. Congratulations. It's been fun to watch the Hortonworks journey. I want to appreciate you and Rob Bearden for supporting theCUBE here in BigData NYC. If it wasn't for Hortonworks and Rob Bearden and your support, theCUBE would not be part of the Strata Data, which we are not allowed to broadcast into, for the record. O'Reilly Media does not allow theCUBE or our analysts inside their venue. They've excluded us, and that's a bummer for them. They're a closed organization. But I want to thank Hortonworks and you guys for supporting us. >> Arun: Likewise. >> We really appreciate it. >> Arun: Thanks for having me back. >> Thanks, and a shout out to Rob Bearden. Good luck, and CPO, it's a fun job, you know, no pressure. I've got a lot of pressure. A whole lot. >> Arun: Alright, thanks. >> More Cube coverage after this short break. (upbeat electronic music)
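The data-sovereignty constraint Murthy describes above, German customer data staying in Germany while one governance model is applied across on-prem and multiple clouds, can be sketched as a tiny policy check. This is a hypothetical illustration of the idea only, not Hortonworks Dataplane's actual API; the sovereignty tags and region names are invented for the example.

```python
# Hypothetical sketch: one centrally defined data-residency policy,
# enforced with local context wherever the data happens to live.
# Tags and region names are invented; this is not Dataplane's real API.

# Which regions a dataset carrying a given sovereignty tag may be copied to.
RESIDENCY_POLICY = {
    "gdpr-germany": {"eu-de"},           # German customer data stays in Germany
    "gdpr-eu":      {"eu-de", "eu-fr"},  # generic EU data may move within the EU
    "unrestricted": None,                # None means any region is allowed
}

def replication_allowed(dataset_tag: str, target_region: str) -> bool:
    """Return True if a dataset with this tag may be copied to target_region."""
    allowed = RESIDENCY_POLICY[dataset_tag]
    return allowed is None or target_region in allowed

# The same check runs on-prem, on cloud vendor A, or on cloud vendor B;
# only the policy table itself is centrally managed.
assert replication_allowed("gdpr-germany", "eu-de")
assert not replication_allowed("gdpr-germany", "eu-fr")
assert replication_allowed("unrestricted", "ap-tokyo")
```

The point of the sketch is the shape of the design: the policy is defined once, and the enforcement function is the same everywhere, which is the "consistent security and governance model" the interview keeps returning to.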

Published Date : Sep 28 2017



Rob Bearden, Hortonworks & Rob Thomas, IBM | BigData NYC 2017


 

>> Announcer: Live from Midtown Manhattan, it's theCUBE. Covering Big Data New York City 2017. Brought to you by SiliconANGLE Media and its ecosystem sponsor. >> Okay, welcome back, everyone. We're here live in New York City for BigData NYC, our annual event with SiliconANGLE Media, theCUBE, and Wikibon, in conjunction with Strata Hadoop, which is now called Strata Data as that show evolves. I'm John Furrier, cohost of theCUBE, with Peter Burris, head of research for SiliconANGLE Media, and General Manager of Wikibon. Our next two guests are two legends in the big data industry: Rob Bearden, the CEO of Hortonworks, really one of the founders of the big data movement, you know, Cloudera and Hortonworks really kind of built that out, and Rob Thomas, General Manager of IBM Analytics. Big-time investments have been made by both of them. Congratulations for your success, guys. Welcome back to theCUBE, great to see you guys! >> Great to see you. >> Great, yeah. >> And we've got an exciting partnership to talk about, as well. >> So, but let's do a little history, you guys. Obviously, I want to get to that, and get clarified on the news in a second, but you guys have been there from the beginning, kind of looking at the market, developing it, almost from the embryonic state to now. I mean, what a changeover. Give a quick comparison of where we've come from and what's the current landscape now, because it evolved into so much more. You got IOT, you got AI, you have a lot of things in the enterprise. You've got cloud computing. A lot of tailwinds for this industry. It's gotten bigger. It's become big and now it's huge. What's your thoughts, guys? >> You know, so you look at arcs, and really all this started with Hadoop, and Rob and I met early in the days of that. You've kind of gone from, the early few years were about optimizing operations.
Hadoop is a great way for a company to become more efficient, take out costs in their data infrastructure, and so that put huge momentum into this area, and now we've kind of fast-forwarded to the point where now it's about, "So how am I actually going to extract insight?" So instead of just getting operational advantages, how am I going to get competitive advantage, and that's about bringing the world of data science and machine learning, running it natively on Hadoop; that's the next chapter, and that's what Rob and I are working closely together on. >> Rob, your thoughts, too? You know, we've been talking about data in motion. You guys were early on in that, seeing that trend. Real time is still hot. Data is still the core asset people are trying to figure out, and move from wrangling to actually enabling that data. >> Right. Well, you know, in the early days of Big Data, to Rob's point, it was very much about bringing operational leverage and efficiency and being able to aggregate very siloed data sets, and unlocking that data and bringing it into a central platform. In the early days, the resources around Hadoop went to making Hadoop an enterprise-viable data platform, with security, governance, operations, and management capability that mirrored any of the proprietary transactional or EDW platforms, and the lessons learned in that were that by bringing all that data together in a central data set, we now can understand what's happening with our customers, and with our other assets, pre-transaction, and so they can become very prescriptive in engaging in new business models. And so what we've learned now is, the further upstream we can get in the world of IOT and bring that data under management from the point of origination and be able to manage it all the way through its life cycle, the more we can create new business models with higher velocity of engagement and a lot more rapid value creation.
It does, though, create a number of new challenges in all the areas of how you secure that data, how you bring governance across that entire life cycle from a common stream set. >> Well, let's talk about the news you guys have. Obviously, the partnership. Partnerships have become the new normal in the open source era that we're living in. We're seeing open source software grow really exponentially in the forecasts for the next five and ten years, and exponential growth in new code. Just new people coming on board, new developers, dev ops is mainstream. Partnerships are key for communities. 90% of the code is going to be open source, 10% proprietary, as they say, the code sandwich, as Jim Zemlin, the executive director of the Linux Foundation, puts it, and you're seeing that work. You guys have worked together with Apache Atlas. What's the news, what's the relationship between Hortonworks and IBM? Share the news. >> So, a lot of great work's been happening there, and generally in the open source community, around Apache Atlas, and making sure that we're bringing mission-critical governance capabilities across the big data sets and environments. As we then get into the complexity of now multiple data lakes, multiple tiers of data coming from multiple sources, that brings a higher level of requirement in both the security and governance aspects, and that's where the partnership with IBM is continuing to drive Apache Atlas into mission-critical enterprise viability; but then when we get into the distributed models and enterprise requirements, the IBM platforms leveraging Atlas and what we're doing together then take that into mission-critical enterprise capability. >> You got the open source, and now you got the enterprise. Rob, we've talked many times about the enterprise as a hard, hard environment to crack for, say, a start-up, but even now, they're becoming reliant on open source, and yet they have a lot of operational challenges.
How does this relate to the challenge of, you know, the CIO and his staff, now new personas coming in, you're seeing the data science role, you see it expanding from analytics to dev ops. A lot of challenges. >> Look, enterprises are getting better at this. Clearly we've seen progress the last five years on that, but to kind of go back and link the points, there's a phrase I heard that I like. It says, "There's no AI without IA," meaning information architecture. Fundamentally, what our partnership is about is delivering the right information architecture. So it's Hadoop federated with whatever you have in terms of warehouses and databases. We partner around IBM Common SQL for that. It's metadata for your core governance, because without governance you don't have compliance, you can't offer self-service analytics. So we are forming what I would call the fluid data layer for an enterprise that enables them to get to this future of AI, and my view is there's a stop in between, which is data science, machine learning, applications that are ready today that clients can put into production to improve the outcomes they're getting. What we're focused on right now is, how do we take the information architecture we've been able to establish, and then help clients on this journey? That's what enterprises want, because that's how they're going to build differentiation in their businesses. >> But the definition of an information architecture is clearest closest to applications, and maybe this informs your perspective, it's close to the applications that the business is running on. It goes back to your observation about, "We used to be focused on optimizing operations." As you move away from those applications, your information architecture becomes increasingly diffuse. It's not as crystal clear. How do you drive that clarity, as the data moves to derive new applications? >> Rob and I have talked about this. I think we're at the dawn of probably a new era in application development.
Much more agile, flexible applications that are taking advantage of data wherever it resides. We are really early in that. Right now we are in the "let's actually put machine learning and data science into practice, let's extract value from the data we've got" phase; that will then inform a new set of applications, which is related to the announcements that Hortonworks made this week around Dataplane, which is looking at multi-cloud environments and how you would manage applications and data across those. Rob, you can speak to that better than I can, I think. >> Well, the Dataplane thing, this information architecture, I think you're 100% right on. What we're hearing from customers in the enterprise is, they see the IOT buzz; oh, of course they're going to connect with IOT devices down the road, but when they see the security challenges, when they see the operational challenges around hiring people to actually run the dev ops, they have to then re-architect. So there's certainly a conversation we see on what is the architecture for the data, but also a little bit bigger than that, the holistic architecture of, say, cloud. So a lot of people are, like, trying to clean up their house, if you will, to be ready for this new era, and I think Wikibon, your private cloud report you guys put out really amplified that by saying, "Yeah, they see these trends, but they got to kind of get their act together." They got to look at who the staff is, what the data architecture's going to be, what apps are being developed, so they're doing a lot more retrenching. Given that, if we agree, what does that mean for the data plane, and then your vision of having that data architecture so that this will be a solid foundational transition?
I think we all hit on the same point, which is that it is about enabling a next-generation IT architecture, of which there's sort of an X and a Y axis at work. And generally what Big Data's been able to do, and Hadoop specifically, over the last five years, was enabling the existing application architectures; and I like the term that's been coined by you, they were known processes with known technology, and that's how applications in the last 20 years have been enabled. Big Data and Hadoop generally have unlocked the ability to now move all the way out to the edge and incorporate IOT, data at rest, data in motion, on-prem and cloud hybrid architecture. What that's done is said, "Now we know how to build an application that takes advantage of an event or an occurrence and then can drive outcome in a variety of ways. We don't have to wait for a static programming model to automate a function." >> And in fact, if we wait, we're going to fail. That's one of the biggest challenges. I mean, IBM, I will tell you guys, or I'll tell you, Rob, that one of the craziest days I've ever spent is, I flew from Japan to New York City for the IBM Information Architecture announcement back in like 1994, and it was the most painful two days I've ever experienced in my entire life. That's a long time ago. It's ancient history. We can't use information architecture as a way of slowing things down. What we need to be able to do is introduce technology that, again, allows the clarity of information architecture close to these core applications to move, and that may involve things like machine learning itself being embedded directly into how we envision data being moved, how we envision optimization, how we envision the data plane working. So, as you guys think about this data plane, everybody ends up asking themselves, "Is there a natural place for data to be?"
What's going to be centralized, what's going to be decentralized, and I'm asking you: is the data increasingly going to be decentralized, but the governance and security and policies that we put in place going to be centralized, and is that what's going to inform the operation of the data plane? What do you guys think? >> It's our view, very specifically from Hortonworks' perspective, that we want to give the ability for the data to exist and reside wherever the physics dictate, whether that be on-prem, whether that be in the cloud, and we want to give the ability to process and take action on an event or an occurrence, or drive an outcome, as early in the cycle as possible. >> Describe what you mean by "early in the cycle." >> So, as we see conditions emerge. A machine part breaking down. A customer taking an action. A supply chain inventory outage. >> So as close as possible to the event that's generating the data. >> As it's being generated, or as the processes are leading up to the natural outcome and we can maybe disintermediate for a better outcome. And so that means that we have to be able to engage with the data irrespective of where it is in its cycle, and that's where we've enabled, with Dataplane, the ability to abstract out the requirement of where that data is, and to be able to have a common plane, pun intended, for the operations and managing and provisioning of the environment, and for being able to govern that and secure it, which are increasingly becoming intertwined, because you have to deal with it from point of origin through point at rest. >> The new phrase, "the single plane of glass." All joking aside, I want to just get your thoughts on this, Rob, too. "What's in it for me? I'm the customer. Right now I have a couple challenges." This is what we hear from the market.
"I need data consistency because things are happening in "real time; whatever events are going on with data, we know "more data's going to be coming out from the edge and "everywhere else, faster and more volume, so I need "consistency of my data, and I don't want "to have multiple data silos," and then they got to integrate the data, so on the application developer side, a dev ops-like ethos is emerging where, "Hey, if there's data being done, I need to integrate that "into my app in real time," so those are two challenges. Does the data plane address that concern for customers? That's the question. >> Today it enables the ops world. >> So I can integrate my apps into the data plane. >> My apps and my other data assets, irrespective of where they reside, on-prem, cloud, or out to the edge, and all points in between. >> Rob, for enterprise, is this going to be the single pane of glass for data governance? Is that how the vision that you guys see this, because that's a benefit. If that could happen, that's essentially one step towards the promised land, if you will, for more data flowing through apps and app developers. >> So let me reshape a little bit. There's two main problems that collectively we have to address for enterprises: one is they want to apply machine learning and data science at scale, and they're struggling with that, and two is they want to get the cloud, and it's not talked about nearly enough, but most clients are really struggling with that. Then you fast forward on that one, we are moving to a multi-cloud world, absolutely. I don't think any enterprise is going to standardize on a single cloud, that's pretty clear. 
So you need things like Dataplane that acknowledge it's a multi-cloud world, and even as you move to multi-clouds, you want a single focus for your data governance, a single strategy for your data governance, and then what we're doing together with IBM Data Science Experience with Hortonworks, let's say: whatever data you have in there, you can now do your machine learning right where that data is. You don't need to move it around. You can if you want, but you don't have to move it around, 'cause it's built in, and it's integrated right into the Hadoop ecosystem. That solves the two main enterprise pain points, which are help me get to the cloud, and help me apply data science and machine learning. >> Well, we'll have to follow up and do a segment just on that. I think multi-cloud is clearly the direction, but what the hell does that mean? If I run 365 on Azure, that's one app. If I run something else on Amazon, that's multiple clouds, not necessarily moving workloads across. So the question I want to ask here is, it's clear from customers they want single code bases that run on all clouds seamlessly, so I don't have to scale up on things on Amazon, Azure, and Google. Not all clouds are created equal in how they do things. Storage, throughput, inside the data factories of how they process. That's a challenge. How do you guys see that playing out? You have on-premise activities that have been bootstrapped. Now you have multiple clouds with different ways of doing things, from pipelining, ingestion and processing, and learning. How do you see that playing out? Clouds just kind of standardizing around the data plane? >> There's also the complexity that even within the multi-clouds, you're going to have multiple tiers within the clouds, if you're running in one data center in Asia, versus one in Latin America, maybe a couple across the Americas. >> But as a customer, do I need to know the cloud internals of Amazon, Azure, and Google? >> You do.
In a stand-alone world, yes you do. That's where we have to bring in and abstract that complexity out, and that's the goal with Dataplane: to be able to abstract away which tier it's in, whether it's on-prem, irrespective of which cloud platform it's on. >> But Rob Thomas, I really like the way you put it. There may be some other issues that users have to worry about, certainly there are some that we think of, but the two questions of, "Where am I going to run the machine learning," and "How am I going to get that to the cloud appropriately," I really like the way you put that. At the end of the day, what users need to focus on is less where the application code is, and more where the data is, so that they can move the application code, or they can move the work, to the data. That's fundamentally the perspective. We think that businesses don't take their business to the cloud; they bring the cloud to their business. So, when you think about this notion of increasingly looking at a set of work that needs to be performed, where the data exists, and what actions you're going to take on that data, it does suggest that data is going to become more of a centerpiece asset within the business. How do some of the things that you guys are doing lead customers to start to acknowledge data as an asset, so they're making the appropriate investments in their data as their business evolves, partly in response to data as an asset? What do you think? >> We have to do our job to build to common denominators, and that's what we're doing to make this easy for clients. So today we announced the IBM Integrated Analytics System. Same code base on private cloud as on a hardware system as on public cloud, and all of it federates to Hortonworks through Common SQL. That's what clients need, 'cause it solves their problem. Click of a button, they can get to the cloud, and by the way, on private cloud it's based on Kubernetes, which is aligned with what we have on public cloud.
We're working with Hortonworks to optimize YARN and Kubernetes working together. These are the meaty issues that, if we don't solve them, clients have to deal with a bag of bolts, and so that's the kind of stuff we're solving together. So think about it: one single code base for managing your data, it federates to Hadoop, machine learning is built into the system, and it's based on Kubernetes; that's what clients want. >> And the containers are just great, too. Great cloud-native trend. You guys have been great, active in there. Congratulations to both of you guys. Final question, to get you guys the last word: How does the relationship between Hortonworks and IBM evolve? How do you guys see this playing out? More of the same? Keep integrating in code? Is there any new thing you see on the horizon that you're going to be knocking down in the future? >> I'll take the first shot. The goal is to continue to make it simple and easy for the customer to get to the cloud, bring those machine learning and data science models to the data, and make it easy for the consumption of the next generation of applications, and continue to make our customers successful and drive value, but to do it by transparently enabling the technology platforms together, and I think we've acknowledged the things that IBM is extraordinarily good at, the things that Hortonworks is good at, and bring those two together with virtually no overlap.
That's very different from what other people in the industry do when they say, "We're going to create a proprietary wrapper around your Hadoop environment and lock your data in." That's the opposite of what we're doing. We're saying we're giving you the full freedom of open source, but we're enabling you to augment that with machine learning and data science capabilities. This is what clients want. That's why the partnership's working. I think that's why we've gotten the response that we have. >> And you guys have been multiple years into the new operating model of being much more aggressive within the Big Data community, which has now morphed into a much larger landscape. Are you pleased with some of the results you're seeing on the IBM side, and more coding, more involvement in these projects on your end? >> Yeah, I mean, look, we were certainly early on Spark, created a lot of momentum there. I think it actually ended up helping both of our interests in the market. We built a huge community of developers at IBM, which is not something IBM had even a few years ago, but it's great to have a relationship like this where we can continue to augment our skills. We make each other better, and I think what you'll see in the future is more on the governance side; I think that's the piece that's still not quite been figured out by most enterprises yet. The need is understood. The implementation is slow, so you'll see more from us collectively there. >> Well, congratulations on the community work you guys have done. I think the community model's evolving mainstream as well. Open source will continue to grow. Congratulations. Rob Bearden and Rob Thomas here inside theCUBE, more coverage here in Big Data NYC with theCUBE, after this short break.
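The "act on an event as early in the cycle as possible" idea from this interview can be sketched with a small, hypothetical example. The sensor names and threshold below are invented for illustration; this shows the pattern, not any Hortonworks or IBM product: an edge node filters raw readings locally and ships only the interesting ones upstream.

```python
# Hypothetical sketch of edge processing: act on events as close to the
# point of origination as possible, and forward only what matters upstream.
# Sensor names and the anomaly threshold are invented for illustration.

def edge_filter(readings, threshold):
    """Keep only readings that exceed the threshold (worth shipping upstream)."""
    return [(sensor, value) for sensor, value in readings if value > threshold]

raw = [("cam-1", 0.20), ("cam-2", 0.95), ("cam-3", 0.10), ("cam-4", 0.88)]
to_central = edge_filter(raw, threshold=0.80)
# Only the two anomalous readings cross the wire to the regional data center.
```

The same pattern holds whether the "central" side is on-prem or one of several clouds; the edge logic does not change, which is exactly the kind of consistency the data plane discussion is after.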

Published Date : Sep 27 2017



Scott Gnau, Hortonworks & Tendü Yogurtçu, Syncsort - DataWorks Summit 2017


 

>> Man's Voiceover: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE, we are live at Day One of the DataWorks Summit, we've had a great day here, I'm surprised that we still have our voices left. I'm Lisa Martin, with my co-host George Gilbert. We have been talking with great innovators today across this great community, folks from Hortonworks, of course, IBM, partners, now I'd like to welcome back to theCube, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau, welcome back Scott! >> Great to be here yet again. >> Yet again! And we have another CTO, we've got CTO corner over here, with CUBE Alumni and the CTO of SyncSort, Tendu Yogurtcu. Welcome back to theCUBE both of you >> Pleasure to be here, thank you. >> So, guys, what's new with the partnership? I know that Syncsort, you have 87%, or 87 of the Fortune 100 companies are customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with Syncsort, what's new, what's going on there? >> You know there's always something new in our partnership. We launched our partnership, what a year and a half ago or so? >> Yes. And it was really built on the foundation of helping our customers get time to value very quickly, right and leveraging our mutual strengths. And we've been back on theCUBE a couple of times and we continue to have new things to talk about whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that has had a great foundation in value and continues to grow.
And, ya know, with some of the latest moves that I'm sure Tendu will bring us up to speed on that Syncsort has made, customers who have jumped on the bandwagon with us together are able to get much more benefit than originally they even intended. >> Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you Scott. And the Trillium acquisition has been transformative for us really. We have achieved quite a lot within the last six months. Delivering joint solutions between our data integration, DMX-h, and Trillium data quality and profiling portfolio and that was kind of our first step very much focused on the data governance. We are going to have data quality for Data Lake product available later this year and this week actually we will be announcing our partnership with Collibra data governance platform basically making business rules and technical meta data available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization and the bundle that we launched early February of this year that's in production, a large complex production deployment's already happened. Our customers access all their data all enterprise data including legacy data, warehouse, new data sources as well as legacy mainframe in the data lake so we will be announcing again in a week or so change data capture capabilities from legacy data storage into Hadoop keeping that data fresh and giving more choices to our customers in terms of populating the data lake as well as use cases like archiving data into cloud. >> Tendu, let me try and unpack what was a very dense, in a good way, lot of content. Sticking my foot in my mouth every 30 seconds (laughter) >> Scott Voiceover: I think he called you dense.
(laughter) >> So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database >> Tendu Voiceover: Yes. and you've got the data quality at work as well. Help us picture how much faster and higher fidelity the data flow might be relative to >> Sure, absolutely. So, our bundle and our joint solution with Hortonworks really focuses on business use cases. And one of those use cases is enterprise data warehouse optimization where we make all data, all enterprise data accessible in the data lake. Now, if you are an insurance company managing claims or you are building a data-as-a-service, Hadoop-as-a-service architecture, there are multiple ways that you can keep that data fresh in the data lake. And you can have change data capture by basically taking snapshots of the data and comparing them in the data lake which is a viable method of doing it. But, as the data volumes are growing and the real time analytics requirements of the business are growing we recognize our customers are also looking for alternative ways that they can actually capture the change in real time when the change is just like less than 10% of the data, original data set and keep the data fresh in the data lake. So that enables faster analytics, real time analytics, as well as in the case that if you are doing something from on-premise to the cloud or archiving data, it also saves on the resources like the network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities around profiling that data, getting a better understanding of that data.
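The contrast Tendu draws, full-snapshot comparison versus applying only the captured delta when the change is under 10% of the data, can be sketched in a few lines of Python. This is an illustrative toy, not Syncsort's implementation; the record shapes and names are invented.

```python
# Toy sketch of the two ways to keep a data lake fresh that Tendu
# contrasts: diff two full snapshots, or apply only the captured delta.
# Record shapes and names are invented; this is not Syncsort's code.

def snapshot_diff(old, new):
    """Compare two full snapshots (key -> row) and return the delta."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    deleted = {k for k in old if k not in new}
    return changed, deleted

def apply_delta(lake, changed, deleted):
    """Apply only the delta, the CDC-style path that moves few rows."""
    fresh = dict(lake)
    fresh.update(changed)
    for k in deleted:
        fresh.pop(k, None)
    return fresh

warehouse_v1 = {1: "claim-open", 2: "claim-paid", 3: "claim-open"}
warehouse_v2 = {1: "claim-closed", 2: "claim-paid"}  # row 1 updated, row 3 gone

changed, deleted = snapshot_diff(warehouse_v1, warehouse_v2)
assert changed == {1: "claim-closed"}
assert deleted == {3}
assert apply_delta(warehouse_v1, changed, deleted) == warehouse_v2
```

The CDC-style path ships only `changed` and `deleted` across the wire, which is where the network bandwidth and resource savings she mentions come from.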
So we will be focused on delivering products around that because as we understand data we can also help our customers to create the business rules, to cleanse that data, and preserve the fidelity of the data and integrity of the data. >> So, with the change data capture it sounds like near real time, you're capturing changes in near real time, could that serve as a streaming solution that then is also populating the history as well? >> Absolutely. We can go through streaming or message queues. We also offer more efficient proprietary ways of streaming the data to Hadoop. >> So the, I assume the message queues refers to, probably Kafka and then your own optimized solution for sort of maximum performance, lowest latency. >> Yes, we can do either true Kafka queues which is very efficient as well. We can also go through proprietary methods. >> So, Scott, help us understand then now the governance capabilities that, um I'm having a senior moment (laughter) I'm getting too many of these! (laughter) Help us understand the governance capabilities that Syncsort's adding to the, sort of mix with the data warehouse optimization package and how it relates to what you're doing. >> Yeah, right. So what we talked about even again this morning, right the whole notion of the value of open squared, right open source and open ecosystem. And I think this is clearly an open ecosystem kind of play. So we've done a lot of work since we initially launched the partnership and through the different product releases where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies so that the Syncsort tool can exploit those horizontal core services like Yarn for multi-tenancy and workload management and of course Atlas for data governance. So as then the Syncsort team adds feature functionality on the outside of that tool that simply accretes to the benefit of what we've built together.
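Scott's point about common metadata tagging can be made concrete with a toy registry: one shared store (standing in for Apache Atlas) whose tags are written once at ingest and then read by every application. The dataset names and tag formats here are invented for illustration; this is not the Atlas API.

```python
# Toy stand-in for a shared governance catalog like Apache Atlas: tags
# are written once and every engine reads the same ones, so they stay
# with the data through its life cycle. Names and formats are invented.

catalog = {}  # dataset name -> set of governance tags

def tag(dataset, *labels):
    """Record governance tags for a dataset in the shared catalog."""
    catalog.setdefault(dataset, set()).update(labels)

def tags_for(dataset):
    """Any engine (ETL tool, query engine, profiler) reads the same tags."""
    return catalog.get(dataset, set())

tag("claims_2017", "PII", "retention:7y")  # tagged once, at ingest
etl_view = tags_for("claims_2017")         # what the ETL tool sees
sql_view = tags_for("claims_2017")         # what the query engine sees
assert etl_view == sql_view == {"PII", "retention:7y"}
```

The design choice being illustrated: because every application consults one trusted store instead of keeping its own copy, the tags cannot drift apart between tools.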
And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem that they can plug in additional feature functionality. And at the same time we're really thrilled because, and we've talked about this many times right, the whole notion of governance and meta data management in the big data space is a big deal. And so the fact that we're able to come to the table with an open source solution to create common meta data tagging that then gets utilized by multiple different applications I think creates extreme value for the industry and frankly for our customers because now, regardless of the application they choose, or the applications that they choose, they can at least have that common trusted infrastructure where all of that information is tagged and it stays with the data through the data's life cycle. >> So your partnership sounds very, very symbiotic, that there's changes made on one side that reflect the other. Give us an example of where is your common customer, and this might not be, well, they're all over the place, who has got an enterprise data warehouse, are you finding more customers that are looking to modernize this? That have multi-cloud, core edge, IOT devices that's a pretty distributed environment versus customers that might be still more on prem? What's kind of the mix there? >> Can I start and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us and in terms of the partnership that you mentioned the relation, I think one of the strengths of our partnership is at many different levels it's not just executive level, it's cross functional and also from very close field teams, marketing teams and engineering field teams working together. And in terms of our customers, it's really organizations are trying to move toward modern data architecture.
And as they are trying to build the modern data architecture there is the data in motion piece I will let Scott talk about, the data at rest piece and as we have so much data coming from cloud, originating through mobile and web in the enterprise, especially the Fortune 500, that we talk, Fortune 100 we talked about, insurance, health care, telco, financial services and banking has a lot of legacy data stores. So our, really joint solution and the couple of first use cases, business use cases we targeted were around that. How do we enable these data stores and data in the modern data architecture? I will let Scott >> Yeah, I agree. And so certainly we have a lot of customers already who are joint customers and so they can get the value of the partnership kind of cuz they've already made the right decision, right. I also think, though, there's a lot of green field opportunity for us because there are hundreds if not thousands of customers out there who have legacy data systems where their data is kind of locked away. And by the way, it's not to say the systems aren't functioning and doing a good job, they are. They're running business facing applications and all of that's really great, but that is a source of raw material that belongs also in the data lake, right, and can be, can certainly enhance the value of all the other data that's being built there. And so the value, frankly, of our partnership is really creating that easy bridge to kind of unlock that data from those legacy systems and get it in the data lake and then from there, the sky's the limit, right. Is it reference data that can then be used for consistency of response when you're joining it to social data and web data? Frankly, is it an online archive, and optimization of the overall data fabric and off loading some of the historical data that may not even be used in legacy systems and having a place to put it where it actually can be accessed. And so, there are a lot of great use cases.
You're right, it's a very symbiotic relationship. I think there's only upside because we really do complement each other and there is a distinct value proposition not just for our existing customers but frankly for a large set of customers out there that have, kind of, the data locked away. >> So, how do you see the data warehouse optimization sort of solution set continuing to expand its functional footprint? What are some things to keep pushing out the edge conditions, the realm of possibilities? >> Some of the areas that we are jointly focused on is we are liberating that data from the enterprise data warehouse or legacy architectures. Through Syncsort DMX-h we actually understand the path that data travels, the meta data is something that we can now integrate into Atlas and publish into Atlas and have Atlas as the open data governance solution. So that's an area that definitely we see an opportunity to grow and also strengthen that joint solution. >> Sure, I mean extended provenance is kind of what you're describing and that's a big deal when you think about some of these legacy systems where frankly 90% of the costs of implementing them originally was actually building out those business rules and that meta data. And so being able to preserve that and bring it over into a common or an open platform is a really big deal. I'd say inside of the platform of course as we continue to create new performance advantages in, ya know, the latest releases of Hive as an example where we can get low latency query response times there's a whole new class of work loads that now is appropriate to move into this platform and you'll see us continue to move along those lines as we advance the technology from the open community. >> Well, congratulations on continuing this great, symbiotic as we said, partnership. It sounds like it's incredibly strong on the technology side, on the strategic side, on the GTM side.
I loved how you said liberating data so that companies can really unlock its transformational value. We want to thank both of you for Scott coming back on theCUBE >> Thank you. twice in one day. >> Twice in one day. Tendu, thank you as well >> Thank you. for coming back to theCUBE. >> Always a pleasure. For both of our CTOs that have joined us from Hortonworks and Syncsort and my co-host George Gilbert, I am Lisa Martin, you've been watching theCUBE live from day one of the DataWorks summit. Stick around, we've got great guests coming up (upbeat music)

Published Date : Jun 13 2017



Ash Munshi, Pepperdata - #SparkSummit - #theCUBE


 

(upbeat music) >> Announcer: Live from San Francisco, it's theCUBE, covering Spark Summit 2017, brought to you by Databricks. >> Welcome back to theCUBE, it's day two at the Spark Summit 2017. I'm David Goad and here with George Gilbert from Wikibon, George. >> George: Good to be here. >> Alright and the guest of honor of course, is Ash Munshi, who is the CEO of Pepperdata. Ash, welcome to the show. >> Thank you very much, thank you. >> Well you have an interesting background, I want you to just tell us real quick here, not give the whole bio, but you got a great background in machine learning, you were an early user of Spark, tell us a little bit about your experience. >> So I'm actually a mathematician originally, a theoretician who worked for IBM Research, and then subsequently Larry Ellison at Oracle, and a number of other places. But most recently I was CTO at Yahoo, and then subsequent to that I did a bunch of startups, that involved different types of machine learning, and also just in general, sort of a lot of big data infrastructure stuff. >> And go back to 2012 with Spark right? You had an interesting development. >> Right, so 2011, 2012, when Spark was still early, we were actually building a recommendation system, based on user-generated reviews. That was a project that was done with Nando de Freitas, who is now at DeepMind, and Peter Cnudde, who's one of the key guys that runs infrastructure at Yahoo. We started that company, and we were one of the early users of Spark, and what we found was, that we were analyzing all the reviews at Amazon. So Amazon allows you to crawl all of their reviews, and we basically had natural language processing, that would allow us to analyze all those reviews. When we were doing sort of MapReduce stuff, it was taking us a huge number of nodes, and 24 hours to actually go do analysis. And then we had this little project called Spark, out of AMPlab, and we decided to spin it up, and see what we could do.
It had lots of issues at that time, but we were able to actually spin it up on to, I think it was in the order of 100,000 nodes, and we were able to take our times for running our algorithms from you know, sort of tens of hours, down to sort of an hour or two, so it was a significant improvement in performance. And that's when we realized that, you know, this is going to be something that's going to be really important once this set of issues, once it was going to get mature enough to make happen, and I'm glad to see that it's actually happened now, and it's actually taken over the world. >> Yeah that little project became a big deal, didn't it? >> It became a big deal, and now everybody's taking advantage of the same thing. >> Well bring us to the present here. We'll talk about Pepperdata and what you do, and then George is going to ask a little bit more about some of the solutions that you have. >> Perfect, so Pepperdata was a company founded by two gentlemen, Sean Suchter and Chad Carson. Sean used to run Yahoo Search, and was one of the first guys who actually helped develop Hadoop next to Eric14 and that team. And then Chad was one of the first guys who actually figured out how to monetize clicks, and was the data science guy around the whole thing. So those are the two guys that actually started the company. I joined the company last July as CEO, and you know, what we've done recently, is we've sort of expanded our focus of the company to addressing DevOps for big data. And the reason why DevOps for big data is important, is because what's happened in the last few years, is people have gone from experimenting with big data, to taking big data into production, and now they're actually starting to figure out how to actually make it so that it actually runs properly, and scales, and does all the other kinds of things that are there, right?
So, it's that transition that's actually happened, so, "Hey, we ran it in production, "and it didn't quite work the way we wanted to, "now we actually have to make it work correctly." That's where we sort of fit in, and that's where DevOps comes in, right? DevOps comes in when you're actually trying to make production systems that are going to perform in the right way. And the reason for DevOps is it shortens the cycle between developers and operators, right? So the tighter the loop, the faster you can get solutions out, because business users are actually wanting that to happen. That's where we're squarely focused, is how do we make that work? How do we make that work correctly for big data? And the difference between, sort of classic DevOps and DevOps for big data, is that you're now dealing with not just, you know, a set of computers solving an isolated sort of problem. You're dealing with thousands of machines that are solving one problem, and the amount of data is significantly larger. So the classical methodologies that you have, while, you know, agile and all that still works, the tools don't work to actually figure out what you can do with DevOps, and that's where we come in. We've got a set of tools that are focused on performance effectively, 'cause that's the big difference between distributed systems performance I should say, that's the big difference between that, and sort of classic even scaled out computing, right? So if you've got web servers, yes performance is important, and you need data for those, but that can actually be sharded nicely. This is one system working on one problem, right? Or a set of systems working on one problem. That's much harder, it's a different set of problems, and we help solve those problems. >> Yeah, and George you look like you're itching to dig into this, feel free. 
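As an aside, the review-analysis job Ash described a few moments ago has the classic map/reduce shape: a map over documents followed by a reduce that merges partial results, the pattern Spark sped up by keeping intermediate results in memory. A single-machine toy version (three invented reviews standing in for the crawled Amazon corpus; the real job ran full NLP on a cluster) looks like this:

```python
# Miniature map/reduce over reviews. The three reviews are invented
# stand-ins for the Amazon corpus Ash describes; the real job ran
# natural-language processing across many nodes.
from collections import Counter
from functools import reduce

reviews = [
    "great battery life and great screen",
    "battery died fast, screen cracked",
    "great value",
]

def map_phase(review):
    # map: tokenize one review into term counts
    return Counter(review.replace(",", "").split())

def reduce_phase(a, b):
    # reduce: merge partial counts from each mapped review
    return a + b

term_counts = reduce(reduce_phase, map(map_phase, reviews))
assert term_counts["great"] == 3
assert term_counts["battery"] == 2
```

The same shape distributed over a cluster is what took the job from 24 hours on MapReduce down to the hour-or-two range he mentions.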
(exclaims loudly) >> Well so, it was, so one of the big announcements at the show, and the sort of the headline announcement today, was Spark serverless, like so it's not just someone running Spark in the cloud sort of as a managed service, it's up there as a, you know, sort of SaaS application. And you could call it platform as a service, but it's basically a service where, you know, the infrastructure is invisible. Now, for all those customers who are running their own clusters, which is pretty much everyone I would imagine at this point, how far can you take them in hiding much of the overhead of running those clusters? And by the overhead I mean, you know, primarily performance and maximizing, you know, sort of maximizing resource efficiency. >> So, you have to actually sort of double-click on to the kind of resources that we're talking about here, right? So there's the number of nodes that you're going to need to actually do the computation. There is, you know, the amount of disc storage and stuff that you're going to need, what type of CPUs you're going to need. All of that stuff is sort of part of the costing if you will, of running an infrastructure. If somebody hides all that stuff, and makes it so that it's economical, then you know, that's a great thing, right? And if it can actually be made so that it works for huge installations, and hides it appropriately so I don't pay too much of a tax, that's a wonderful thing to do. But we have, our customers are enterprises, typically Fortune 200 enterprises, and they have both a mixture of cloud-based stuff, where they actually want to control everything about what's going on, and then they have infrastructure internally, which by definition they control everything that's going on, and for them we're very, very applicable. I don't know how we'd be applicable in this, sort of new world as a service that grows and shrinks.
I can certainly imagine that whoever provides that service would embed us, to be able to use the stuff more efficiently. >> No, you answered my question, which is, for the people who aren't getting the turnkey you know, sort of SaaS solution, and they need help managing, you know, what's a fairly involved stack, they would turn to you? >> Ash: Yes. >> Okay. >> Can I ask you about the specific products? >> George: Oh yes. >> I saw you at the booth, and I saw you were announcing a couple of things. Well what is new-- >> Ash: Correct. >> With the show? >> Correct, so at the show we announced Code Analyzer for Apache Spark, and what that allows people to do, is really understand where performance issues are actually happening in their code. So, one of the wonderful things about Spark, compared to MapReduce, is that it abstracts the paradigm that you actually write against, right? So that's a wonderful thing, 'cause it makes it easier to write code. The problem when we abstract, is what does that abstraction do down in the hardware, and where am I losing performance? And being able to give that information back to the user. So you know, in Spark, you have jobs that can run in parallel. So an apps consists of jobs, jobs can run in parallel, and each one of these things can consume resources, CPU, memory, and you see that through sort of garbage collection, or a disc or a network, and what you want to find out, is which one these parallel tasks was dominating the CPU? Why was it dominating the CPU? Which one actually caused the garbage collector actually go crazy at some point? While the Spark UI provides some of that information, what it doesn't do, is gives you a time series view of what's going on. So it's sort of a blow-by-blow view of what's going on. By imposing the time series view on sort of an enhanced version of the Spark UI, you now have much better visibility about which offending stages are causing the issue. 
And the nice thing about that is, once you know that, you know exactly which piece of code that you actually want to go and look at. So classic example would be, you might have two stages that are running in parallel. The Spark UI will tell you that it's stage three that's causing the problem, but if you look at the time series, you'll find out that stage two actually runs longer, and that's the one that's pegging the CPU. And you can see that because we have the time series, but you couldn't see that any other way. >> So you have a code analyzer and also the app profiler. >> So the app profiler is the other product that we announced a few months ago. We announced that I guess about three months ago or so. And the app profiler, what it does, is it actually looks after the run is done, it actually looks at all the data that the run produces, so the Spark history server produces, and then it actually goes back and analyzes that and says, "Well you know what? "You're executors here, are not working as efficiently, "these are the executors "that aren't working as efficiently." It might be using too much memory or whatever, and then it allows the developer to basically be able to click on it and say, "Explain to me why that's happening?" And then it gives you a little, you know, a little fix-it if you will. It's like, if this is happening, you probably want to do these things, in order to improve performance. So, what's happening with our customers, is our customers are asking developers to run the application profiler first, before they actually put stuff on production. Because if the application profiler comes back and says, "Everything is green." That there's no critical issues there. Then they're saying, "Okay fine, put it on my cluster, "on the production cluster, "but don't do it ahead of time." The application profiler, to be clear, is actually based on some work that, on open source project called Dr. Elephant, which comes out of LinkedIn. 
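Ash's stage-two/stage-three anecdote boils down to wall-clock attribution and CPU attribution disagreeing. A toy illustration of that idea (the per-second samples are invented; this is a sketch of why a time-series view helps, not Pepperdata's Code Analyzer output):

```python
# Invented per-second CPU samples (percent of cluster CPU) for two
# stages running in parallel. Sketch only, not Code Analyzer output.
cpu_samples = {
    "stage2": [90, 95, 92, 88, 0, 0],   # short wall time, pegs the CPU
    "stage3": [10, 12, 11, 10, 9, 95],  # long wall time, mostly idle
}

def wall_time(samples):
    # seconds during which the stage was running at all
    return sum(1 for s in samples if s > 0)

def cpu_seconds(samples):
    # total CPU actually consumed, which the time-series view exposes
    return sum(samples)

longest = max(cpu_samples, key=lambda s: wall_time(cpu_samples[s]))
hungriest = max(cpu_samples, key=lambda s: cpu_seconds(cpu_samples[s]))

assert longest == "stage3"    # what a duration-only view would blame
assert hungriest == "stage2"  # what per-interval CPU attribution blames
```

With only aggregate durations you would chase the wrong stage; overlaying per-interval consumption points at the code that is actually responsible.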
And now we're working very closely together to make sure that we actually can advance the set of heuristics that we have, that will allow developers to understand and diagnose more and more complex problems. >> The Spark community has the best code names ever. Dr. Elephant, I've never heard of that one before. (laughter) >> Well Dr. Elephant, actually, is not just the Spark community, it's actually also part of the MapReduce community, right? >> David: Ah, okay. >> So yeah, I mean remember Hadoop? >> David: Yes. >> The elephant thing, so Dr. Elephant, and you know. >> Well let's talk about where things are going next, George? >> So, you know, one of the things we hear all the time from customers and vendors, is, "How are we going to deal with this new era "of distributed computing?" You know, where we've got the cloud, on-prem, edge, and like so, for the first question, let's leave out the edge and say, you've got your Fortune 200 client, they have, you know, production clusters or even if it's just one on-prem, but they also want to work in the cloud, whether it's for elastics stuff, or just for, they're gathering a lot of data there. How can you help them manage both, you know, environments? >> Right, so I think there's a bunch of times still, before we get into most customers actually facing that problem. What we see today is, that a lot of the Fortune 200, or our customers, I shouldn't say a lot of the Fortune 200, a lot of our customers have significant, you know, deployments internally on-prem. They do experimentation on the cloud, right? The current infrastructure for managing all these, and sort of orchestrating all this stuff, is typically YARN. What we're seeing, is that more than likely they're going to wind up, or at least our intelligence tells us that it's going to wind up being Kubernetes that's actually going to wind up managing that. So, what will happen is-- >> George: Both on-prem and-- >> Well let me get to that, alright? >> George: Okay. 
>> So, I think YARN will be replaced certainly on-prem with Kubernetes, because then you can do multi data center, and things of that sort. The nice thing about Kubernetes, is it in fact can span the cloud as well. So, Kubernetes as an infrastructure, is certainly capable of being able to both handle a multi data center deployment on-prem, along with whatever actually happens on the cloud. There is infrastructure available to do that. It's very immature, most of the customers aren't anywhere close to being able to do that, and I would say even before Kubernetes gets accepted within the environment, it's probably 18 months, and there's probably another 18 months to two years, before we start facing this hybrid cloud, on-prem kind of problem. So we're a few years out I think. >> So, would, for those of us including our viewers, you know, who know the acronym, and know that it's a, you know, scheduler slash cluster manager, resource manager, would that give you enough of a control plane and knowledge of sort of the resources out there, for you to be able to either instrument or deploy an instrument to all the clusters (mumbles). >> So we are actually leading the effort right now for big data on Kubernetes. So there is a group of, there's a small group working. It's Google, us, Red Hat, Palantir, Bloomberg now has joined the group as well. We are actually today talking about our effort on getting HDFS working on Kubernetes, so we see the writing on the wall. We clearly are positioning ourselves to be a player in that particular space, so we think we'll be ready and able to take that challenge on. >> Ash this is great stuff, we've just got about a minute before the break, so I wanted to ask you just a final question. You've been in the Spark community for a while, so what of their open source tools should we be keeping our eyes out for? >> Kubernetes. >> David: That's the one? >> To me that is the killer that's coming next. >> David: Alright.
>> I think that's going to make life, it's going to unify the microservices architecture, plus the sort of multi data center and everything else. I think it's really, really good. Borg works, it's been working for a long time. >> David: Alright, and I want to thank you for that little Pepper pen that I got over at your booth, as the coolest-- >> Come and get more. >> Gadget here. >> We also have Pepper sauce. >> Oh, of course. (laughter) Well there sir-- >> It's our sauce. >> There's the hot news from-- >> Ash: There you go. >> Pepperdata Ash Munshi. Thank you so much for being on the show, we appreciate it. >> Ash: My pleasure, thank you very much. >> And thank you for watching theCUBE. We're going to be back with more guests, including Ali Ghodsi, CEO of Databricks, coming up next. (upbeat music) (ocean roaring)
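The heuristic-driven diagnosis discussed in this segment (Dr. Elephant flagging problems across MapReduce and Spark jobs) can be illustrated with a toy skew check in Python. The rule and the 2x-median threshold below are illustrative assumptions for the sketch, not Dr. Elephant's actual heuristics.

```python
from statistics import median

def flag_skewed_tasks(task_runtimes, ratio=2.0):
    """Flag tasks whose runtime exceeds `ratio` times the median of their
    stage, a crude stand-in for the skew heuristics a job analyzer applies."""
    if not task_runtimes:
        return []
    typical = median(task_runtimes.values())
    return sorted(task for task, secs in task_runtimes.items()
                  if secs > ratio * typical)

# One straggler task dominates an otherwise uniform stage:
runtimes = {"task-0": 40, "task-1": 42, "task-2": 39, "task-3": 180}
print(flag_skewed_tasks(runtimes))  # -> ['task-3']
```

A real analyzer layers many such rules (GC time, spill ratios, shuffle sizes) and ranks their severity; the point here is only that each rule is a cheap pass over per-task metrics.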

Published Date : Jun 7 2017


Dr. Jisheng Wang, Hewlett Packard Enterprise, Spark Summit 2017 - #SparkSummit - #theCUBE


 

>> Announcer: Live from San Francisco, it's theCUBE covering Spark Summit 2017 brought to you by Databricks. >> You are watching theCUBE at Spark Summit 2017. We continue our coverage here talking with developers, partners, customers, all things Spark, and today we're honored now to have our next guest Dr. Jisheng Wang who's the Senior Director of Data Science at the CTO Office at Hewlett Packard Enterprise. Dr. Wang, welcome to the show. >> Yeah, thanks for having me here. >> All right and also to my right we have Mr. Jim Kobielus who's the Lead Analyst for Data Science at Wikibon. Welcome, Jim. >> Great to be here like always. >> Well let's jump into it. At first I want to ask about your background a little bit. We were talking about the organization, maybe you could do a better job (laughs) of telling me where you came from and you just recently joined HPE. >> Yes. I actually recently joined HPE earlier this year through the Niara acquisition, and now I'm the Senior Director of Data Science in the CTO Office of Aruba. Actually, Aruba you probably know like two years back, HP acquired Aruba as a wireless networking company, and now Aruba takes charge of the whole enterprise networking business in HP which is about over three billion annual revenue every year now. >> Host: That's not confusing at all. I can follow you (laughs). >> Yes, okay. >> Well all I know is you're doing some exciting stuff with Spark, so maybe tell us about this new solution that you're developing. >> Yes, actually most of my experience with Spark goes back to the Niara time, so Niara was a three and a half year old startup that invented, reinvented the enterprise security using big data and data science. So what is the problem we solved, we tried to solve in Niara is called UEBA, user and entity behavioral analytics. So I'll just try to be very brief here.
Most of the traditional security solutions focus on detecting attackers from outside, but what if the origin of the attacker is inside the enterprise, say Snowden, what can you do? So you probably heard of many cases today of employees leaving the company by stealing lots of the company's IP and sensitive data. So UEBA is a new solution that tries to monitor the behavioral change of the enterprise users to detect both this kind of malicious insider and also the compromised user. >> Host: Behavioral analytics. >> Yes, so it sounds like a native analytics which we run like a product. >> Yeah and Jim you've done a lot of work in the industry on this, so any questions you might have for him around UEBA? >> Yeah, give us a sense for how you're incorporating streaming analytics and machine learning into that UEBA solution and then where Spark fits into the overall approach that you take? >> Right, okay. So actually when we started three and a half years back, when we developed the first version of the data pipeline, we used a mix of Hadoop, YARN, Spark, even Apache Storm for different kinds of stream and batch analytics work. But soon after, with increased maturity and also the momentum from this open source Apache Spark community, we migrated all our stream and batch work, you know the ETL and data analytics work, into Spark. And it's not just Spark. It's Spark, Spark Streaming, MLlib, the whole ecosystem of that. So there are at least a couple advantages we have experienced through this kind of transition. The first thing which really helped us is the simplification of the infrastructure and also the reduction of the DevOps efforts there. >> So simplification around Spark, the whole stack of Spark that you mentioned. >> Yes. >> Okay. >> So for the Niara solution originally, we supported, even here today, we supported both the on-premise and the cloud deployment. For the cloud we supported the public clouds like AWS and Microsoft Azure, and also private cloud.
So you can understand, if we have to maintain a stack of different open source tools over this kind of many different deployments, the overhead of doing the DevOps work to monitor, alarm, and debug this kind of infrastructure over different deployments is very hard. So Spark provides us a unified platform. We can integrate the streaming, you know batch, real-time, near real-time, or even long-term batch jobs all together. So that heavily reduced both the expertise and also the effort required for the DevOps. This is one of the biggest advantages we experienced, and certainly we also experienced things like the scalability, performance, and also the convenience for developers to develop new applications, all of this, from Spark. >> So are you using the Spark structured streaming runtime inside of your application? Is that true? >> We actually use Spark in the streaming processing when the data comes in. So like in the UEBA solution, the first thing is collecting a lot of the data, different account data sources, network data, cloud application data. So when the data comes in, the first thing is a streaming job for the ETL, to process the data. Then after that, we actually also developed analytics jobs at different frequencies, like one minute, 10 minute, one hour, one day, on top of that. And even recently we have started some early adoption of deep learning into this: how to use deep learning to monitor the user behavior change over time. Especially after a user gives notice, is the user going to access more servers or download some of the sensitive data? So all of this requires very complex analytics infrastructure. >> Now there were some announcements today here at Spark Summit by Databricks of adding deep learning support to their core Spark code base. What are your thoughts about the deep learning pipelines API that they announced this morning?
It's new news, I'll understand if you haven't digested it totally, but you probably have some good thoughts on the topic. >> Yes, actually this is also news for me, so I can just speak from my current experience. How to integrate deep learning into Spark actually was a big challenge so far for us, because for the deep learning piece we used TensorFlow, and certainly most of our other stream and data massaging or ETL work is done by Spark. So in this case, there are a couple ways to manage this, too. One is to set up two separate resource pools, one for Spark, the other one for TensorFlow, but in our deployments there are some very small on-premise deployments which have only like a four node or five node cluster. It's not efficient to split resources in that way. So we are actually also looking for some closer integration between deep learning and Spark. So one thing we looked at before is called TensorFlowOnSpark, which was open sourced a couple months ago by Yahoo. >> Right. >> So maybe this is certainly more exciting news for the Spark team to develop this native integration. >> Jim: Very good. >> Okay and we talked about the UEBA solution, but let's go back to a little broader HPE perspective. You have this concept called the intelligent edge, what's that all about? >> So that's a very cool name. Actually come a little bit back. I come from the enterprise background, and enterprise applications actually lag behind consumer applications in terms of the adoption of the new data science technology. So there are some native challenges for that. For example, collecting and storing large amounts of this enterprise sensitive data is a huge concern, especially in European countries. Also for a similar reason, when you develop enterprise applications, you lack some good quantity and quality of training data.
So these are some native challenges when you develop enterprise applications, but even despite this, HPE and Aruba recently made several acquisitions of analytics companies to accelerate the adoption of analytics into different product lines. Actually the intelligent edge comes from this IoT, which is internet of things, which is expected to be the fastest growing market in the next few years here. >> So are you going to be integrating the UEBA behavioral analytics and Spark capability into your IoT portfolio at HP? Is that a strategy or direction for you? >> Yes. Yes, for the big picture that certainly is. So you can think, I think some of the Gartner reports expected the number of IoT devices is going to grow over 20 billion by 2020. Since all of these IoT devices are connected to either intranet or internet, either through wire or wireless, so as a networking company, we have the advantage of collecting data and even taking some actions at the first place. So the idea of this intelligent edge is we want to turn each of these IoT devices, the small IoT devices like IP cameras, like those motion detectors, all of these small devices, into distributed sensors for the data collection and also some inline actors to do some real-time or even close to real-time decisions. For example, the behavior anomaly detection is a very good example here. If an IoT device is compromised, if the IP camera has been compromised, then attackers can use that to steal your internal data. We should detect and stop that at the first place. >> Can you tell me about the challenges of putting deep learning algorithms natively on resource constrained endpoints in the IoT? That must be really challenging to get them to perform well considering that there may be just a little bit of memory or flash capacity or whatever on the endpoints. Any thoughts about how that can be done effectively and efficiently? >> Very good question-- >> And at low cost. >> Yes, very good question.
So there are two aspects to this. First is the global training of the intelligence, which is not going to be done on each of the devices. In that case, each of the devices is more like a sensor for the data collection. So we are going to collect the data, send it to the cloud, and build all of this giant pool of computing resource to train the classifier, to train the model. But when we train the model, we are going to ship the model, so the inference and the detection of those behavioral anomalies really happen on the endpoint. >> Do the training centrally and then push the trained algorithms down to the edge devices. >> Yes. But even then, the second aspect, as well, even like you said, some of the devices, say people try to put those small chips in a spoon, in the hospital case, to make it more intelligent, you cannot put even just the detection piece there. So we are also looking into some new technology. I know like Caffe recently announced, released some of the lightweight deep learning models. Also there's some, you probably know, there's some of the improvement from the chip industry. >> Jim: Yes. >> How to optimize the chip design for this kind of more analytics driven task there. So we are all looking into these different areas now. >> We have just a couple minutes left, and Jim you get one last question after this, but I got to ask you, what's on your wishlist? What do you wish you could learn or maybe what did you come to Spark Summit hoping to take away? >> I've always treated myself as a technical developer. One thing I am very excited about these days is the emergence of the new technology, like Spark, like TensorFlow, like Caffe, even BigDL which was announced this morning. So this is something like the first goal, when I come to these big industry events, I want to learn the new technology.
And the second thing is mostly to share our experience about adopting this new technology and also learn from other colleagues from different industries, how people change lives, disrupt the old industries by taking advantage of the new technologies here. >> The community's growing fast. I'm sure you're going to receive what you're looking for. And Jim, final question? >> Yeah, I heard you mention DevOps and Spark in the same context, and that's a huge theme we're seeing, more DevOps is being wrapped around the lifecycle of development and training and deployment of machine learning models. If you could have your ideal DevOps tool for Spark developers, what would it look like? What would it do in a nutshell? >> Actually it's still, I can just share my personal experience. In Niara, we actually developed a lot of in-house DevOps tools, for example, when you run a lot of different Spark jobs, stream, batch, like a one minute batch versus a one day batch job, how do you monitor the status of those workflows? How do you know when the data stops coming? How do you know when the workflow failed? Monitoring is a big thing, and then alarming, when you have some failure or something wrong, how do you alarm on it, and also debugging is another big challenge. So I certainly see the growing effort from both Databricks and the community on different aspects of that. >> Jim: Very good. >> All right, so I'm going to ask you for kind of a soundbite summary. I'm going to put you on the spot here, you're in an elevator and I want you to answer this one question. Spark has enabled me to do blank better than ever before. >> Certainly, certainly. I think as I explained before, it helped a lot, from both the developer side, even the start-ups trying to disrupt some industry. It helps a lot, and I'm really excited to see this deep learning integration, all the different roadmap items, you know, down the road. I think they're on the right track. >> All right. Dr.
Wang, thank you so much for spending some time with us. We appreciate it and go enjoy the rest of your day. >> Yeah, thanks for being here. >> And thank you for watching the Cube. We're here at Spark Summit 2017. We'll be back after the break with another guest. (easygoing electronic music)
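The division of labor Dr. Wang describes in this segment, training centrally in the cloud and shipping only the model down to the endpoint for inference, can be sketched with a deliberately tiny model. The JSON blob holding a single learned threshold is an illustrative assumption; a real deployment would ship a compressed neural network, but the shape of the workflow is the same.

```python
import json

def train_central(normal_rates):
    """'Training' in the cloud: reduce observed normal traffic rates to a
    tiny parameter set that is cheap to ship to a device."""
    return {"max_normal_rate": max(normal_rates)}

def serialize(model):
    # The model travels to the endpoint as a few bytes of JSON,
    # not as a training framework.
    return json.dumps(model)

def infer_on_endpoint(model_blob, observed_rate):
    """Inference on the device: a single comparison, no GPU required."""
    model = json.loads(model_blob)
    return observed_rate > model["max_normal_rate"]

blob = serialize(train_central([120, 95, 130, 110]))
print(infer_on_endpoint(blob, 900))  # -> True (possible compromised device)
print(infer_on_endpoint(blob, 100))  # -> False
```

The asymmetry is the point: the expensive reduction over fleet-wide data happens once, centrally, while each endpoint only evaluates the shipped parameters against its local observations.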

Published Date : Jun 6 2017


Fireside Chat with Andy Jassy, AWS CEO, at the AWS Summit SF 2017


 

>> Announcer: Please welcome Vice President of Worldwide Marketing, Amazon Web Services, Ariel Kelman. (applause) (techno music) >> Good afternoon, everyone. Thank you for coming. I hope you guys are having a great day here. It is my pleasure to introduce to come up on stage here, the CEO of Amazon Web Services, Andy Jassy. (applause) (techno music) >> Okay. Let's get started. I have a bunch of questions here for you, Andy. >> Just like one of our meetings, Ariel. >> Just like one of our meetings. So, I thought I'd start with a little bit of a state of the state on AWS. Can you give us your quick take? >> Yeah, well, first of all, thank you, everyone, for being here. We really appreciate it. We know how busy you guys are. So, hope you're having a good day. You know, the business is growing really quickly. In the last financials we released, in Q4 of '16, AWS is a 14 billion dollar revenue run rate business, growing 47% year over year. We have millions of active customers, and we consider an active customer as a non-Amazon entity that's used the platform in the last 30 days. And it's really a very broad, diverse customer set, in every imaginable size of customer and every imaginable vertical business segment. And I won't repeat all the customers that I know Werner went through earlier in the keynote, but here are just some of the more recent ones that you've seen, you know NELL is moving their digital and their connected devices, meters, real estate to AWS. McDonald's is re-inventing their digital platform on top of AWS. FINRA is moving all in to AWS. You saw at re:Invent, Workday announced AWS was its preferred cloud provider, and to start building on top of AWS further. Today, in press releases, you saw both Dunkin' Donuts and Here, the geo-spatial map company, announce they'd chosen AWS as their provider.
You know and then I think if you look at our business, we have a really large non-US or global customer base and business that continues to expand very dramatically. And we're also aggressively increasing the number of geographic regions in which we have infrastructure. So last year in 2016, on top of the broad footprint we had, we added Korea, India, and Canada, and the UK. We've announced that we have regions coming, another one in China, in Ningxia, as well as in France, as well as in Sweden. So we're not close to being done expanding geographically. And then of course, we continue to iterate and innovate really quickly on behalf of all of you, of our customers. I mean, just last year alone, we launched what we considered over 1,000 significant services and features. So on average, our customers wake up every day and have three new capabilities they can choose to use or not use, but at their disposal. You've seen it already this year, if you look at Chime, which is our new unified communication service. It makes meetings much easier to conduct, be productive with. You saw Connect, which is our new global call center routing service. If you look even today, you look at Redshift Spectrum, which makes it easy to query all your data, not just locally on disk in your data warehouse but across all of S3, or DAX, which puts a cache in front of DynamoDB, using the same interface, or all the new features in our machine learning services. We're not close to being done delivering and iterating on your behalf. And I think if you look at that collection of things, it's part of why, as Gartner looks out at the infrastructure space, they estimate that AWS is several times the size business of the next 14 providers combined. It's a pretty significant market segment leadership position. >> You talked a lot about adoption in there, a lot of customers moving to AWS, migrating large numbers of workloads, some going all in on AWS.
And with that as kind of a backdrop, do you still see a role for hybrid as being something that's important for customers? >> Yeah, it's funny. The quick answer is yes. I think the, you know, if you think about a few years ago, a lot of the rage was this debate about private cloud versus what people call public cloud. And we don't really see that debate very often anymore. I think relatively few companies have had success with private clouds, and most are pretty substantially moving in the direction of building on top of clouds like AWS. But, while you increasingly see more and more companies every month announcing that they're going all in to the cloud, we will see most enterprises operate in some form of hybrid mode for the next number of years. And I think in the early days of AWS and the cloud, I think people got confused about this, where they thought that they had to make this binary decision to either be all in on the public cloud and AWS or not at all. And of course that's not the case. It's not a binary decision. And what we know many of our enterprise customers want is they want to be able to run the data centers that they're not ready to retire yet as seamlessly as they can alongside of AWS. And it's why we've built a lot of the capabilities we've built the last several years. These are things like VPC, which is our virtual private cloud, which allows you to cordon off a portion of our network, deploy resources into it and connect to it through VPN or Direct Connect, which is a private connection between your data centers and our regions, or our storage gateway, which is a virtual storage appliance, or Identity Federation, or a whole bunch of capabilities like that.
But what we've seen, even though the vast majority of the big hybrid implementations today are built on top of AWS, as more and more of the mainstream enterprises are now at the point where they're really building substantial cloud adoption plans, they've come back to us and they've said, well, you know, actually you guys have made us make kind of a binary decision. And that's because the vast majority of the world is virtualized on top of VMWare. And because VMWare and AWS, prior to a few months ago, had really done nothing to try and make it easy to use the VMWare tools that people have been using for many years seamlessly with AWS, customers were having to make a binary choice. Either they stick with the VMWare tools they've used for a while but have a really tough time integrating with AWS, or they move to AWS and they have to leave behind the VMWare tools they've been using. And it really was the impetus for VMWare and AWS to have a number of deep conversations about it, which led to the announcement we made late last fall of VMWare and AWS, which is going to allow customers who have been using the VMWare tools to manage their infrastructure for a long time to seamlessly be able to run those on top of AWS. And they get to do so as they move workloads back and forth and they evolve their hybrid implementation without having to buy any new hardware, which is a big deal for companies. Very few companies are looking to find ways to buy more hardware these days. And customers have been very excited about this prospect. We've announced that it's going to be ready in the middle of this year. You see companies like Amadeus and Merck and Western Digital and the state of Louisiana, a number of others, we've a very large, private beta and preview happening right now. And people are pretty excited about that prospect. So we will allow customers to run in the mode that they want to run, and I think you'll see a huge transition over the next five to 10 years. 
>> So in addition to hybrid, another question we get a lot from enterprises around the concept of lock-in and how they should think about their relationship with the vendor and how they should think about whether to spread the workloads across multiple infrastructure providers. How do you think about that? >> Well, it's a question we get a lot. And Oracle has sure made people care about that issue. You know, I think people are very sensitive about being locked in, given the experience that they've had over the last 10 to 15 years. And I think the reality is when you look at the cloud, it really is nothing like being locked into something like Oracle. The APIs look pretty similar between the various providers. We build an open standard, it's like Linux and MySQL and Postgres. All the migration tools that we build allow you to migrate in or out of AWS. It's up to customers based on how they want to run their workload. So it is much easier to move away from something like the cloud than it is from some of the old software services that has created some of this phobia. But I think when you look at most CIOs, enterprise CIOs particularly, as they think about moving to the cloud, many of them started off thinking that they, you know, very well might split their workloads across multiple cloud providers. And I think when push comes to shove, very few decide to do so. Most predominately pick an infrastructure provider to run their workloads. And the reason that they don't split it across, you know, pretty evenly across clouds is a few reasons. Number one, if you do so, you have to standardize in the lowest common denominator. And these platforms are in radically different stages at this point. And if you look at something like AWS, it has a lot more functionality than anybody else by a large margin. And we're also iterating more quickly than you'll find from the other providers. 
And most folks don't want to tie the hands of their developers behind their backs in the name of having the ability of splitting it across multiple clouds, cause they actually are, in most of their spaces, competitive, and they have a lot of ideas that they want to actually build and invent on behalf of their customers. So, you know, they don't want to actually limit their functionality. It turns out the second reason is that they don't want to force their development teams to have to learn multiple platforms. And most development teams, if any of you have managed multiple stacks across different technologies, and many of us have had that experience, it's a pain in the butt. And trying to make a shift from what you've been doing for the last 30 years on premises to the cloud is hard enough. But then forcing teams to have to get good at running across two or three platforms is something most teams don't relish, and it's wasteful of people's time, it's wasteful of natural resources. That's the second thing. And then the third reason is that you effectively diminish your buying power because all of these cloud providers have volume discounts, and then you're splitting what you buy across multiple providers, which gives you a lower amount you buy from everybody at a worse price. So when most CIOs and enterprises look at this carefully, they don't actually end up splitting it relatively evenly. They predominately pick a cloud provider. Some will just pick one. Others will pick one and then do a little bit with a second, just so they know they can run with a second provider, in case that relationship with the one they choose to predominately run with goes sideways in some fashion. But when you really look at it, CIOs are not making that decision to split it up relatively evenly because it makes their development teams much less capable and much less agile. 
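Jassy's third point, diminished buying power, is ordinary tiered-discount arithmetic, and a short worked example makes it concrete. The tier boundaries and rates below are made-up numbers for illustration, not any provider's actual pricing.

```python
def discounted_cost(spend, tiers):
    """Bill a monthly spend against marginal volume-discount tiers.

    `tiers` is a list of (ceiling, rate) pairs; the slice of spend that
    falls inside each band is billed at that band's rate.
    """
    cost, floor = 0.0, 0.0
    for ceiling, rate in tiers:
        band = min(spend, ceiling) - floor
        if band <= 0:
            break
        cost += band * rate
        floor = ceiling
    return cost

# Hypothetical tiers: first $100k at list, next $400k at 10% off, rest at 20% off.
TIERS = [(100_000, 1.00), (500_000, 0.90), (float("inf"), 0.80)]

consolidated = discounted_cost(1_000_000, TIERS)   # all spend with one provider
split = 2 * discounted_cost(500_000, TIERS)        # same spend across two providers
print(consolidated, split)  # -> 860000.0 920000.0
```

Consolidated spend reaches the deepest tier sooner, so splitting the same million dollars across two providers costs about 7% more in this toy example.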
>> Okay, let's shift gears a little bit, talk about a subject that's on the minds of not just enterprises but startups and government organizations and pretty much every organization we talk to. And that's AI and machine learning. At re:Invent, we introduced our Amazon AI services and just this morning Werner announced the general availability of Amazon Lex. So where are we overall on machine learning? >> Well it's a hugely exciting opportunity for customers, and I think, we believe it's exciting for us as well. And it's still in the relatively early stages, if you look at how people are using it, but it's something that we passionately believe is going to make a huge difference in the world and a huge difference with customers, and that we're investing a pretty gigantic amount of resource and capability for our customers. And I think the way that we think about, at a high level, the machine learning and deep learning spaces are, you know, there's kind of three macro layers of the stack. I think at that bottom layer, it's generally for the expert machine learning practitioners, of which there are relatively few in the world. It's a scarce resource relative to what I think will be the case in five, 10 years from now. And these are folks who are comfortable working with deep learning engines, know how to build models, know how to tune those models, know how to do inference, know how to get that data from the models into production apps. And for that group of people, if you look at the vast majority of machine learning and deep learning that's being done in the cloud today, it's being done on top of AWS, on our P2 instances, which are optimized for deep learning, and our deep learning AMIs, which package, effectively, the deep learning engines and libraries inside those AMIs.
And you see companies like Netflix, Nvidia, and Pinterest and Stanford and a whole bunch of others that are doing significant amounts of machine learning on top of those optimized instances for machine learning and the deep learning AMIs. And I think that you can expect, over time, that we'll continue to build additional capabilities and tools for those expert practitioners. I think we will support and do support every single one of the deep learning engines on top of AWS, and we have a significant amount of those workloads with all those engines running on top of AWS today. We also are making, I would say, a disproportionate investment of our own resources in the MXNet community, just because if you look at running deep learning models, once you get beyond a few GPUs, it's pretty difficult to have those scale as you get into the hundreds of GPUs. And most of the deep learning engines don't scale very well horizontally. And so what we've found through a lot of extensive testing, cause remember, Amazon has thousands of deep learning experts inside the company that have built very sophisticated deep learning capabilities, like the ones you see in Alexa, we have found that MXNet scales the best and almost linearly, as we continue to add nodes, as we continue to horizontally scale. So we have a lot of investment at that bottom layer of the stack. Now, if you think about most companies with developers, it's still largely inaccessible to them to do the type of machine learning and deep learning that they'd really like to do. And that's because the tools, I think, are still too primitive. And there's a number of services out there, we built one ourselves in Amazon Machine Learning that we have a lot of customers use, and yet I would argue that all of those services, including our own, are still more difficult than they should be for everyday developers to be able to build machine learning and access machine learning and deep learning. 
And if you look at the history of what AWS has done, in every part of our business, and a lot of what's driven us, is trying to democratize technologies that were really only available and accessible before to a select, small number of companies. And so we're doing a lot of work at what I would call that middle layer of the stack to get rid of a lot of the muck associated with having to do, you know, building the models, tuning the models, doing the inference, figuring out how to get the data into production apps, a lot of those capabilities at that middle layer that we think are really essential to allow deep learning and machine learning to reach its full potential. And then at the top layer of the stack, we think of those as solutions. And those are things like, pass me an image and I'll tell you what that image is, or show me this face, does it match faces in this group of faces, or pass me a string of text and I'll give you an mp3 file, or give me some words and what your intent is and then I'll be able to return answers that allow people to build conversational apps like the Lex technology. And we have a whole bunch of other services coming in that area, on top of Lex and Polly and Rekognition, and you can imagine some of those that we've had to use in Amazon over the years that we'll continue to make available for you, our customers. So very significant level of investment at all three layers of that stack. We think it's relatively early days in the space but have a lot of passion and excitement for that. >> Okay, now for ML and AI, we're seeing customers wanting to load in tons of data, both to train the models and to actually process data once they've built their models. And then outside of ML and AI, we're seeing just as much demand to move in data for analytics and traditional workloads. So as people are looking to move more and more data to the cloud, how are we thinking about making it easier to get data in? >> It's a great question. 
And I think it's actually an often overlooked question because a lot of what gets attention with customers is all the really interesting services that allow you to do everything from compute and storage and database and messaging and analytics and machine learning and AI. But at the end of the day, if you have a significant amount of data already somewhere else, you have to get it into the cloud to be able to take advantage of all these capabilities that you don't have on premises. And so we have spent a disproportionate amount of focus over the last few years trying to build capabilities for our customers to make this easier. And we have a set of capabilities that really is not close to matched anywhere else, in part because we have so many customers who are asking for help in this area that it's, you know, that's really what drives what we build. So of course, you could use the good old-fashioned wire to send data over the internet. Increasingly, we find customers that are trying to move large amounts of data into S3 are using our S3 transfer acceleration service, which basically uses our points of presence, or POPs, all over the world to expedite delivery into S3. You know, a few years ago, we were talking to a number of companies that were looking to make big shifts to the cloud, and they said, well, I need to move lots of data and it just isn't viable for me to move it over the wire, given the connection we can assign to it. It's why we built Snowball. And so we launched Snowball a couple years ago, which is really, it's a 50 terabyte appliance that is encrypted, the data's encrypted three different ways, and you ingest the data from your data center into Snowball. It has a Kindle connected to it that makes sure that you send it to the right place, and you can also track the progress of your high-speed ingestion into our data centers. 
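The transfer acceleration idea described above comes down to addressing the bucket through a nearby edge location instead of over the plain internet path. As a minimal sketch, the helper below builds the documented `<bucket>.s3-accelerate.amazonaws.com` endpoint form; the bucket name is made up, and in practice you would simply enable the option on a boto3 client rather than construct URLs by hand.

```python
# Minimal sketch of how S3 Transfer Acceleration is addressed.
# Assumptions: "my-ingest-bucket" is a hypothetical bucket name; the
# endpoint format "<bucket>.s3-accelerate.amazonaws.com" is the one
# documented for accelerated transfers. With boto3 you would instead
# pass Config(s3={"use_accelerate_endpoint": True}) to the client.

def accelerate_endpoint(bucket: str, dualstack: bool = False) -> str:
    """Return the S3 Transfer Acceleration endpoint for a bucket.

    Accelerated uploads enter AWS at the nearest edge location (point
    of presence) and ride the AWS network to the bucket's region,
    instead of crossing the public internet end to end.
    """
    host = "s3-accelerate.dualstack" if dualstack else "s3-accelerate"
    return f"https://{bucket}.{host}.amazonaws.com"

print(accelerate_endpoint("my-ingest-bucket"))
# -> https://my-ingest-bucket.s3-accelerate.amazonaws.com
```

The speed win comes from the edge hop, not the URL itself, but seeing the endpoint makes clear why no application changes are needed beyond pointing uploads at it.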
And when we first launched Snowball, we launched it at Reinvent a couple years ago, I could not believe that we were going to order as many Snowballs to start with as the team wanted to order. And in fact, I reproached the team and I said, this is way too much, why don't we first see if people actually use any of these Snowballs. And so the team thankfully didn't listen very carefully to that, and they really only pared back a little bit. And then it turned out that we, almost from the get-go, had ordered 10X too few. And so this has been something that people have used in a very broad, pervasive way all over the world. And last year, at the beginning of the year, as we were asking people what else they would like us to build in Snowball, customers told us a few things that were pretty interesting to us. First, one that wasn't that surprising was they said, well, it would be great if they were bigger, you know, if instead of 50 terabytes it was more data I could store on each device. Then they said, you know, one of the problems is when I load the data onto a Snowball and send it to you, I have to still keep my local copy on premises until it's ingested, cause I can't risk losing that data. So they said it would be great if you could find a way to provide clustering, so that I don't have to keep that copy on premises. That was pretty interesting. And then they said, you know, there's some of that data that I'd actually like to be loading synchronously to S3, and then, or some things back from S3 to that data that I may want to compare against. That was interesting, having that endpoint. And then they said, well, we'd really love it if there was some compute on those Snowballs so I can do analytics on some relatively short-term signals that I want to take action on right away. Those were really the pieces of feedback that informed Snowball Edge, which is the next version of Snowball that we launched, announced at Reinvent this past November. 
So it has, it's a hundred-terabyte appliance, still the same level of encryption, and it has clustering so that you don't have to keep that copy of the data local. It allows you to have an endpoint to S3 to synchronously load data back and forth, and then it has a compute inside of it. And so it allows customers to use these on premises. I'll give you a good example. GE is using these for their wind turbines. And they collect all kinds of data from those turbines, but there's certain short-term signals they want to do analytics on in as close to real time as they can, and take action on those. And so they use that compute to do the analytics and then when they fill up that Snowball Edge, they detach it and send it back to AWS to do broad-scale analytics in the cloud and then just start using an additional Snowball Edge to capture that short-term data and be able to do those analytics. So Snowball Edge is, you know, we just launched it a couple months ago, again, amazed at the type of response, how many customers are starting to deploy those all over the place. I think if you have exabytes of data that you need to move, it's not so easy. An exabyte of data, if you wanted to move from on premises to AWS, would require 10,000 Snowball Edges. Those customers don't want to really manage a fleet of 10,000 Snowball Edges if they don't have to. And so, we tried to figure out how to solve that problem, and it's why we launched Snowmobile back at Reinvent in November, which effectively, it's a hundred-petabyte container on a 45-foot trailer that we will take a truck and bring out to your facility. It comes with its own power and its own network fiber that we plug in to your data center. And if you want to move an exabyte of data over a 10 gigabit per second connection, it would take you 26 years. But using 10 Snowmobiles, it would take you six months. So really different level of scale. 
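The 26-year figure quoted above is easy to sanity-check: an exabyte pushed over a saturated 10 gigabit-per-second link works out to roughly a quarter century. A quick back-of-the-envelope calculation, ignoring protocol overhead (so real transfers would be even slower):

```python
# Back-of-the-envelope check of the exabyte-over-the-wire claim.
# Assumptions: 1 EB = 10**18 bytes, a fully saturated 10 Gbit/s link,
# and zero protocol overhead.

EXABYTE_BITS = 10**18 * 8          # one exabyte, in bits
LINK_BPS = 10 * 10**9              # 10 gigabits per second

seconds = EXABYTE_BITS / LINK_BPS  # 8e8 seconds
years = seconds / (60 * 60 * 24 * 365)
print(f"{years:.1f} years")        # -> 25.4 years

# Ten hundred-petabyte Snowmobiles hold the same exabyte outright;
# the six-month figure quoted is dominated by logistics, not wire speed.
```

So "26 years" is the right order of magnitude, which is exactly why trucking the data starts to beat the network at that scale.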
And you'd be surprised how many companies have exabytes of data at this point that they want to move to the cloud to get all those analytics and machine learning capabilities running on top of them. Then for streaming data, as we have more and more companies that are doing real-time analytics of streaming data, we have Kinesis, where we built something called the Kinesis Firehose that makes it really simple to stream all your real-time data. We have a storage gateway for companies that want to keep certain data hot, locally, and then asynchronously be loading the rest of their data to AWS to be able to use in different formats, should they need it as backup or should they choose to make a transition. So it's a very broad set of storage capabilities. And then of course, if you've moved a lot of data into the cloud or into anything, you realize that one of the hardest parts that people often leave to the end is ETL. And so we have an ETL service called Glue, which we announced at Reinvent and which is going to make it much easier to move your data, be able to find your data and map your data to different locations and do ETL, which of course is hugely important as you're moving large amounts. 
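To make the Firehose part concrete: delivery streams ingest records in batches, and the PutRecordBatch API documents a cap of 500 records per call. The sketch below is a pure batching helper with the actual AWS call left as a comment, so it stays runnable offline; the stream name "clickstream-demo" is invented for illustration.

```python
# Sketch of feeding records to a Kinesis Firehose delivery stream.
# Assumptions: the stream name "clickstream-demo" is hypothetical;
# the 500-records-per-call cap matches Firehose's documented
# PutRecordBatch limit. The network call itself is commented out.

from typing import Iterable, List

BATCH_LIMIT = 500  # PutRecordBatch accepts at most 500 records per call

def to_batches(records: Iterable[bytes], limit: int = BATCH_LIMIT) -> List[List[dict]]:
    """Group raw records into PutRecordBatch-sized payloads."""
    batches, current = [], []
    for data in records:
        current.append({"Data": data})
        if len(current) == limit:
            batches.append(current)
            current = []
    if current:  # flush the final, partially filled batch
        batches.append(current)
    return batches

batches = to_batches(b"event-%d" % i for i in range(1200))
print([len(b) for b in batches])   # -> [500, 500, 200]

# With boto3 you would then send each batch, e.g.:
# firehose = boto3.client("firehose")
# for batch in batches:
#     firehose.put_record_batch(DeliveryStreamName="clickstream-demo",
#                               Records=batch)
```

Firehose then buffers and delivers those records downstream (for example into S3), which is what makes the "simple to stream all your real-time data" claim work in practice.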
And I think that if you look out 10 years from now, when you talk about hybrid, I think for most companies, the majority of the on-premises piece of hybrid will not be servers, it will be connected devices. There are going to be billions of devices all over the place, in your home, in your office, in factories, in oil fields, in agricultural fields, on ships, in cars, in planes, everywhere. You're going to have these assets that sit at the edge that companies are going to want to be able to collect data on, do analytics on, and then take action. And if you think about it, most of these devices, by their very nature, have relatively little CPU and have relatively little disk, which makes the cloud disproportionately important for them to supplement them. It's why you see most of the big, successful IOT applications today are using AWS to supplement them. Illumina has hooked up their genome sequencing to AWS to do analytics, or you can look at Major League Baseball's Statcast, an IOT application built on top of AWS, or John Deere has over 200,000 telematically enabled tractors that are collecting real-time planting conditions and information that they're doing analytics on and sending it back to farmers so they can figure out where and how to optimally plant. Tata Motors manages their truck fleet this way. Philips has their smart lighting project. I mean, there're innumerable amounts of these IOT applications built on top of AWS where the cloud is supplementing the device's capability. But when you think about these becoming more mission-critical applications for companies, there are going to be certain functions and certain conditions by which they're not going to want to connect back to the cloud. They're not going to want to take the time for that round trip. They're not going to have connectivity in some cases to be able to make a round trip to the cloud. 
And what customers really want is the same capabilities they have on AWS, with AWS IOT, but on the devices themselves. And if you've ever tried to develop on these embedded devices, it's not for mere mortals. It's pretty delicate and it's pretty scary and there's a lot of archaic protocols associated with it, pretty tough to do it all and to do it without taking down your application. And so what we did was we built something called Greengrass, and we announced it at Reinvent. And Greengrass is really like a software module that you can effectively have inside your device. It's got Lambda inside of it, and it allows customers to write Lambda functions, some of which they want to run in the cloud, some of which they want to run on the device itself through Greengrass. So they have a common programming model to build those functions, to take the signals they see and take the actions they want to take against that, which is really going to help, I think, across all these IOT devices to be able to be much more flexible and allow the devices and the analytics and the actions you take to be much smarter, more intelligent. It's also why we built Snowball Edge. Snowball Edge, if you think about it, is really a purpose-built Greengrass device. We have Greengrass, it's inside of the Snowball Edge, and you know, the GE wind turbine example is a good example of that. And so it's to us, I think it's the future of what the on-premises piece of hybrid's going to be. I think there're going to be billions of devices all over the place and people are going to want to interact with them with a common programming model like they use in AWS and the cloud, and we're continuing to invest very significantly to make that easier and easier for companies. 
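The "common programming model" point is that the same Lambda-style handler shape runs in the cloud or on the device. Here is a toy sketch in that spirit: the wind-turbine-flavored event shape, the vibration threshold, and the action names are all invented for illustration, and a real Greengrass function would typically publish messages via the Greengrass SDK rather than return a value.

```python
# Toy sketch of the Greengrass programming model: a Lambda-style
# handler that can run in the cloud or on the device itself.
# Assumptions: the event shape {"vibration_mm_s": ...}, the threshold,
# and the action names are hypothetical; a real Greengrass function
# would publish via the greengrasssdk instead of returning a dict.

ALERT_THRESHOLD = 7.5  # hypothetical vibration limit, in mm/s

def function_handler(event, context):
    """Decide locally whether a short-term signal needs immediate action."""
    reading = event.get("vibration_mm_s", 0.0)
    if reading > ALERT_THRESHOLD:
        # On the device: act right away, no round trip to the cloud.
        return {"action": "feather_blades", "reading": reading}
    # Otherwise, leave it for the batched upload and cloud analytics.
    return {"action": "log_only", "reading": reading}

print(function_handler({"vibration_mm_s": 9.1}, None))
```

The design point is that the decision logic lives in one handler, and only the deployment target (cloud Lambda versus Greengrass on the device) changes, which is what avoids the round trip for time-critical signals like the GE turbine example.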
What are some of the other areas of investment that this group should care about? >> Well there's a lot. (laughs) That's not a setup question, Ariel. But there's a lot. I think, I'll name a few. I think first of all, as I alluded to earlier, we are not close to being done expanding geographically. I think virtually every tier-one country will have an AWS region over time. I think many of the emerging countries will as well. I think the database space is an area that is radically changing. It's happening at a faster pace than I think people sometimes realize. And I think it's good news for all of you. I think the database space over the last few decades has been a lonely place for customers. I think that they have felt particularly locked into companies that are expensive and proprietary and have high degrees of lock-in and aren't so customer-friendly. And I think customers are sick of it. And we have a relational database service that we launched many years ago and has many flavors that you can run. You can run MySQL, you can run Postgres, you can run MariaDB, you can run SQL Server, you can run Oracle. And what a lot of our customers kept saying to us was, could you please figure out a way to have a database capability that has the performance characteristics of the commercial-grade databases but the customer-friendliness and pricing model of the more open engines like the MySQL and Postgres and MariaDB. You can do it on your own, we do a lot of it at Amazon, but it's hard, I mean, it takes a lot of work and a lot of tuning. And our customers really wanted us to solve that problem for them. And it's why we spent several years building Aurora, which is our own database engine that we built, but that's fully compatible with MySQL and with Postgres. It's at least as fault tolerant and durable and performant as the commercial-grade databases, but it's a tenth of the cost of those. 
And it's also nice because if it turns out that you use Aurora and you decide for whatever reason you don't want to use Aurora anymore, because it's fully compatible with MySQL and Postgres, you just dump it to the community versions of those, and off you go. So there's really hardly any transition there. So that is the fastest-growing service in the history of AWS. I'm amazed at how quickly it's grown. I think you may have heard earlier, we've had 23,000 database migrations just in the last year or so. There's a lot of pent-up demand to have database freedom. And we're here to help you have it. You know, I think on the analytic side, it's just never been easier and less expensive to collect, store, analyze, and share data than it is today. Part of that has to do with the economics of the cloud. But a lot of it has to do with the really broad analytics capability that we provide you. And it's a much broader capability than you'll find elsewhere. And you know, you can manage Hadoop and Spark and Presto and Hive and Pig and Yarn on top of AWS, or we have a managed elastic search service, and you know, of course we have a very high scale, very high performing data warehouse in Redshift, that just got even more performant with Spectrum, which now can query across all of your S3 data, and of course you have Athena, where you can query S3 directly. We have a service that allows you to do real-time analytics of streaming data in Kinesis. We have a business intelligence service in QuickSight. We have a number of machine learning capabilities I talked about earlier. It's a very broad array. And what we find is that it's a new day in analytics for companies. A lot of the data that companies felt like they had to throw away before, either because it was too expensive to hold or they didn't really have the tools accessible to them to get the learning from that data, it's a totally different day today. 
And so we have a pretty big investment in that space, I mentioned Glue earlier to do ETL on all that data. We have a lot more coming in that space. I think compute, super interesting, you know, I think you will find, I think we will find that companies will use full instances for many, many years and we have, you know, more than double the number of instances than you'll find elsewhere in every imaginable shape and size. But I would also say that the trend we see is that more and more companies are using smaller units of compute, and it's why you see containers becoming so popular. We have a really big business in ECS. And we will continue to build out the capability there. We have companies really running virtually every type of container and orchestration and management service on top of AWS at this point. And then of course, a couple years ago, we pioneered the event-driven serverless capability in compute that we call Lambda, which I'm just again, blown away by how many customers are using that for everything, in every way. So I think the basic unit of compute is continuing to get smaller. I think that's really good for customers. I think the ability to be serverless is a very exciting proposition, and we're continuing to fulfill that vision that we laid out a couple years ago. And then, probably, the last thing I'd point out right now is, I think it's really interesting to see how the basic procurement of software is changing. In significant part driven by what we've been doing with our Marketplace. If you think about it, in the old world, if you were a company that was buying software, you'd have to go find a bunch of the companies that you should consider, you'd have to have a lot of conversations, you'd have to talk to a lot of salespeople. 
Those companies, by the way, have to have a big sales team, an expensive marketing budget to go find those companies and then go sell those companies and then both companies engage in this long tap-dance around doing an agreement and the legal terms and the legal teams and it's just, the process is very arduous. Then after you buy it, you have to figure out how you're going to actually package it, how you're going to deploy it to infrastructure and get it done, and it's just, I think in general, both consumers of software and sellers of software really don't like the process that's existed over the last few decades. And then you look at AWS Marketplace, and we have 35 hundred product listings in there from 12 hundred technology providers. If you look at the number of hours that software's been running on EC2, just in the last month alone it's several hundred million EC2 hours of that Marketplace software being run. And it's just completely changing how software is bought and procured. I think that if you talk to a lot of the big sellers of software, like Splunk or Trend Micro, there's a whole number of them, they'll tell you it totally changes their ability to be able to sell. You know, one of the things that really helped AWS in the early days and still continues to help us, is that we have a self-service model where we don't actually have to have a lot of people talk to every customer to get started. I think if you're a seller of software, that's very appealing, to allow people to find your software and be able to buy it. And if you're a consumer, to be able to buy it quickly, again, without the hassle of all those conversations and the overhead associated with that, very appealing. And I think it's why the marketplace has just exploded and taken off like it has. It's also really good, by the way, for systems integrators, who are often packaging things on top of that software to their clients. 
This makes it much easier to build kind of smaller catalogs of software products for their customers. I think when you layer on top of that the capabilities that we've announced to make it easier for SaaS providers to meter and to do billing and to do identity, it's just, it's a very different world. And so I think that also is very exciting, both for companies and customers as well as software providers. >> We certainly touched on a lot here. And we have a lot going on, and you know, while we have customers asking us a lot about how they can use all these new services and new features, we also tend to get a lot of questions from customers on how we innovate so quickly, so they can think about applying some of those lessons learned to their own businesses. >> So you're asking how we're able to innovate quickly? >> Mmm hmm. >> I think there's a few things that have helped us, and it's different for every company. But some of these might be helpful. I'll point to a few. I think the first thing is, I think we disproportionately index on hiring builders. And we think of builders as people who are inventors, people who look at different customer experiences really critically, are honest about what's flawed about them, and then seek to reinvent them. And then people who understand that launch is the starting line and not the finish line. There's very little that any of us ever built that's a home run right out of the gate. And so most things that succeed take a lot of listening to customers and a lot of experimentation and a lot of iterating before you get to an equation that really works. So the first thing is who we hire. I think the second thing is how we organize. And we have, at Amazon, long tried to organize into as small and separable and autonomous teams as we can, that have all the resources in those teams to own their own destiny. And so for instance, the technologists and the product managers are part of the same team. 
And a lot of that is because we don't want the finger pointing that goes back and forth between the teams, and if they're on the same team, they focus all their energy on owning it together and understanding what customers need from them, spending a disproportionate amount of time with customers, and then they get to own their own roadmaps. One of the reasons we don't publish a 12 to 18 month roadmap is we want those teams to have the freedom, in talking to customers and listening to what you tell us matters, to re-prioritize if there are certain things that we assumed mattered more than it turns out they do. So, you know I think that the way that we organize is the second piece. I think a third piece is all of our teams get to use the same AWS building blocks that all of you get to use, which allow you to move much more quickly. And I think one of the least told stories about Amazon over the last five years, in part because people have gotten interested in AWS, is that people have missed how fast our consumer business at Amazon has iterated. Look at the amount of invention in Amazon's consumer business. And they'll tell you that a big piece of that is their ability to use the AWS building blocks like they do. I think a fourth thing is many big companies, as they get larger, what starts to happen is what people call the institutional no, which is that leaders walk into meetings on new ideas looking to find ways to say no, and not because they're ill intended but just because they get more conservative or they have a lot on their plate or things are really managed very centrally, so it's hard to imagine adding more to what you're already doing. At Amazon, it's really the opposite, and in part because of the way we're organized in such a decoupled, decentralized fashion, and in part because it's just part of our DNA. When the leaders walk into a meeting, they are looking for ways to say yes. And we don't say yes to everything, we have a lot of proposals. 
But we say yes to a lot more than I think virtually any other company on the planet. And when we're having conversations with builders who are proposing new ideas, we're in a mode where we're trying to problem-solve with them to get to yes, which I think is really different. And then I think the last thing is that we have mechanisms inside the company that allow us to make fast decisions. And if you want a little bit more detail, you should read our founder and CEO Jeff Bezos's shareholder letter, which was just released. He talks about the fast decision-making that happens inside the company. It's really true. We make fast decisions and we're willing to fail. And you know, we sometimes talk about how we're working on several of our next biggest failures, and we hope that most of the things we're doing aren't going to fail, but we know, if you're going to push the envelope and if you're going to experiment at the rate that we're trying to experiment, to find more pillars that allow us to do more for customers and allow us to be more relevant, you are going to fail sometimes. And you have to accept that, and you have to have a way of evaluating people that recognizes the inputs, meaning the things that they actually delivered as opposed to the outputs, cause on new ventures, you don't know what the outputs are going to be, you don't know how consumers or customers are going to respond to the new thing you're trying to build. So you have to be able to reward employees on the inputs, you have to have a way for them to continue to progress and grow in their career even if they work on something that didn't work. And you have to have a way of thinking about, when things don't work, how do I take the technology that I built as part of that, that really actually does work, but I didn't get it right in the form factor, and use it for other things. 
And I think that when you think about a culture like Amazon, that disproportionately hires builders, organizes into these separable, autonomous teams, and allows them to use building blocks to move fast, and has a leadership team that's looking to say yes to ideas and is willing to fail, you end up finding not only do you do more inventing but you get the people at every level of the organization spending their free cycles thinking about new ideas because it actually pays to think of new ideas cause you get a shot to try it. And so that has really helped us and I think most of our customers who have made significant shifts to AWS and the cloud would argue that that's one of the big transformational things they've seen in their companies as well. >> Okay. I want to go a little bit deeper on the subject of culture. What are some of the things that are most unique about the AWS culture that companies should know about when they're looking to partner with us? >> Well, I think if you're making a decision on a predominant infrastructure provider, it's really important that you decide that the culture of the company you're going to partner with is a fit for yours. And you know, it's a super important decision that you don't want to have to redo multiple times cause it's wasted effort. And I think that, look, I've been at Amazon for almost 20 years at this point, so I have obviously drank the Kool Aid. But there are a few things that I think are truly unique about Amazon's culture. I'll talk about three of them. The first is I think that we are unusually customer-oriented. And I think a lot of companies talk about being customer-oriented, but few actually are. I think most of the big technology companies truthfully are competitor-focused. They kind of look at what competitors are doing and then they try to one-up one another. You have one or two of them that I would say are product-focused, where they say, hey, it's great, you Mr. and Mrs. 
Customer have ideas on a product, but leave that to the experts, and you know, you'll like the products we're going to build. And those strategies can be good ones and successful ones, they're just not ours. We are driven by what customers tell us matters to them. We don't build technology for technology's sake, we don't become, you know, smitten by any one technology. We're trying to solve real problems for our customers. 90% of what we build is driven by what you tell us matters. And the other 10% is listening to you, and even if you can't articulate exactly what you want, trying to read between the lines and invent on your behalf. So that's the first thing. Second thing is that we are pioneers. We really like to invent, as I was talking about earlier. And I think most big technology companies at this point have either lost their will or their DNA to invent. Most of them acquire it or fast follow. And again, that can be a successful strategy. It's just not ours. I think in this day and age, where we're going through as big a shift as we are in the cloud, which is the biggest technology shift in our lifetime, as dynamic as it is, being able to partner with a company that has the most functionality, is iterating the fastest, has the most customers, has the largest ecosystem of partners, has SIs and ISVs, that has had a vision for how all these pieces fit together from the start, instead of trying to patch them together in a following act, you have a big advantage. I think that the third thing is that we're unusually long-term oriented. And I think that you won't ever see us show up at your door the last day of a quarter, the last day of a year, trying to harass you into doing some kind of deal with us, not to be heard from again for a couple years when we either audit you or try to re-up you for a deal. That's just not the way that we will ever operate. We are trying to build a business, a set of relationships, that will outlast all of us here. 
And I think something that always ties it together well is this trusted advisor capability that we have inside our support function, which is, you know, we look at dozens of programmatic ways that our customers are using the platform and reach out to you if you're doing something we think's suboptimal. And one of the things we do is if you're not fully utilizing resources, or hardly, or not using them at all, we'll reach out and say, hey, you should stop paying for this. And over the last couple of years, we've sent out a couple million of these notifications that have led to actual annualized savings for customers of 350 million dollars. So I ask you, how many of your technology partners reach out to you and say stop spending money with us? To the tune of 350 million dollars of lost revenue per year. Not too many. And I think when we first started doing it, people thought it was gimmicky, but if you understand what I just talked about with regard to our culture, it makes perfect sense. We don't want to make money from customers unless you're getting value. We want to reinvent an experience that we think has been broken for the prior few decades. And then we're trying to build a relationship with you that outlasts all of us, and we think the best way to do that is to provide value and do right by customers over a long period of time. >> Okay, keep going on the culture subject, what about some of the quirky things about Amazon's culture that people might find interesting or useful? >> Well there are a lot of quirky parts to our culture. And I think any, you know lots of companies who have strong culture will argue they have quirky pieces but I think there's a few I might point to. You know, I think the first would be the first several years I was with the company, I guess the first six years or so I was at the company, like most companies, all the information that was presented was via PowerPoint.
And we would find that it was a very inefficient way to consume information. You know, you were often shaded by the charisma of the presenter, sometimes you would overweight what the presenters said based on whether they were a good presenter. And vice versa. You would very rarely have a deep conversation, cause you have no room on PowerPoint slides to have any depth. You would interrupt the presenter constantly with questions that they hadn't really thought through cause they didn't think they were going to have to present that level of depth. You constantly have the, you know, you'd ask the question, oh, I'm going to get to that in five slides, you want to do that now or you want to do that in five slides, you know, it was just maddening. And we would often find that most of the meetings required multiple meetings. And so we made a decision as a company to effectively ban PowerPoints as a communication vehicle inside the company. Really the only time I do PowerPoints is at Reinvent. And maybe that shows. And what we found is that it's a much more substantive and effective and time-efficient way to have conversations because there is no way to fake depth in a six-page narrative. So what we went to from PowerPoint was six-page narrative. You can write, have as much as you want in the appendix, but you have to assume nobody will read the appendices. Everything you have to communicate has to be done in six pages. You can't fake depth in a six-page narrative. And so what we do is we all get to the room, we spend 20 minutes or so reading the document so it's fresh in everybody's head. And then where we start the conversation is a radically different spot than when you're hearing a presentation one kind of shallow slide at a time. We all start the conversation with a fair bit of depth on the topic, and we can really hone in on the three or four issues that typically matter in each of these conversations. 
So we get to the heart of the matter and we can have one meeting on the topic instead of three or four. So that has been really, I mean it's unusual and it takes some time getting used to but it is a much more effective way to pay attention to the detail and have a substantive conversation. You know, I think a second thing, if you look at our working backwards process, we don't write a lot of code for any of our services until we write and refine and decide we have crisp press release and frequently asked question, or FAQ, for that product. And in the press release, what we're trying to do is make sure that we're building a product that has benefits that will really matter. How many times have we all gotten to the end of products and by the time we get there, we kind of think about what we're launching and think, this is not that interesting. Like, people are not going to find this that compelling. And it's because you just haven't thought through and argued and debated and made sure that you drew the line in the right spot on a set of benefits that will really matter to customers. So that's why we use the press release. The FAQ is to really have the arguments up front about how you're building the product. So what technology are you using? What's the architecture? What's the customer experience? What's the UI look like? What's the pricing dimensions? Are you going to charge for it or not? All of those decisions, what are people going to be most excited about, what are people going to be most disappointed by. All those conversations, if you have them up front, even if it takes you a few times to go through it, you can just let the teams build, and you don't have to check in with them except on the dates. And so we find that if we take the time up front we not only get the products right more often but the teams also deliver much more quickly and with much less churn. 
And then the third thing I'd say that's kind of quirky is it is an unusually truth-seeking culture at Amazon. I think we have a leadership principle that we say have backbone, disagree, and commit. And what it means is that we really expect people to speak up if they believe that we're headed down a path that's wrong for customers, no matter who is advancing it, what level in the company, everybody is empowered and expected to speak up. And then once we have the debate, then we all have to pull the same way, even if it's a different way than you were advocating. And I think, you always hear the old adage of where, two people look at a ceiling and one person says it's 14 feet and the other person says, it's 10 feet, and they say, okay let's compromise, it's 12 feet. And of course, it's not 12 feet, there is an answer. And not all things that we all consider has that black and white answer, but most things have an answer that really is more right if you actually assess it and debate it. And so we have an environment that really empowers people to challenge one another and I think it's part of why we end up getting to better answers, cause we have that level of openness and rigor. >> Okay, well Andy, we have time for one more question. >> Okay. >> So other than some of the things you've talked about, like customer focus, innovation, and long-term orientation, what is the single most important lesson that you've learned that is really relevant to this audience and this time we're living in? >> There's a lot. But I'll pick one. I would say I'll tell a short story that I think captures it. In the early days at Amazon, our sole business was what we called an owned inventory retail business, which meant we bought the inventory from distributors or publishers or manufacturers, stored it in our own fulfillment centers and shipped it to customers. And around the year 1999 or 2000, this third party seller model started becoming very popular. 
You know, these were companies like Half.com and eBay and folks like that. And we had a really animated debate inside the company about whether we should allow third party sellers to sell on the Amazon site. And the concerns internally were, first of all, we just had this fundamental belief that other sellers weren't going to care as much about the customer experience as we did cause it was such a central part of everything we did DNA-wise. And then also we had this entire business and all this machinery that was built around owned inventory business, with all these relationships with publishers and distributors and manufacturers, who we didn't think would necessarily like third party sellers selling right alongside us having bought their products. And so we really debated this, and we ultimately decided that we were going to allow third party sellers to sell in our marketplace. And we made that decision in part because it was better for customers, it allowed them to have lower prices, so more price variety and better selection. But also in significant part because we realized you can't fight gravity. If something is going to happen, whether you want it to happen or not, it is going to happen. And you are much better off cannibalizing yourself or being ahead of whatever direction the world is headed than you are at howling at the wind or wishing it away or trying to put up blockers and find a way to delay moving to the model that is really most successful and has the most amount of benefits for the customers in question. And that turned out to be a really important lesson for Amazon as a company and for me, personally, as well. You know, in the early days of doing Marketplace, we had all kinds of folks, even after we made the decision, that despite the have backbone, disagree and commit weren't really sure that they believed that it was going to be a successful decision. 
And it took several months, but thankfully we really were vigilant about it, and today roughly half of the units we sell in our retail business are third party seller units. Been really good for our customers. And really good for our business as well. And I think the same thing is really applicable to the space we're talking about today, to the cloud, as you think about this gigantic shift that's going on right now, moving to the cloud, which is, you know, I think in the early days of the cloud, the first, I'll call it six, seven, eight years, I think collectively we consumed so much energy with all these arguments about are people going to move to the cloud, what are they going to move to the cloud, will they move mission-critical applications to the cloud, will the enterprise adopt it, will public sector adopt it, what about private cloud, you know, we just consumed a huge amount of energy and it was, you can see both in the results in what's happening in businesses like ours, it was a form of fighting gravity. And today we don't really have if conversations anymore with our customers. They're all when and how and what order conversations. And I would say that this is going to be a much better world for all of us, because we will be able to build in a much more cost effective fashion, we will be able to build much more quickly, we'll be able to take our scarce resource of engineers and not spend their resource on the undifferentiated heavy lifting of infrastructure and instead on what truly differentiates your business. And you'll have a global presence, so that you have lower latency and a better end user customer experience being deployed with your applications and infrastructure all over the world. And you'll be able to meet the data sovereignty requirements of various locales.
So I think it's a great world that we're entering right now, I think we're at a time where there's a lot less confusion about where the world is headed, and I think it's an unprecedented opportunity for you to reinvent your businesses, reinvent your applications, and build capabilities for your customers and for your business that weren't easily possible before. And I hope you take advantage of it, and we'll be right here every step of the way to help you. Thank you very much. I appreciate it. (applause) >> Thank you, Andy. And thank you, everyone. I appreciate your time today. >> Thank you. (applause) (upbeat music)

Published Date : May 3 2017



Yaron Haviv, iguazio - DockerCon 2017 - #theCUBE - #DockerCon


 

>> Narrator: From Austin, Texas. It's the CUBE, covering DockerCon 2017. Brought to you by Docker, and support from its ecosystem partners. >> Hi, I'm Stu Miniman, with my co-host, James Kobielus, who's been digging into all the application development angles. Happy to welcome back to the program, here at DockerCon, Yaron Haviv, who is the co-founder and CTO of iguazio. Yaron, great to see you. >> Thanks. >> How have you been? >> Great, great, been busy traveling a lot. >> We talked about how some of us celebrated Passover recently, I had brisket at home. We had Franklin's Barbecue brisket here. Anthony Bourdain said the only two people that know how to do brisket well, are Franklin's and the Jews. (all laugh) >> So we had Passover, a lot of good food, but also a lot of traveling. I was also in a Kubernetes conference in Europe and here. Prior to that, big data show, so it's a lot of traveling. >> Kubernetes, Docker, Ecosystem. You've been watching this, your company is involved in it. What's your take on the state of the ecosystem, and what do you think of the announcements this week? >> You know, I have also been to the Kubernetes conference, and you see those are still small, relatively small shows. And it's mostly developer focused. What we see is that Kubernetes is taking a lot of share from the others, because most of the guys that adopt are not enterprises yet. It's people that have a large enough infrastructure that they want to use it internally, and Kubernetes is a little more flexible. And on the other end, you see Docker trying to create a VMware-like, shrink wrapped, version of container infrastructure. So we see those two, and there's obviously the Public Cloud with their fully integrated stack. Now, what I notice here in the show, and also when, a couple of weeks ago, in the Kubernetes conference, think about the stack. It has, let's say, 20 components.
So someone like Amazon brings the entire 20 components, and it's fully integrated and secure and networking and storage and data services and everything. And here, what you'll see, is a lot of vendors, this guy has those four components, the other guys have those five components, in some cases they actually overlap. So this guy will have three unique components, and two other components, et cetera. And it's very hard to assemble a full blown solution. So as a buyer, how do you decide which components am I going to choose? That's part of the challenge, and also helps serve the cloud guys. >> I remember when I first joined at Wikibon, we talked about, the hyperscale model was you take your team of PhDs, you just architect your application and software. You're the enterprise though, you don't have that talent. So you will spend money to buy that packaged solution. I want to buy it as a service, I want to buy it easy. Where do you see the maturity of this market, and how that fits for, and what can the enterprise consume, how do they do it? Or do they just go to platforms? >> So this is why our positioning was, it was a platform. We are not a component. We are a fully integrated system. We have multi-tenancy, we have security, we have data lifecycle management. We integrate with applications, we have our own UI. But it's focused more on the data services. So if you take a dozen Amazon data services, you need Dynamo, and others, and object and file. We basically pack all of them, because data is the biggest challenge, as you know. High availability, versioning, reliability, security. The biggest and toughest challenge is the data. And once you solve that one, the applications, they all become stateless, and that's much easier. There still needs to be a bigger ecosystem around it, which is why we are doing a lot more work with CNCF. And trying to create standards for the different interactions between those components.
So when a buyer goes and buys a certain component from one vendor, it doesn't necessarily lock in to that. They can just go and modify it in the future. I think once you solve the data problem, of the persistency, which is sort of the toughest challenge in this environment, the rest of it becomes simpler. >> One of the questions James has been asking this week, is where analytics fits in? I look at your real-time continuous analytics piece, not an application that I heard talked about too much, maybe we can get your viewpoint on it? >> And the relevance is, of course, much of the application development that is going on, the hot stuff, is related to artificial intelligence, on streaming analytics, clearly continuous. >> Which is where we focus on. Some of the things I try to explain, working with different communities, is that right now we have a bifurcation, we have the Apache ecosystem, and we have the Docker ecosystem, totally separate ecosystems, and by the way, you know that cloud is where most analytics happen. >> James: Yes. >> So basically, analytics and cloud technology have to converge. This is what we have been trying to pitch, is why use YARN as a scheduler, when I can use Kubernetes, and it's more generic. Because I can schedule any type of work. So this is something that we are trying to push, and all this notion of continuous integration, when we say continuous analytics, it's not just about the real-time aspect, it's also about the continuous development and integration. >> James: Yes. >> So you actually want this notion of server-less function, which is one of the things I like. Also, just immutable code and infrastructure, you want to adopt those notions, so analytics is going to go into real-time, more and more. So that means, let's say I have my connected car pipeline that I get streams, and I process it, and I generate insights. What happens if I find a bug in my application, or I just want to enhance it, and create another feature?
So I want to be able to just push a new version of my analytics code into some platforms, hopefully ours. >> You also want to train that new algorithm as well, to make sure it's fit for whatever specific... >> Yeah, but you have to have this notion of continuity, which means all the integrations we did have to be different, it has to be a lot more atomic. >> Yeah. >> It has to be check-pointed. All those things, so that I can basically knock down my analytic process, and relaunch it, and it goes seamlessly and continues. And that's not the Apache model, I've played around with it enough, it's a lot more Legacy kind of approach, which I don't connect to too much. >> Yaron, maybe complete out the stack that you're building, how does serverless fit into this also? >> Okay, so basically, we are building all the data engines, we are doing streaming, we are doing objects, files, NoSQL, SQL, for us it's all integrated into the same very high performance engine. We also have built in analytics, so we can build things like joins and aggregations, and all of the computations on the data as it ingests, and it could basically present itself as many different things. Now one of the things we get asked from customers, and we demonstrated that in Strata, let's assume I'm throwing an image into this thing, I want to be able to immediately analyze the image, and say if there is a face, if there is something suspicious about the picture, or maybe even simple things, like extract meta-data information, like geolocation of the picture, so I can do something with it. So we had to develop internally an event driven process, we didn't call it serverless internally, where you throw data, and it immediately launches and triggers a process, which is a Docker container based process. It has high speed message bus integration into our data platform, that immediately invokes and processes that in a very elastic fashion.
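The event-driven flow Yaron describes, where an object landing in the platform immediately triggers a containerized worker, can be sketched in miniature. The registry and function names below are illustrative, not iguazio's actual API; in a real platform the trigger would come over a message bus rather than an in-process dict.

```python
import concurrent.futures

# Hypothetical handler registry; a real platform would subscribe handlers
# to a message bus instead of an in-process dict.
HANDLERS = {}

def on_event(event_type):
    """Register a function to fire whenever an object of this type lands."""
    def register(fn):
        HANDLERS[event_type] = fn
        return fn
    return register

@on_event("image")
def extract_metadata(obj):
    # Stand-in for real work such as face detection or geolocation lookup.
    return {"name": obj["name"], "size_bytes": len(obj["data"])}

def dispatch(objects, max_workers=4):
    """Fan incoming objects out to an elastic pool of workers."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(HANDLERS[o["type"]], o) for o in objects]
        return [f.result() for f in futures]

results = dispatch([
    {"type": "image", "name": "a.jpg", "data": b"\xff\xd8"},
    {"type": "image", "name": "b.jpg", "data": b"\xff\xd8\x00"},
])
```

Throwing thousands of objects at `dispatch` would simply keep the worker pool busy, which is the elasticity being described.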
So if you throw thousands of objects, it elastically generates multiple workers to work on that, and that's also how we design things like DR, and backup internally in our platform to be very flexible, so we can build DR to S3. How do we do it? We basically have serverless functions that know how to convert the updates into a continuous stream of updates, and then they just go and there is a small code that says "Go write to S3". And that allows me a lot of flexibility to develop new features. So this is all this notion of data lifecycle management; every advance in our product is actually based on serverless functions, we just didn't call it serverless. One of the things that we're working on with the community, is trying to detach that portion from our product, and contribute it as an open-source project, because it's much faster and much more optimized than what you'll see, including IBM Whisk or Amazon Lambda implementation of that. >> Are you working with the Apache... Are you working in the context of the Apache framework to expose, for example, machine learning pipeline functions as serverless functions? >> So again, Apache is not necessarily the right place to do that. >> You can do them in Spark. >> I do them in Spark and all that, but we do want the Kubernetes environment to deal with all the orchestration requirements for that thing. The way that we do, for example, tensorflow integration is we may expose a file into tensorflow, on one end, to be able to look at the image, and at the same time the metadata updates, so what the image contains is exposed to tensorflow as sort of a key value store, or document store. It just updates attributes on the same image. So the way that we work now with healthcare, an MRI image lands and something looks at the MRI image, and senses cancer. Basically, you can tag the same image with records, with fields that say it contains cancer, flagged by this person, and so on.
And then, when you want to run a query, and say, you know what, give me all the MRI pictures that contain cancer, it now flips and acts like a database, and you just pull all those images. It's a different approach to how to do those things. >> Yaron talked about Docker containers, Kubernetes, serverless, how do virtual machines fit into the environment? >> I had some interesting conversations at Kubernetes with some friends that are high ranked in this industry, without disclosing, do you really need openstack in between bare metal and containers? Because the traditional approach is, Okay, we have bare metal, we need to put a virtualization layer for isolation, and then we need to put Kubernetes or Docker. And we figured out that there's actually very little risk in skipping that layer, especially with the new security things around containers and image signing, and what we do, which is authenticating the container, not the infrastructure, on data access, network isolation, all those things, so that eventually you can collapse and eliminate virtualization, but not for every application. Some applications which are more traditional Legacy, the application may still require VMs, because it's quite a different philosophy to develop microservices and develop VMs. Part of what I see here in the show is not everyone internalizes that. People still think in the notion of: here's my lightweight VM, that happens to be called a Docker container, and I'm going to give it the volume, and I'm going to create snapshots on that volume, and all that stuff. But if you think about it, what is really microservices? It's about allowing this elasticity, so the same workload can spawn multiple workers, it's the ability to go and create updated versions, it's the ability to knock down this container anytime I want, and just kill it and launch it in a different place. You know how Google works, or Amazon or Ebay, or all those guys. You're basically killing containers on purpose, to basically test their system.
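The tagging-then-querying pattern from the MRI example, where metadata attributes accumulate on an image record and are later queried like rows in a database, might look like this in miniature. Record and field names here are made up for illustration, not iguazio's schema.

```python
# Toy key-value record store: each image gets a record whose attributes
# are updated in place, then queried like rows in a database.
records = {}

def annotate(image_id, **attrs):
    """Attach or update metadata attributes on an image record."""
    records.setdefault(image_id, {"id": image_id}).update(attrs)

def query(**criteria):
    """Return all records whose attributes match every criterion."""
    return [r for r in records.values()
            if all(r.get(k) == v for k, v in criteria.items())]

# An MRI lands, a model inspects it, and the record accumulates attributes.
annotate("mri-001", modality="MRI", contains_cancer=True, flagged_by="model-a")
annotate("mri-002", modality="MRI", contains_cancer=False)

# The same store now "flips and acts like a database":
hits = query(modality="MRI", contains_cancer=True)
```

The point of the pattern is that the image and its analysis results live in one place, so the same system serves both the file-oriented and the query-oriented access paths.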
All this notion that my configuration and my logs and all that stuff sits inside the container, is not cloud native, and it doesn't allow this elasticity that you want if you're building a Netflix or an Ebay, or a modern enterprise infrastructure. So I think we need to put those two things aside. You have Legacy applications, keep them in the VMs. You have new workloads, you need to think of data, and data integration, and microservices differently on something which is entirely stateless. The image of the container builds from git, OK? And creates a Docker image. And if you want to go to a different image, you just go and recreate, from source, the same image. The data for that image needs to be stored in a data facility like a database or an object or something like that. >> Yaron, final question I have for you is, talk a little about the customers you're interacting with, talk about the people that are here, as you said, there's a spectrum of how far along they are in the thinking. You're pretty advanced in some of your architectural thoughts and opinionated as to where you're going. Where are the customers today, how many of them are ready for the future versus sticking to what they have got? >> So what you mentioned before, part of the key challenge for enterprises is they all want to move into the digital transformation, they all want to be competitive, because some have existential threats, think about even banks, today, where Apple comes with Apple Pay, it kills a lot of the margins they are making from all those small transactions. And now, no one really cares how many branches you have in the bank, because all of Generation Y just goes to their mobile app. Someone like a bank has to immediately transition and be able to offer premium services, offer better experiences for the mobile application, be able to analyze user behavior, some things that are more strategic.
The traditional things that IT deals with like exchange server management, SAP, all those Legacy things will move to the cloud, because there's no real value there. And what you see is more and more enterprises thinking about how do we generate the differentiation, which is more about analyzing data, and being able to provide better service to the customers, and the biggest challenge is they don't know how to do it. Because what the industry tells them, Go to Apache, and take a dozen of projects, and now integrate those and figure out the security problem, and you know what, you want to add Kubernetes, that's from a different story, but let's try and glue this together, and that's extremely complicated. So what we are trying to do is go to those customers, say you know what, we're building a full blown solution, fully integrated, security is baked in, all the different data services, it integrates with things like Kubernetes natively, we actually do the extra mile, we actually build Spark and tensorflow, and the images that contain everything, including support for us, that you can just launch Spark and it connects and works. We want to make life easier for those enterprises to solve those key challenges that they are working on. And this is working extremely well for us, actually the challenge we have, we only have, I think, two sales guys and we have a huge pipeline, and we can't really deliver for most of those projects. >> Good challenges to have sometimes, talk about scaling, which has been one of the themes of the week here. Yaron Haviv, great to catch up with you as always. We'll be back with two days of our coverage here, at DockerCon 2017. You're watching the CUBE. (electronic music)
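The stateless-container philosophy Yaron outlines, with the image rebuilt from git and configuration and data kept outside the container, reduces in code to reading everything environment-specific at start-up rather than baking it into the image. The variable names below are illustrative, not any particular product's configuration keys.

```python
import os

# Defaults live in code; anything environment-specific is injected at launch,
# so any replica can be killed and relaunched elsewhere without losing state.
DEFAULTS = {"DB_URL": "postgres://localhost/app", "LOG_LEVEL": "INFO"}

def load_config(env=None):
    """Merge injected environment settings over code defaults."""
    env = os.environ if env is None else env
    return {key: env.get(key, default) for key, default in DEFAULTS.items()}

# A container launched with only DB_URL overridden:
cfg = load_config({"DB_URL": "postgres://db.internal/app"})
```

Because nothing mutable lives inside the container, the orchestrator is free to kill and relaunch it anywhere, which is the elasticity the interview keeps returning to.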

Published Date : Apr 19 2017



Raj Verma | DataWorks Summit Europe 2017


 

>> Narrator: Live from Munich, Germany it's the CUBE, covering Dataworks Summit Europe 2017. Brought to you by Hortonworks.
>> Okay, welcome back everyone, here at day two coverage of the CUBE here in Munich, Germany for Dataworks 2017. I'm John Furrier, my co-host Dave Vellante. Two days of wall-to-wall coverage from SiliconANGLE Media's the CUBE. Our next guest is Raj Verma, the president and COO of Hortonworks. First time on the CUBE, new to Hortonworks. Welcome to the CUBE.
>> Thank you very much, John, appreciate it.
>> Looking good in a three-piece suit, we were commenting when you were on stage.
>> Raj: Thank you.
>> Great scene here in Europe, again a different show vis-a-vis North America, in San Jose. You got the show coming up there, it's the big show. Here, it's a little bit different. A lot of IOT in Germany. You got a lot of car manufacturers, an industrial nation here, smart city initiatives, a lot of big data.
>> Uh-huh.
>> What are your thoughts?
>> Yeah no, firstly thanks for having me here. It's a pleasure, and good chit-chatting right before the show as well. We are very, very excited about the entire data space. Europe is leading many initiatives about how to use data as a sustainable, competitive differentiator. I just moderated a panel and you guys heard me talk to a retail bank, a retailer. And really, Centrica, which was nothing but British Gas, which is rather an organization steeped in history, so to speak, and that institution now calls itself a technology company. And, it's a technology company, or an IOT company, based on them using data as the currency for innovation. So now, British Gas, or Centrica, calls itself a data company, when would you have ever thought that? I was at dinner with some very large automotive manufacturers and the kind of stuff they are doing with data, right from driving habits, driver safety, real-time insurance premium calculation, to autonomous drive. It's just fascinating no matter what industry you talk about.
It's just very, very interesting. And, we are very glad to be here. International business is a big priority for me.
>> We've been following Hortonworks since its inception when it spun out of Yahoo years ago. I think we've been to every Hadoop World going back, except for the first one. We watched the transition. It's interesting, it's always been a learning environment at these shows. And certainly the customer testimonials speak to the ecosystem, but I have to ask you, you're new to Hortonworks. You have an interesting technology background. Why did you join Hortonworks? Because you've certainly seen the movies before and the cycles of innovation, but now we're living in a pretty epic era, machine learning, data, AI on the horizon. What were the reasons why you joined Hortonworks?
>> Yeah sure, I've had a really good run in technology, fortunately I was associated with two great companies, Parametric Technology and TIBCO Software. I was 16 years at TIBCO, so I've been dealing with data for 16 years. But, over the course of the last couple of years, whenever I spoke to a C-level executive, or a CIO, they were talking to us about the fact that structured data, which is really what we did for 16 years, was not good enough for innovation. Innovation and insights into unstructured data was the seminal challenge of most of the executives that I was talking to, senior-level executives. And, when you're talking about unstructured data and making sense of it, there isn't a better technology than the one that we are dealing with right now, undoubtedly. So, that was one. Dealing with data, because data is really the currency of our times. Every company is a data company. Second was, I've been involved with proprietary software for 23 years. And, if there is a business model that's ready for disruption it's the proprietary software business model, because I'm absolutely convinced that open source is what I call a green business model. It's good for planet Earth, so to speak.
It's community based, it's based on innovation, and it puts the customer and the technology provider on the same page. The customer success drives the vendor success. Yeah, so the open source community, data--
>> It's sustainable, pun intended, in the sense that it's had a continuing run. And, it's interesting, Tier One software is all open source now.
>> 100%, and by the way, not only that, if you see large companies like IBM and Microsoft, they have finally woken up to the fact that if they need to attract talent, and if they want to be known as thought leaders, they have to have some very meaningful open source initiatives. Microsoft loves Linux, when did we ever think that was going to happen, right? And, by the way--
>> I think Steve Ballmer once said it was the cancer of the industry. Now, they're behind it. But, the Linux Foundation has also grown. We saw a project this past week. Intel donated a big project to the Linux Foundation, now it's taking over, so more projects.
>> Raj: Yes.
>> There's more action happening than ever before.
>> You know, absolutely, John. Five years ago, when I would go and meet a CIO and I would ask them about open source, they would wink, they'd say, "Of course, we do open source." But, it's less than 5%, right? Now, when I talk to a CIO they first ask their teams to go evaluate open source as the first choice. And, if they can't, they come kicking and screaming towards proprietary software. Most organizations, and some organizations with a lot of historical gravity, so to speak, have a 50/50 even split between proprietary and open source. And, that's happened in the last three years. And, I can make a bold statement, and I know it'll be true, that in the next three years, in most organizations, the ratio of proprietary to open source will be 20 proprietary, 80 open source.
>> So, obviously you've made that bet on open source, joining Hortonworks, but open is a spectrum.
And, on one end of the spectrum you have Hortonworks which is, as I see it, the purest. Now, even Larry Ellison, when he gets onstage at Oracle OpenWorld, will talk about how open Oracle is, I guess that's the other end of the spectrum. So, my question is, won't the Microsofts and the Oracles and the IBMs, they're like recovering alcoholics, accommodate their platforms through open source, embracing open source? We'll see if AWS is the same, we know it's unidirectional there. How do you see that--
>> Well, not necessarily.
>> Industry dynamic, we'll talk about that later. How do you see that industry dynamic shaking out?
>> No, absolutely. I remember way back in, I think, the mid to late 90s, I still love that quote by Scott McNealy, who is a friend. Dell, not Dell, Digital came out with a marketing campaign saying open VMS. And, Scott said, "How can someone lie so much with one word?" (laughs) So, the fact of Oracle calling itself open, well, I'll just leave it at, it's a good joke. I think the definition of open source, to me, is when you acquire software you have three real costs. One is the cost of initially procuring that software and the hardware and all the rest of it. The second is implementation and maintenance. However, most people miss the third dimension of cost when acquiring software, which is the cost to exit the technology. Our software and open source have very low exit barriers to our technology. If you don't like our technology, switch it off. You own the software anyways. Switch off our services and the barriers of exit are very, very low. Having worked in proprietary software, as I said, for 23 years, I very often had conversations with my customers where I would say, "Look, you really don't have a choice, because if you want to exit our technology it's going to probably cost you ten times more than what you've spent till date." So, it's a lock-in architecture, and then you milk that customer through maintenance, correct?
>> Switching costs really are the metric--
>> Raj: Switching costs, exactly.
>> You gave the example of Blockbuster Camera, and the rental, the late-charge fees. Okay, that's an example of lock-in. So, as we look at the company you're most compared with, now that it's going public, Cloudera, in a way I see more similarities than differences. I mean, you guys are sort of birds of a feather. But, you are going for what I call the long game with a volume subscription model. And, Cloudera has chosen to build proprietary components on top. So, you have to make big bets on open. You have to support those open technologies. How do you see that affecting the long-term distance model?
>> Yeah, I think we are committed to open source. There's absolutely no doubt about it. I do feel that our connected data platform, which is data at rest and data in motion, across on-prem and cloud, is the business model that's going to win. We clearly have momentum on our side. You've seen the same filings that I have seen. You're talking about a company that had a three-year head start on us, and a billion dollars of funding, all right, at very high valuations. And yet, they're only one year ahead in terms of revenue. And, they have burnt probably three times more cash than we have. So clearly, and it's not my opinion, if you look at the numbers purely, the numbers actually give us the credibility that our business model and what we are doing is more efficient and is working better. One of the arguments that I often hear from analysts and press is, how are your margins on open source? According to the filings, again, their margins are 82% on proprietary software; my margins on open source are 84%. So, from a health-of-the-business perspective we are better. Now, the other is, they claim to have been making a pivot to more machine learning and deep learning and all the rest of it. And, they'd actually like us to believe that their competition is going to be Amazon, IBM, and Google.
Now, with a billion dollars of funding and the Intel ecosystem behind them, they could effectively compete against Hortonworks. What do you think are their chances of competing against Google, Amazon, and IBM? I just leave that for you guys to decide, to be honest with you. And, we feel very good that they have virtually vacated the space and we've got the momentum.
>> On the numbers, what jumps out at you on the filing? Since obviously, I'm sure, everyone at Hortonworks was digging through the S1, because for the first time now Cloudera exposes some of the numbers. I noticed some striking differences, obviously, besides their multiple on revenue valuation. Pretty obvious there's going to be a haircut coming after the public offering. But, on the sales side, which is your wheelhouse, there's a value proposition that you guys at Hortonworks, we've been watching, the cadence of getting new clients, servicing clients. Product evolution is challenging enough, but also expensive. It's not you guys, but it's getting better, and as Shaun Connolly pointed out yesterday, you guys are looking at some profitability targets on EBITDA coming up in Q4. Publicly stated on the earnings call. How's that different from Cloudera? Are they burning more cash because of their sales motions or sales costs, or is it the product mix? What are your thoughts on the filings, Cloudera versus Hortonworks?
>> Well, look, I just feel that, I can talk more about my business than theirs. Clearly, you've seen the same filings that I have, and you've seen the same cash burn rates that we have seen. And, we clearly are more efficient, although we can still get better. But, because of being public for a little more than two years now, we've had a thousand-watt bulb being shone at us, and we have been forced to be more efficient because we were in the limelight.
>> John: You're open.
>> In the open, right? So, people knew what our figures were, what our efficiency ratios were.
So, we've been working diligently at improving them, and we've gotten better, and there's still scope for improvement. However, being private did not put the same scrutiny on Cloudera. And, some would say that they were actually spending money like drunken sailors, if you really read their S1 filing. So, they will come under a lot of scrutiny as well. I'm sure they'll get more efficient. But right now, clearly, you've seen the same numbers that I have, their numbers don't talk about efficiency either on the R and D side or the sales and marketing side. So, yeah, we feel very good about where we are in that space.
>> And, open source is this two-edged sword. Like, take YARN for example. At least from my perspective, Hortonworks really led the charge on YARN, well before Docker and Kubernetes' ascendancy, and then all of a sudden that happens and of course you've got to embrace those open source trends. So, you have the unique challenge of having to support sort of all the open source platforms. And, that's why I call it the long game. In order for you guys to thrive you've got to both put resources into those multiple projects and you've got to get the volume of your subscription model, which, as you pointed out, the marginal economics are just as good as most, if not any, software business. So, how do you manage that resource allocation?
>> Yes, so I think a lot of that is the fact that we've got plenty of contributors and committers to the open source community. We are seen as the angel child in open source because we are just pure, kosher open source. We just don't have a single line of proprietary code. So, we are committed to that community. We have, over the last six or seven years, developed models of our software development which help us manage the collective bargaining power, so to speak, of the community to allocate resources and prioritize the allocation of resources.
It continues to be a challenge given the breadth of the open source community and what we have to handle, but fortunately I'm blessed that we've got a very, very capable engineering organization that keeps us very efficient and on the cutting edge.
>> We're here with Raj Verma, the new president and COO of Hortonworks, Chief Operating Officer. I've got to ask you, because it's interesting. You're coming in with a fresh set of eyes, coming in, as you mentioned, from TIBCO, interesting, which was very successful in the generation of its time, and the history of TIBCO, where it came from and what it did, was pretty fantastic. I mean, everyone knows connecting data together was very hard in the enterprise world. TIBCO has some challenges today, as you're seeing, with being disrupted by open source, but I've got to ask you. As a new executive with a fresh perspective, looking at the battlefield, an opportunity with open source, there are some significant things happening, and what are you excited about? Because Hortonworks has actually done some interesting things. Some, I would say, the world spun in their direction. Their relationship with Microsoft, for instance, and their growth in cloud has been fantastic. I mean, Microsoft's stock price when they first started working with Hortonworks I think was like 26, and obviously with Satya Nadella on board, Azure, more open source, from Open Compute to Kubernetes and microservices, Azure is doing very, very well. You also have a partnership with Amazon Web Services, so you already are living in this cloud era, okay? And so, you have a cloud dynamic going on. Are you excited by that? You bring some partnership expertise in from TIBCO. How do you look at partners? Because, you guys don't really compete with anybody, but you're partners with everybody. So, you're kind of like Switzerland, but you're also doing a lot of partnerships. What are you excited about vis-a-vis the cloud and some of the other partnerships that are happening?
>> Yeah, absolutely, I think having a robust partner ecosystem is probably my number one priority, maybe number two after being profitable in a short span of time, which is, again, publicly stated. Now, our partnership with Microsoft is very, very special to us. Being available in Azure, we are seeing some fantastic growth rates coming in from Azure. We are also seeing a remarkable amount of traction from the market, to be able to go and test out our platform with very, very low barriers of entry and, of course, almost zero barriers of exit. So, from a partnership platform, cloud providers like Amazon and Microsoft are very, very important to us. We are also getting a lot of interest from carriers in Europe, for example. Some of the biggest carriers want to offer business services around big data, and almost 100%, actually not almost, 100% of the carriers that we have spoken to thus far want to partner with us and offer our platform as a cloud service. So, cloud for us is a big initiative. It gives us the entire capability to reach audiences that we might not be able to reach ringing one doorbell at a time. So, as I said, we've got a very robust, integrated cloud strategy. Our customers find that very, very interesting. And, building that with a very robust partner channel, high priority for us. Second is using our platform as a development platform for applications on big data, again a priority. And that's, again, building a partner ecosystem. The third is relationships with global SIs, Accenture, Deloitte, KPMG, the Indian SIs, Infosys, and Wipro, and HCL, and the rest. We have some work to do. We've done some good work there, but there's some work to be done there. And, not only that, I think some of the initiatives that we are launching in terms of training as a service, free certification, they are all things which are aimed at reaching out to the partners and building, as I said, a robust partner ecosystem.
>> There's a lot of talk at conferences like this, especially in Hadoop, about complexity, complexity of the ecosystem, new projects, and the difficulties of understanding that. But, in reality it seems as though, today anyway, the technology is pretty well understood. We talked about Millennials off camera, coming out today with social savvy and tooling and understanding gaming and things like that. Technology, getting it to work, seems to not be the challenge anymore. It's really understanding how to apply it, how to value data, we heard in your panel today. The business process, which used to be very well known, it's counting, it's payroll, simple. Now, it's kind of ever-changing daily. What do you make of that? How do you think that will affect the future of work?
>> Yeah, I think there are some very interesting questions that you've asked, in that the first, of course, is what does it take to have a very successful big data, or Hadoop, project. And, I think we always talk about the fact that if you have a very robust business case backing a Hadoop project, that is the number one key ingredient to delivering a Hadoop project. Otherwise, you can tend to boil the ocean, all right, or try and eat an elephant in one bite, as I like to say. So, that's one, and I think you're right. It's not the technology, it's not the complexity, it's not the availability of the resources. It is a leadership issue in organizations, where the leader demands certain outcomes, business outcomes, from the Hadoop project team, and we've seen whenever that happens the projects seem to be very, very successful. Now, the second part of the question, about the future of work, which is a very, very interesting topic and a topic which is very, very close to my heart. There are going to be more people than jobs in the next 20, 25 years. I think that any job that can be automated will be automated, or has been automated, right? So, this is going to have a societal impact on how we live.
I've been lucky enough that I joined this industry 25 years ago and I've never had to change or switch industries. But, I can assure you that our kids, and we were talking about kids off camera as well, our kids will probably have to learn a new skill every five years. So, how does that impact education? We, in our generation, were testing champions. We were educated to score well on tests. But, the new form of education, which you and I were talking about, again, in California where we live, and where my daughter goes to high school, in her school the number one, the number one priority is to instill a sense of learning and a joy of learning in students, because that is what is going to contribute to a robust future.
>> That's a good point, I want to just interject here, because I think that the trend we're seeing on the higher-ed side also points to the impact of data science on curriculum and learning. It's not just putting catalogs online. There's now kind of an iterative, non-linear discovery path to proficiency. But, there's also the emotional quotient aspect. You mentioned the love of learning. The immersion of tech and digital is creating an interdisciplinary requirement. So, the statistic is that something like half the jobs that are going to be available haven't even been figured out yet. There's value creation around interdisciplinary skill sets and emotional quotient.
>> Absolutely.
>> Social, emotional, because of the human social community connectedness. This is also a big data challenge opportunity.
>> Oh, 100%, and I think one of the things that we believe is that in the future, jobs that require a greater amount of empathy are least susceptible to automation. So, things like caring for the elderly, and nursing, and teaching, and artists, and all the rest will be professions which will be highly paid and numerous.
I also believe that the entire big data challenge, about how you use data to impact communities, is going to come into play. And also, I think, John, you and I were again talking about it, the entire concept of corporations is only 200 years old, really, 200, 300 years old. Before that, our forefathers were individual contributors who contributed a certain part in a community, barbers, tailors, farmers, what have you. We are going to go back to the future, where all of us will go back to being individual contributors. And, I think, and again I'm bringing it back to open source, open source is the start of that community which will allow the community to go back to its roots of being individual contributors, rather than being part of an organization or a corporation to be successful and to contribute.
>> Yeah, Coase's Penguin has been a very famous, seminal piece of work. Obviously, Ronald Coase, who wrote The Nature of the Firm, is interesting, but that's been a kind of historical document. You look at blockchain, for instance. Blockchain actually has the opportunity to disrupt what The Nature of the Firm is about, because of smart contracts, supply chain, and what not. And, we have this debate on the CUBE all the time, there are some naysayers. Tim Connors, a VC, and I were talking on our Friday show, the Silicon Valley Friday show. He's actually a naysayer on blockchain. I'm actually pro blockchain, because I think there are some skeptics that say blockchain is really hard to do because it requires an ecosystem. However, we're living in an ecosystem, a world of community. So, I think The Nature of the Firm will be disrupted by people organizing in a new way vis-a-vis blockchain, 'cause that's an open source paradigm.
>> Yeah, no, I concur. So, I'm a believer in that entire concept. I 100%--
>> I want to come back to something you talked about, about individual contributors and the relationship and link to open source and collaboration.
I personally think we have to have a frank conversation about, I mean, machines have always replaced humans, but for the first time in our history they're replacing cognitive functions. To your point about empathy, what are the things that humans can do that machines can't? And, they become fewer and fewer every year. And, at a lot of these conferences people don't like to talk about that, but it's a reality that we have to talk about. And, your point is right on, we're going back to individual contribution, open source collaboration. The other point is data is going to be at the center of that innovation, because it seems like value creation, and maybe job creation, in the future, is going to be a result of the combinatorial effects of data, open source, collaboration, other. It's not going to be because of Moore's Law, all right.
>> 100%, and I think one of the aspects that we didn't touch upon is that the new societal model that automation is going to create would need data-driven governance. So, a data-driven government is going to be a necessity, because, remember, in those times, and I think in 25, 30 years, countries will have to explore the impact of negative taxation, right? Because of all the automation that actually happens around citizen security, citizen welfare, the cost of healthcare, the cost of providing healthcare. All of that is going to be fueled by data, right? So, it's just, as the Chinese proverb says, "May you live in interesting times." We definitely are living in very interesting times.
>> And, the public policy implications are, your friend, and one of my business heroes, Scott McNealy says, "There's no privacy on the internet, get over it." We interviewed Don Tapscott last week, and he said, "That's unacceptable, we have to solve that problem." So, it brings up a lot of public policy issues.
Well, the socio-economic impact, right now there's a trend we're seeing where the younger generation, we're talking about the post-9/11 generation that's entering the workforce, they have a social conscience, right? So, there's an emphasis you're seeing on social good. AI for social good is one of the hottest trends out there. But, the changing landscape around data is interesting. So, the word democratization has been used, whether you're looking at the early days of blogging and podcasting, which we were involved in, to now in media, and this notion of data and transparency and open source is probably at a tipping point, an all-time high in terms of value creation. So, I want to hear your thoughts on this, because as someone who's been in the proprietary world, the mode of operation was get something proprietary, lock it down, build a fence and a wall, protect it with folks with machine guns, and fight for the competitive advantage, right? Now, the competitive advantage is open. Okay, so you're looking at a pure open source model with Hortonworks. It changes how companies are competing. What is the competitive advantage of Hortonworks? Actually, to be more open.
>> 100%.
>> How do you manage that?
>> No, absolutely, I just think the proprietary nature of software, like, software has disrupted a lot of businesses, all right? And, it's not resistant to disruption itself. I mean, there has never been a business model in the history of time where you charge a lot of money to sell software that you built, and then, whatever the defects in that software, you get paid more money to fix them, all right? That's the entire perpetual and maintenance model. That model is going to get disrupted. Now, there are hundreds of billions of dollars involved in it, so people are going to come kicking and screaming to the open source world, but they will have to come to the open source world.
Our advantage that we're seeing is that innovation now in a closed-loop environment, no matter what size of a company you are, cannot keep up with the changing landscape around you from a data perspective. So, without the collective innovation of the community I don't really think a technology can stay at par with the changes around it.
>> This is what I think is such an important point that you're getting at, because we started SiliconANGLE actually in the Cloudera office, so we have a lot of friends that work there. We have a great admiration for them, but one of the things that Cloudera has done through their execution is they have been very profit oriented, the go-public-at-all-costs kind of thing that they're doing now. You've seen that happen. Is the competitive advantage that you're pointing out something similar to what we're seeing Andy Jassy doing at AWS, which is, it's not so much to build something proprietary per se, it's just to ship something faster? So, if you look at Amazon's competitive advantage, it's that they just continue to ship product faster and faster and faster than companies can build it themselves. And also, the scale that they're getting with these economies is increasing the quality. So, open source has also hit the naysayers on security, right? Everyone said, "Oh, open source is not secure." As it turns out, it's more secure. Amazon at scale is actually becoming more secure. So, you're starting to see the new competitive advantage be ship more, be more open, as the way to do business. What do you think the impact will be on traditional companies, whether it's a startup competing or an existing bank? This is a paradigm shift, what's the impact going to be for a CIO or CEO of a big company? How do they incorporate that competitive advantage?
>> Yeah, I think the proprietary software world is not going to go away tomorrow, John, you know that.
There's so much installed software, and there's a saying from where I come from that "Even a dead elephant is worth a million dollars," right? So, even though that business model is sort of dying, it'll still be a good investment for the next ten years because of the locked-in business model, where customers cannot get out. Now, from a perspective of openness, and what that brings as a competitive differentiator to our customers, it's just the very pace at which, as I've said, I've lived in a proprietary world, you would be lucky if you were getting the next version of your software every 18 months, you'd be lucky. In the open source community you get a few versions in 18 months. So, the cadence at which releases come out has just completely disrupted the proprietary model. It is just the collective innovative ability of the community that has allowed us to increase the release cadence to a few months now, all right? And, if our engineering team had its way, it'd be cut shorter still, right? So, the ability of customers, and what does that allow the customer to do? Ten years ago, if you looked for a capability from your proprietary vendor, they would say you have to wait 18 months. So, what do you do? You build it yourself, all right? So, that is what the spaghetti architecture was all about. In the new open source model you ask the community, and if enough people in the community think that that's important, the community builds it for you and gives it to you.
>> And, the good news is the business model of open source is working. So, you guys have been public, you've got Cloudera going public, you have MuleSoft out there, a lot of companies out there now that are public companies are open source companies, a phenomenal changeover. But, the other thing that's interesting is the hiring factor for the large enterprise. To your point about proprietary not updating, the same is true for the enterprise.
So, just hiring candidates out of open source has now increased the talent pool for a large enterprise. >> 100%, 100%. >> Well, I wonder if I could challenge this love fest for a minute. (laughs) So, there's another saying, I didn't grow up there, but a dying snake can still bite you. So, I bring that up because there is this hybrid model that's emerging, because these elephants eventually figure it out. And so, an example would be, we talked about Cloudera and so forth, but the better example, I think, is IBM. What IBM has done to embrace open source, investing a billion dollars into Linux years ago, what it's doing with Spark, essentially trying to elbow its way in and say, "Okay, "now we're going to co-opt the ecosystem. "And then, build our proprietary pieces on top of it." That, to me, that's a viable business model, is it not? >> Yes, I'm sure it is, and to John's point, with Mule going IPO and with Cloudera having successfully built a $250 million, $261 million business, it's a testimony to the fact that companies can be built. Now, can they be more efficient, sure they can be more efficient. However, my entire comment on this is why are you doing open source? What is your intent of doing open source, to be seen as open, or to be truly open? Because, in our philosophy if you add a slim layer of proprietariness, why are you doing that? And, as a businessman I'll tell you why: you increase the stickiness factor by locking in your customer, right? So, let's not, again, we're having a frank conversation, proprietary code equals customer lock-in, period. >> Agreed. And, as a business model-- >> I'm not sure I agree with that. >> As a business model. >> Please. (laughs) We'll come back to that. >> So, it's a customer lock in. 
Now, as a business model it is; if you were to go with the business models of the past, yes, I believe most of the analysts will say it's a stickier, better business model, but then we would like to prove them wrong. And, that's our mission as open source purely. >> I would caution though, Amazon's the mother of all lock-ins. You kind of bristled at that before. >> They're not, I mean they use a lot of open source. I mean, did they open source it? Getting back to the lock in, the lock in is a function of stickiness, right? So, stickiness can be open source. Now, you could argue that Hortonworks through their relationship with partnering has a lock-in aspect with their stickiness of being open. Right, so I come back down to the proprietary-- >> Dave: My search engine, I like Google. >> I mean Google's certainly got-- >> It's got to be locked in 'cause I like it? >> Well, there's a lot of "do you care" with the proprietary technology that Google's built. >> Switching costs, as we talked about before. >> But, you're not paying for the switch. If the value exceeds the price of the lock in then it's an opportunity. So, Palma Richie's talking about the hardened top. Do you care what's in an Intel processor? Well, Intel is a proprietary platform that provides processing power, but it enables a lot of other value. So, I think the stickiness factor of say IBM is interesting, and they've done a lot of open source stuff to defend, on Linux for example, they do a (mumbles) blockchain. But, they're priming the pump for their own business, that's clear, for their lock-in. >> Raj wasn't saying there's not value there. He's saying it's lock in, and it is. >> Well, some customers will pay for convenience. >> Your point is if the value exceeds the lock in risk then it's worth it. >> Yeah, that's my point, yeah. >> 100%, 100%. >> And, that's where the opportunity is. So, you can use open source to get to a value trajectory. 
That's the barriers to entry; we've seen 'em drop on the entrepreneurship side, right? It's easier to start a company now than ever before. Why? Because of open source and cloud, right? So, does that mean that every startup's going to be super successful and beat IBM? No, not really. >> Do you think there will be a red hat of big data and will you be it? >> We hope so. (laughs) If I had my way, definitely. That's really why I am here. >> Just an example, right? >> And, the one thing that excites us about this year is, as my former boss used to say, you could be as good as you think you are, or the best in the world, but if you're in the landline business right now you're not going to have a very bright future. However, in the business that we are in, the pull from the market that we get, and you're seeing it here, right? And, these are days that we have very often where customer pull is remarkable. I mean, this industry is growing at, depending on which analyst you're talking to, somewhere between 50 to 80% year on year. All right, every customer is a prospect for us. There isn't a single conversation that we have with almost any organization of any size where they don't think that they can use their data better, or they can enhance and improve their data strategy. So, if that is in place, I am confident about our execution, very, very happy with the technology platform, and the support that we get from our customers. So, all things seem to be lining up. >> Raj, thanks so much for coming on, we appreciate your time. We went a little bit over, I think, the allotted time, but wanted to get your insight as the new President and Chief Operating Officer of Hortonworks. Congratulations on the new role, and looking forward to seeing the results. Since you're a public company we'll actually be able to see the scoreboard. >> Raj: Yes. >> Congratulations, and thanks for coming on the CUBE. There's more coverage here live at Dataworks 2017. 
I'm John Furrier, stay with us for more great interviews, day two coverage. We'll be right back. (jaunty music)

Published Date : Apr 6 2017


Shaun Connolly, Hortonworks - DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Announcer: Coverage: DataWorks Summit Europe 2017, brought to you by Hortonworks. >> Welcome back everyone. Live here in Munich, Germany for theCUBE's special presentation of Hortonworks Hadoop Summit, now called DataWorks 2017. I'm John Furrier, my co-host Dave Vellante, our next guest is Shaun Connolly, Vice President of Corporate Strategy, Chief Strategy Officer. Shaun great to see you again. >> Thanks for having me guys. Always a pleasure. >> Super exciting. Obviously we're always pontificating on the status of Hadoop; Hadoop is dead, long live Hadoop, but rumors of its demise are greatly exaggerated. The reality is that there are no major shifts in the trends, other than the fact that the amplification with AI and machine learning has upleveled the narrative to mainstream around data; big data has been written on gen one of Hadoop, DevOps, culture, open-source. Starting with Hadoop you guys certainly have been way out in front of all the trends. How you guys have been rolling out the products. But it's now with IoT and AI as that sizzle, the future of self driving cars, smart cities, you're starting to really see demand for comprehensive solutions that involve data-centric thinking. Okay, that's one. Two, open-source continues to dominate: MuleSoft went public, you guys went public years ago, Cloudera filed their S-1. A crop of public companies that are open-source, haven't seen that since Red Hat. >> Exactly. 99 is when Red Hat went public. >> Data-centric, big megatrend with open-source powering it, you couldn't be happier for the stars lining up. >> Yeah, well we definitely placed our bets on that. We went public in 2014 and it's nice to see that graduating class of Talend and MuleSoft, Cloudera coming out. That just, I think, helps socialize the movement that enterprise open-source, whether it's for on-prem or powering cloud solutions pushed out to the edge, and technologies that are relevant in IoT. That's the wave. 
We had a panel earlier today where Dahl Jeppe from Centrica, British Gas, was talking about his ... The digitization of energy and virtual power plant notions. He can't achieve that without open-source powering and fueling that. >> And the thing about it is ... For me personally, being my age in this generation of the computer industry since I was 19, to see open-source go mainstream the way it is, it gets better every time, but it really is the thousand flowers bloom strategy. Throwing the seeds of innovation out there. I want to ask you a strategy question: you guys, from a performance standpoint, I would say kind of got hammered in the public market. Cloudera's valuation privately is 4.1 billion, you guys are close to 700 million. Certainly Cloudera's going to get a haircut, it looks like. The public market is based on the multiples from Dave and I's intro, but there's so much value being created. Where's the value for you guys as you look at the horizon? You're talking about white spaces that are really developing with use cases that are creating value. The practitioners in the field creating value, real value for customers. >> So you covered some of the trends, but I'll translate 'em into how the customers are deploying. Cloud computing and IoT are somewhat related. One is a centralization, the other is decentralization, so it actually calls for a connected data architecture, as we refer to it. We're working with a variety of IoT-related use cases. Coca-Cola East Japan spoke at Tokyo Summit about beverage replenishment analytics. Getting vending machine analytics from vending machines, even on Mount Fuji. And optimizing their flow-through of inventory and just-in-time delivery. That's an IoT-related use case that runs on Azure. It's a cloud-related story and it's a big data analytics story that's actually driving better margins for the business and actually better revenues cuz they're getting the inventory where it needs to be so people can buy it. 
Those are really interesting use cases that we're seeing being deployed, and it's at this convergence of IoT, cloud and big data. Ultimately that leads to AI, but I think that's what we're seeing the rise of. >> Can you help us understand that sort of value chain. You've got the edge, you got the cloud, you need something in-between, you're calling it connected data platform. How do you guys participate in that value chain? >> When we went public our primary workhorse platform was Hortonworks Data Platform. We had first class cloud services with Azure HDInsight and Hortonworks Data Cloud for AWS, curated cloud services pay-as-you-go, and Hortonworks DataFlow, I call it our connective tissue, it manages all of your data motion, it's a data logistics platform, it's like FedEx for data delivery. It goes all the way out to the edge. There's a little component called MiNiFi, which does secure intelligent analytics at the edge and in transmission. These smart manufacturing lines, you're gathering the data, you're doing analytics on the manufacturing lines, and then you're bringing the historical stuff into the data center where you can do historical analytics across manufacturing lines. Those are the use cases that are connected data architectures-- >> Dave: A subset of that data comes back, right? >> A subset of the data, yep. The key events of that data, it may not be full of-- >> 10%, half, 90%? >> It depends; if you have operational events that you want to store, sometimes you may want to bring full fidelity of that data so you can do ... As you manufacture stuff and when it got deployed and you're seeing issues in the field, like Western Digital hard drives, the failures in the field, they want that data full fidelity to connect the data architecture and analytics around that data. You need to ... One of the terms I use is in the new world, you need to play it where it lies. If it's out at the edge, you need to play it there. 
If it makes a stop in the cloud, you need to play it there. If it comes into the data center, you also need to play it there. >> So a couple years ago, you and I were doing a panel at our Big Data NYC event and I used the term "profitless prosperity," I got the hairy eyeball from you, but nonetheless, we talked about you guys as a steward of the industry, you have to invest in open-source projects. And it's expensive. I mean HDFS itself, YARN, Tez, you guys lead a lot of those initiatives. >> Shaun: With the community, yeah, but we-- >> With the community yeah, but you provided contributions and co-leadership let's say. You're there at the front of the pack. How do we project it forward without making forward-looking statements, but how does this industry become a cashflow positive industry? >> For public companies since the end of 2014, the markets turned beginning of 2016; prior to that, high growth with some losses was palatable, then losses were not palatable. That hit us, Splunk, Tableau, most of the IT sector. That's just the nature of the public markets. As more public open-source, data-driven companies come in I think it will better educate the market of the value. There's only so much I can do to control the stock price. What I can do from a business perspective is hit key measures on a path to profitability. At the end of Q4 2016, we hit what we call the just-to-even or breakeven, which is a stepping stone. On our earnings call at the end of 2016 we ended with 185 million in revenue for the year. Only five years into this journey, so that's a hard revenue growth pace, and we basically stated in Q3 or Q4 of '17, we will hit operating cashflow neutrality. So we are operating business-- >> John: But you guys also hit a 100 million at record pace too, I believe. >> Yeah, in four years. So revenue is one thing, but operating margins, like if you look at our margins on our subscription business for instance, we've got 84% margin on that. 
It's a really nice margin business. We can make those margins better, but that's a software margin. >> You know what's ironic, we were talking about Red Hat off camera. Here's Red Hat kicking butt, really hitting all cylinders, three billion dollars in bookings; one would think, okay hey I can maybe project forth some of these open-source companies. Maybe the flip side of this, oh wow we want it now. To your point, the market kind of flipped, but you would think that Red Hat is an indicator of how an open-source model can work. >> By the way Red Hat went public in 99, so it was a different trajectory, like you know I charted their trajectory out. Oracle's trajectory was different. Even in inflation adjusted dollars they didn't hit a 100 million in four years, I think it was seven or eight years or what have you. Salesforce did it in five. So these SaaS models and these subscription models and the cloud services, which is an area that's near and dear to my heart. >> John: Goes faster. >> You get multiple revenue streams across different products. We're a multi-product cloud service company. Not just a single platform. >> So we were actually teasing this out on our-- >> And that's how you grow the business, and that's how Red Hat did it. >> Well I want to get your thoughts on this while we're just kind of ripping live here, because Dave and I were talking on our intro segment about the business model and how there's some camouflage out there, at least from my standpoint. One of the main areas that I was kind of pointing at and trying to poke at and want to get your reaction to is in the classic enterprise go-to-market, you have a sales force expansive, you guys pay handsomely for that today. Incubating that market, getting the profitability for it is a good thing, but there's also channels, VARs, ISVs, and so on. You guys have an open-source channel that's kind of not a VAR or an ISV, these are entrepreneurs and/or businesses themselves. 
There's got to be a monetization shift there for you guys in the subscription business certainly. When you look at these partners, they're co-developing, they're in open-source, you can almost see the dots connecting. Is this new ecosystem, there's always been an ecosystem, but now that you have kind of a monetization inherently in a pure open distribution model. >> It forces you to collaborate. IBM was on stage talking about our system certified on the Power Systems. Many may look at IBM as competitive, we view them as a partner. Amazon, some may view them as a competitor with us, they've been a great partner in our Hortonworks Data Cloud for AWS. So it forces you to think about how do you collaborate around deeply engineered systems and value, and we get great revenue streams that are pulled through that they can sell into the market to their ecosystems. >> How do you envision monetizing the partners? Let's just say Dave and I start this epic idea and we create some connective tissue with your orchestrator called the Data Platform you have and we start making some serious bang. We make a billion dollars. Do you get paid on that if it's open-source? I mean would we be more subscriptions? I'm trying to see how the tide comes in, whose boats float on the rising tide of the innovation in these white spaces. >> Platform thinking is you provide the platform. You provide the platform for 10x value that rides atop that platform. That's how the model works. So if you're riding atop the platform, I expect you and that ecosystem to drive at least 10x above and beyond what I would make as a platform provider in that space. >> So you expect some contributions? >> That's how it works. You need a thousand flowers to be running on the platform. >> You saw that with VMware. They hit 10x and ultimately got to 15 or 16, 17x. >> Shaun: Exactly. >> I think they don't talk about it anymore. I think it's probably trading the other way. >> You know in my days at JBoss and Red Hat it was somewhere between 15 to 20x. 
That was the value that was created on top of the platforms. >> What about the ... I want to ask you about the forking of the Hadoop distros. I mean there was a time when everybody was announcing Hadoop distros. John Furrier announced SiliconANGLE was announcing a Hadoop distro. So we saw consolidation, and then you guys announced the ODP, then the ODPI initiative, but there seems to be a bit of a forking in Hadoop distros. Is that a fair statement? Unfair? >> I think if you look at how the Linux market played out, you have clearly Red Hat, you had Canonical's Ubuntu, you had SUSE. You're always going to have curated platforms for different purposes. We have a strong opinion and a strong focus in the area of IoT, fast analytic data from the edge, and a centralized platform with HDP in the cloud and on-prem. Others in the market, Cloudera is running sort of a different play where they're curating different elements and investing in different elements. Doesn't make either one bad or good, we are just going after the markets slightly differently. The other point I'll make there is in 2014 if you looked at the Venn diagrams then, there was a lot of overlap. Now if you draw the areas of focus, there's a lot of white space that we're going after that they aren't going after, and they're going after other places, and other new vendors are going after others. With the market dynamics of IoT, cloud and AI, you're going to see folks chase the market opportunities. >> Is that disparity not a problem for customers now or is it challenging? >> There has to be a core level of interoperability and that's one of the reasons why we're collaborating with folks in the ODPI, as an example. When it comes to some of the core components, there has to be a level of predictability, because if you're an ISV riding atop, you're slowed down by death by infinite certification and choices. So ultimately it has to come down to just a much more sane approach to what you can rely on. 
>> When you guys announced ODP, then ODPI, the extension, Mike Olson wrote a blog saying it's not necessary, people came out against it. Now we're three years in looking back. Was he right or not? >> I think the ODPI takeaway this year is there's more we can do above and beyond the Hadoop platform. It's expanded to include SQL and other things recently, so there's been some movement on this spec, but frankly you talk to John Mertic at ODPI, you talk to SAS and others, I think we want to be a bit more aggressive in the areas that we go after and try and drive there from a standardization perspective. >> We had Wei Wang on earlier-- >> Shaun: There's more we can do and there's more we should do. >> We had Wei on with Microsoft at our Big Data SV event a couple weeks ago. Talk about the Microsoft relationship with you guys. It seems to be doing very well. Comments on that. >> Microsoft was one of the two companies we chose to partner with early on; in 2011, 2012 Microsoft and Teradata were the two. Microsoft was how do I democratize and make this technology easy for people. That's manifest itself as Azure Cloud Service, Azure HDInsight-- >> Which is growing like crazy. >> Which is globally deployed and we just had another update. It's fundamentally changed our engineering and delivery model. This latest release was a cloud first delivery model, so one of the things that we're proud of is the interactive SQL and the LLAP technology that's in HDP; that went out through Azure HDInsight and Hortonworks Data Cloud first. Then it certified in HDP 2.6 and it went Power at the same time. It's that cadence of delivery and cloud first delivery model. We couldn't do it without a partnership with Microsoft. I think we've really learned what it takes-- >> If you look at Microsoft at that time. I remember interviewing you on theCUBE. Microsoft was trading something like $26 a share at that time, around their low point. Now the stock is performing really well. 
Satya Nadella, very cloud oriented-- >> Shaun: They're very open-source. >> They're very open-source friendly, they've been donating a lot to the OCP, to the data center piece. Extremely different Microsoft, so you slipped into that beautiful spot, reacted on that growth. >> I think as one of the stalwarts of enterprise software providers, I think they've done a really great job of bending the curve towards cloud and still having a mixed portfolio, but in sending a field, and sending a channel, and selling cloud and growing that revenue stream, that's nontrivial, that's hard. >> They know the enterprise sales motions too. I want to ask you how that's going overall within Hortonworks. What are some of the conversations that you're involved in with customers today? Again we were saying in our opening segment, it's on YouTube if you're not watching, but the customers are the forcing function right now. They're really putting the pressure on the suppliers, you're one of them, to get tight, reduce friction, lower costs of ownership, get into the cloud, flywheel. And so you see a lot-- >> I'll throw in another aspect: some of the more late majority adopters traditionally, over and over I hear by 2025 they want to power down the data center and have more things running in the public cloud, if not most everything. That's another eight years or what have you, so it's still a journey, but they're making that an imperative because of the operational agility, because of better predictability, ease of use. That's fundamental. >> As you get into the connective tissue, I love that example, with Kubernetes containers, you've got developers, a big open-source participant and you got all the stuff you have, you just start to see some coalescing around the cloud native. How do you guys look at that conversation? 
>> I view container platforms, whether they're container services that are running on cloud or what have you, as the new lightweight rail that everything will ride atop. The cloud currently plays a key role in that, I think that's going to be the defacto way. In particular if you go cloud first models, particularly for delivery, you need that packaging notion and you need the agility of updates that that's going to provide. I think Red Hat as a partner has been doing great things on hardening that, making it secure. There's others in the ecosystem as well as the cloud providers. All three cloud providers actually are investing in it. >> John: So it's good for your business? >> It removes friction of deployment ... And I ride atop that new rail. It can't get here soon enough from my perspective. >> So I want to ask about clouds. You were talking about the Microsoft shift, personally I think Microsoft realized holy cow, we could actually make a lot of money if we're selling hardware services. We can make more money if we're selling the full stack. It was sort of an epiphany and so Amazon seems to be doing the same thing. You mentioned earlier you know Amazon is a great partner, even though a lot of people look at them as a competitor; it seems like Amazon, Azure etc., they're building out their own big data stack and offering it as a service. People say that's a threat to you guys, is it a threat, is it a tailwind, or is it what it is? >> This is why I bring up that industry-wide we always have waves of centralization and decentralization. They're playing out simultaneously right now with cloud and IoT. The fact of the matter is that you're going to have multiple clouds, on-prem data, and data at the edge. That's the problem I am looking to facilitate and solve. 
I don't view them as competitors, I view them as partners, because we need to collaborate; there's a value chain of the flow of the data and some of it's going to be running through and on those platforms. >> The cloud's not going to solve the edge problem. Too expensive. It's just physics. >> So I think that's where things need to go. I think that's why we talk about this notion of connected data. I don't talk hybrid cloud computing, that's for compute. I talk about how do you connect to your data, how do you know where your data is, and are you getting the right value out of the data by playing it where it lies. >> I think IoT has been a great sweet trend for the big data industry. It really accelerates the value proposition of the cloud too, because now you have a connected network; you can have your cake and eat it too. Central and distributed. >> There's different dynamics in the US versus Europe, as an example. In the US definitely we're seeing a cloud adoption that's independent of IoT. Here in Europe, I would argue the smart mobility initiatives, the smart manufacturing initiatives, and the connected grid initiatives are bringing cloud in, so it's IoT and cloud, and that's opening up the cloud opportunity here. >> Interesting. So on the prospects for Hortonworks, cashflow positive in Q4, you guys have made a public statement; any other thoughts you want to share? >> Just continue to grow the business, focus on these customer use cases, get them to talk about them at things like DataWorks Summit, and then the more the merrier; the more data-oriented open-source driven companies that can graduate in the public markets, I think is awesome. I think it will just help the industry. >> Operating in the open, with full transparency-- >> Shaun: On the business and the code. (laughter) >> Welcome to the party baby. This is theCUBE here at DataWorks 2017 in Munich, Germany. Live coverage, I'm John Furrier with Dave Vellante. Stay with us. 
More great coverage coming after this short break. (upbeat music)

Published Date : Apr 5 2017


Day One Kickoff– DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Narrator: Live coverage of DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello everyone, welcome to The Cube's special presentation here in Munich, Germany for DataWorks Summit 2017. This is the Hadoop Summit powered by Hortonworks. This is their event and again, shows the transition from the Hadoop world to the big data world. I'm John Furrier. My co-host Dave Vellante, good to see you Dave. We're back in the seats together, usually on different events, but now here together in Munich. Great beer, great scene here. Small European event for Hortonworks and the ecosystem but it's called DataWorks 2017. Strata Hadoop is calling themselves Strata and Data. They're starting to see the word Hadoop being sunsetted from these events, which is a big theme of this year. The transition from Hadoop being the branded category to Data. >> Well, you're certainly seeing that in a number of ways. The titles of these events. Well, first of all, I love being in Europe. These venues are great, right? They're so Euro, very clean and magnificent. But back to your point. You're seeing the Hadoop Summit now called the DataWorks Summit. You're seeing the Strata Plus Hadoop is now Strata Plus, I don't even know what it is. Right, it's not Hadoop driven anymore. You see it also in Cloudera's IPO. They're going to talk about Hadoop and Hadoop Distro. They're a Hadoop Distro vendor but they talked about being a data management company and John, I think we are entering the era, or well deep into the era of what I have been calling for the last couple of years, profitless prosperity. Really where you see the Cloudera IPO, as you know, they raised money from Intel, over $600 million at a $4.1 billion valuation. The Wall Street Journal says they'll have a tough time getting a billion dollar valuation.
For every dollar each of these companies spends, Hortonworks and Cloudera, they lose between $1.70 and $2.50, so we've always said at SiliconANGLE, Wikibon and The Cube that people are going to make money in big data or the practitioners of big data, and it's hard to find those guys, it's hard to see them but that's really what's happening is the industries are transforming and those are the guys that are putting money into their bottom line. Not so much for technology vendors. >> Great to unpack that but first of all, I want to just say congratulations to Wikibon for getting it right again. As usual Wikibon, ahead of the curve and being out there and getting it right because I think you nailed it and I think Wikibon saw this first of all the research firms, kind of, you know, pat ourselves on the back here, but the truth is that practitioners are making the money and I think you're going to see more of that. In fact, last night as I'm having a nice beer here in Germany, I just like to listen to the conversations in the bar area and a lot of conversations around, real conversations around, you know, doing deals, and you know, deployments. You know, you're hearing about HBase, you're hearing about clusters, you're hearing about service revenue, and I think this is the focus. Cloudera, I think, in a classic Silicon Valley way, their hubris was tempered by their lack of scale. I mean, they didn't really blow it out. I mean, now they do 200 million in revenue. Nothing to shake a stick at, they did a great job, but they're buying revenue and Hortonworks is as well. But the ecosystem is the factor, and this is the wildcard. I'm making a prediction. Profitless prosperity that you point out is right, but I think that it has longevity with these companies like Hortonworks and Cloudera and others, like MapR because the ecosystem's robust. If you factor in the ecosystem revenue that is enough rising tide in my opinion.
The question is how do they become sustainable as a standalone venture, that Red Hat for Hadoop never worked as Pat Gilson, you know, predicted. So, I think you're going to see a quick shift and pivot quickly by Hortonworks, certainly Cloudera's going to be under the microscope once they go public. I'm expecting that valuation to plummet like a rock. They're going to go public, Silicon Valley people are going to get their exits but. >> Accel will be happy. >> Everyone, yeah, they'll be happy. They already sold in 2013. They did a big sale, I mean, all of them cashed out two years ago when that liquidation event happened with Intel but that's fine. But now it's back to business building and Hortonworks has been doing it for years, so when you see your valuation is less than a billion, so I'm expecting Cloudera to plummet like a rock. I would not buy the IPO at all because I think it's going to go well under a billion dollars.
Big data is a category and Hadoop was the horse they rode in on, but I think what's changing is the fact that customers are now putting real projects on the table and the scrutiny around those projects has to produce value, and the value comes down to total cost of ownership and business value. And that's becoming a data specific thing, and you look at all the successes in the big data world, Spark and others, you're seeing a focus on cloud integration and real-time workloads. These are real projects. This isn't fantasy. This isn't hype. This isn't early adopter. These are real companies saying we are moving to a new paradigm of digital transforming our companies and we need cost efficiencies but revenue-producing applications and workloads that are going to be running in the cloud with data at the heart of it. So, this is a customer-forcing function where the customers are generally excited about machine learning, moving to real-time classification of workloads. This is the deal and no hubris, no technology posturing, no open-standards jockeying can right the situation. Customers have demands and they want them filled, and we're going to have a lot of guests on here and I'm going to ask them those direct questions. What are you looking for? >> Well, I totally agree with what you're saying and when we first met, it was right around the, you know, the mid point of the web 2.0 era, and I remember Tim Berners-Lee commenting on all this excitement, everybody's doing, he said this is what the web was invented to do, and this is what big data was invented to do. It was to produce deep analytics, deep learning, machine learning, you know, cognitive, as IBM likes to brand that, and so, it really is the next era even though people don't like to use the term big data anymore. We were talking to, you know, some of the folks in our community earlier, John, you and I, about some of the challenges. Why is it profitless, you know?
Why is there so much growth but it's no profit? And you know, we have to point out here that people like Hortonworks and Cloudera, they've made some big bets, take HDFS for example. And now you have the cloud guys, particularly Amazon, coming in, you know, with S3. Look at YARN, big open source project. But you got Docker and Kubernetes seem to be mopping that up. Tez was supposed to replace MapReduce and now you've got. >> I mean, I wouldn't say mopping up, I mean. >> You've got Spark. >> At the end of the day the ecosystem's going to revolve around what the customers want, and portability of workloads, Kubernetes and microservices, these are areas that just absolutely make a lot of sense and I think, you know, people will move to where the frictionless action is and that's going to happen with Kubernetes and containers and microservices, but that just speaks to the devops culture, and I think Hadoop ecosystem, again, was grounded in the devops culture. So, yeah, there's some progress that are going to maybe go out of flavor, but there's other stuff coming up through the ranks in open source and I think it's compelling. >> But where I disagree with what you're saying is well, the point I'm trying to make, is you have to, if you're Cloudera and Hortonworks, you have to support those multiple projects and it's expensive as hell. Whereas the cloud guys put all their wood behind one arrow, to use an old Scott McNealy phrase, and you know, Amazon, I would argue is mopping up in big data. I think the cloud guys, you know, it's ironic to me that Cloudera in the cloud era picked that name, you know, but really never had. >> John: They missed the cloud. >> They've never really had a strong cloud play, and I would say the same thing with Hortonworks and MapR. They have to play in the cloud and they talk about cloud, but they've got to support hybrid, they've got to support on-prem, they got to pick the clouds that they're going to support, AWS, Azure, maybe IBM's cloud.
>> Look, Cloudera completely missed the cloud era, pun intended. However, they didn't miss open source. What they're great at, and what I admire about Cloudera and Hortonworks, is that their open source ethos is what drove them, and so they kind of got isolated in with some of their product decisions, but that's not a bad thing. I mean, ultimately, I'm really bullish on Cloudera and Hortonworks because of the ecosystem points I mentioned earlier. I wouldn't buy the IPO, I think I'd buy them at a discount, but Cloudera's not going to go away, Dave. They're going to go public. I think the valuation's going to drop like a rock and then settle around a billion, but they have good management. The founders still there, Michael Olson, Amr Awadallah. So, you're going to see Cloudera transform as a company. They have to do business out in the open and they're not afraid to, obviously they're open source. So, we're going to start to see that transition from a private, venture-backed scale-up buying revenue, in the playbook of Silicon Valley venture capital like Accel Partners and Greylock. Now they go public and get liquid and then now next phase of their journey is going to be build a public company and I think that they will do a good job doing it and I'm not down on them at all for that and I think it's just going to be a transition. >> Well, they're going to raise what? A couple hundred million dollars? But this industry, yeah, this industry's cashflow negative, so I agree with you. Open source is great, let's ra-ra for open source and it drives innovation, but how does this industry pay for itself? That's what I want to know. How do you respond to that? >> Well, I think they have sustainability issues around services and I think partnering with the big companies like Intel that have professional services might help them on that front, but Michael Olson said in his founder's letter in his S1, kind of AI washing, he said AI and cognitive.
But that's okay because Cloudera could easily pivot with their brain power, and same with Hortonworks to AI. Machine learning is very open source driven. Open source culture is growing, it's not going away, so I think Cloudera's in a very good position. >> I think the cloud guys are going to kill them in that game, and cloud guys and IBM are going to cream these profitless startups in that AI and machine learning game. >> We'll see. >> You disagree? >> I disagree, I think. Well, I mean, it depends. I mean, you know, I'm not going to, you know, forecast what the managements might do, but I mean, if I'm cloud looking at what Cloudera's done. >> What would you do? >> I would do exactly what Mike Olson's doing: I'd basically pivot immediately to machine learning. Look at Google. TensorFlow has got so much traction with their cloud because it's got machine learning built into it. Open source is where the action is, and that's where you could do a lot of good work and use it as an advantage in that they know that game. I would not count out the open source game. >> So, we know how IBM makes money at that, you know, in theory anyway it wants. We know how Amazon's going to make money at that with their proprietary approach, Microsoft will do the same thing. How do Cloudera and Hortonworks make money? >> I think it's a product transition around getting to the open source with cloud technologies. Amazon is not out to kill open source, so I think there's an opportunity to wedge in a position there, and so they just got to move quickly. If they don't make these decisions then that's a failed execution on the management team at Cloudera and Hortonworks and I think they're on it. So, we'll keep an eye on that. >> No, Amazon's not trying to kill open source, I would agree, but they are bogarting open source in a big way and profiting amazingly from it. >> Well, they just do what Andy Jassy would say, they're customer driven.
So, if a customer doesn't want to do five things to do one thing, this is back to my point. The customers want real-time workloads. They want it with open source and they don't want all these steps in the cost of ownership. That's why this is not a new shift, it's the same wine, new bottle because now you're just seeing real projects that are demanding successful and efficient code and support and whoever delivers it builds the better mousetrap. In this case, the better mousetrap will win. >> And I'm arguing that the better mousetrap and the better marginal economics, I know I'm like a broken record on this, but if I take Kinesis and DynamoDB and Redshift and wrap it into my big data play, offer it as a service with a set of APIs on the cloud, like AWS is going to do, or is doing, and Azure is doing, that's a better business model than, as you say, five different pieces that I have to cobble together. It's just not economically viable for customers to do that. >> Well, we've got some big news coming up here. We're going to have two days of wall-to-wall coverage of DataWorks 2017. Hortonworks announcing 2.6 of their Hortonworks Data Platform. We're going to talk to Scott now, the CTO, coming up shortly. Stay with us for exclusive coverage of DataWorks in Munich, Germany 2017. We'll be back with more after this short break.

Published Date : Apr 5 2017

Raymie Stata, SAP - Big Data SV 17 - #BigDataSV - #theCUBE


 

>> Announcer: From San Jose, California, it's The Cube, covering Big Data Silicon Valley 2017. >> Welcome back everyone. We are at Big Data Silicon Valley, running in conjunction with Strata + Hadoop World in San Jose. I'm George Gilbert and I'm joined by Raymie Stata, and Raymie was most recently CEO and Founder of Altiscale, a Hadoop-as-a-service vendor. One of the few out there, not part of one of the public clouds. And in keeping with all of the great work they've done, they got snapped up by SAP. So, Rami, since we haven't seen you, I think on The Cube since then, why don't you catch us up with all that, the good work that's gone on between you and SAP since then. >> Sure, so the acquisition closed back in September, so it's been about six months. And it's been a very busy six months. You know, there's just a lot of blocking and tackling that needs to happen. So, you know, getting people on board. Getting new laptops, all that good stuff. But certainly a huge effort for us was to open up a data center in Europe. We've long had demand to have that European presence, both because I think there's a lot of interest over in Europe itself, but also large, multi-national companies based in the US, you know, it's important for them to have that European presence as well. So, it was a natural thing to do as part of SAP, so kind of first order of business was to expand over into Europe. So that was a big exercise. We've actually had some good traction on the sales side, right, so we're getting new customers, larger customers, more demanding customers, which has been a good challenge too. >> So let's pause for a minute on, sort of unpack for folks, what Altiscale offered, the core services. >> Sure. >> That were, you know, here in the US, and now you've extended to Europe. >> Right. So our core platform is kind of Hadoop, Hive, and Spark, you know, as a service in the cloud. And so we would offer HDFS and YARN for Hadoop. Spark and Hive kind of well-integrated.
And we would offer that as a cloud service. So you would just, you know, get an account, login, you know, store stuff in HDFS, run your Spark programs, and the way we encourage people to think about it is, I think very often vendors have trained folks in the big data space to think about nodes. You know, how many nodes am I going to get? What kind of nodes am I going to get? And the way we really force people to think twice about Hadoop and what Hadoop as a service means is, you know, they don't, why are you asking that? You don't need to know about nodes. Just store stuff, run your jobs. We worry about nodes. And that, you know, once people kind of understood, you know, just how much complexity that takes out of their lives and how that just enables them to truly focus on using these technologies to get business value, rather than operating them. You know, there's that aha moment in the sales cycle, where people say yeah, that's what I want. I want Hadoop as a service. So that's been our value proposition from the beginning. And it's remained quite constant, and even coming into SAP that's not changing, you know, one bit.
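The "stop asking about nodes" pitch comes down to the service, not the user, owning capacity planning. A toy pure-Python sketch of that division of labor; the 64 GB-per-node figure and the sizing rule are invented for illustration and are not Altiscale's actual scheduler:

```python
import math

# Toy sketch of a Hadoop-as-a-service control plane: the user states only
# how much data a job touches; the service derives node counts internally.
# The 64 GB-per-node capacity is an invented, illustrative figure.

def plan_job(data_gb, gb_per_node=64):
    """Service-side plan: the node count is derived, never user-specified."""
    return {"data_gb": data_gb, "nodes": max(1, math.ceil(data_gb / gb_per_node))}

# The user-facing call never mentions nodes:
plan = plan_job(500)
print(plan["nodes"])  # the service quietly sized this job at 8 nodes
```

The point is not the arithmetic but the interface: data goes in, jobs run, and nodes never surface to the user.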
It's very much like Athena and Big Query where you just store stuff in tables and you issue queries and you don't worry about how much compute, you know, and managing it. I think, by throwing, you know, Spark in the equation, and YARN more generally, right, we can handle a broader range of these cases. So, for example, you don't have to store data in tables, you can store them into HDFS files which is good for processing log data, for example. And with Spark, for example, you have access to a lot of machine learning algorithms that are a little bit harder to run in the context of, say, Athena. So I think it's the same model, in terms of, it's fully operated for you. But a broader platform in terms of its capabilities. >> Okay, so now let's talk about what SAP brought to the table and how that changed the use cases that were appropriate for Altiscale. You know, starting at the data layer. >> Yeah, so, I think the, certainly the, from the business perspective, SAP brings a large, very engaged customer base that, you know, is eager to embrace, kind of a data-driven mindset and culture and is looking for a partner to help them do that, right. And so that's been great to be in that environment. SAP has a number of additional technologies that we've been integrating into the Altiscale offering. So one of them is Vora, which is kind of an interactive sequel engine, it also has time series capabilities and graph capabilities and search capabilities. So it has a lot of additive capabilities, if you will, to what we have at Altiscale. And it also integrates very deeply into HANA itself. And so we now have that for a technology available as a service at Altiscale. 
>> Let me make sure, so that everyone understands, and so I understand too, is that so you can issue queries from HANA and they can, you know, beyond just simple sequel queries, they can handle the time series, and predictive analytics, and access data sort of seamlessly that's in Hadoop, or can it go the other way as well? >> It's both ways. So you can, you know, from HANA you can essentially federate out into Vora. And through that access data that's in a Hadoop cluster. But it's also the other way around. A lot of times there's an analyst who really lives in the big data world, right, they're in the Hadoop world, but they want to join in data that's sitting in a HANA database, you know. Might be dimensions in a warehouse or, you know, customer details even in a transactional system. And so, you know, that Hadoop-based analyst now has access to data that's out in those HANA databases. >> Do you have some Lighthouse accounts that are working with this already? >> Yes, we do. (laughter) >> Yes we do, okay. I guess that was the diplomatic way of saying yes. But no comment. Alright, so tell us more about SAPs big data stack today and how that might evolve. >> Yeah, of course now, especially that now we've got the Spark, Hadoop, Hive offering that we have. And then four sitting on top of that. There's an offering called Predictive Analytics, which is Spark-based predictive analytics. >> Is that something that came from you, or is that, >> That's an SAP thing, so this is what's been great about the acquisition is that SAP does have a lot of technologies that we can now integrate. And it brings new capabilities to our customer base. So those three are kind of pretty key. And then there's something called Data Services as well, which allows us to move data easily in and out of, you know, HANA and other data stores. 
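The two-way federation described here, where a Hadoop-side analyst joins raw event data against dimension tables living in a HANA database, behaves like an ordinary join once both stores are reachable. A pure-Python stand-in with invented table names and fields (the real work happens inside Vora and HANA, not in client code like this):

```python
# Toy federated join: "hadoop_events" stands in for log data in HDFS,
# "hana_customers" for a dimension table in a HANA database. All names
# and fields here are invented for illustration.

hadoop_events = [
    {"customer_id": 1, "clicks": 40},
    {"customer_id": 2, "clicks": 7},
    {"customer_id": 9, "clicks": 3},   # no matching dimension row
]
hana_customers = {1: "Acme GmbH", 2: "Globex AG"}  # id -> customer name

def federated_join(events, customers):
    """Join fact rows from one store against dimensions from another."""
    return [
        {"customer": customers[e["customer_id"]], "clicks": e["clicks"]}
        for e in events
        if e["customer_id"] in customers
    ]

report = federated_join(hadoop_events, hana_customers)
print(report)  # two enriched rows; the unmatched event is dropped
```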
>> Is it, is this ability to federate queries between Hadoop and HANA and then migration of the data between the stores, does that, has that changed the economics of how much data people, SAP customers, maintain and sort of what types of apps they can build on it now that they might, it's economically feasible to store a lot more data. >> Well, yes and no. I think the context of Altiscale, both before and after the acquisition is very often there's, what you might call a big data source, right. It could be your web logs, it could be some IOT generated log data, it could be social media streams. You know, this is data that's, you know, doesn't have a lot of structure coming in. It's fairly voluminous. It doesn't, very naturally, go into a SQL database, and that's kind of the sweet spot for the big data technologies like Hadoop and Spark. So, that data comes into your big data environment. You can transform it, you can do some data quality on it. And then you can eventually stage it out into something like a HANA data mart, you know, to make it available for reporting. But obviously there's stuff that you can do on the larger dataset in Hadoop as well. So, in a way, yes, you can now tame, if you will, those huge data sources that, you know, weren't practical to put into a HANA database. >> If you were to prioritize, in the context of, sort of, the applications SAP focuses on, would you be, sort of, with the highest priority use case be IOT related stuff, where, you know, it was just prohibitive to put it in HANA since it's mostly in memory. But, you know, SAP is exposed to tons of that type of data, which would seem to most naturally have an affinity to Altiscale. >> Yeah, so, I mean, IOT is a big initiative. And is a great use case for big data. But, you know, the financial services industry, as another example, is fairly far down the path using Hadoop technologies for many different use cases. And so, that's also an opportunity for us.
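The land, transform, and stage flow Raymie describes a moment earlier, where voluminous loosely structured data arrives in the big data tier, gets a data-quality pass, and only a compact aggregate is staged out for reporting, can be sketched in miniature. The log format and record fields here are invented for illustration:

```python
# Toy land-transform-stage pipeline. Raw log lines land in the "big data"
# tier, a data-quality pass drops malformed records, and only the small
# report-ready aggregate is staged out to the (simulated) data mart.

raw_logs = [
    "2017-03-15,pageview,42",
    "garbage line with no structure",
    "2017-03-15,pageview,58",
]

def clean(lines):
    """Data-quality pass: keep only well-formed date,event,millis records."""
    rows = []
    for line in lines:
        parts = line.split(",")
        if len(parts) == 3 and parts[2].isdigit():
            rows.append({"date": parts[0], "event": parts[1], "ms": int(parts[2])})
    return rows

def stage_aggregate(rows):
    """Stage only the compact aggregate, not the raw volume."""
    return {
        "date": rows[0]["date"],
        "events": len(rows),
        "total_ms": sum(r["ms"] for r in rows),
    }

mart_row = stage_aggregate(clean(raw_logs))
print(mart_row)  # one small row reaches the mart; the raw logs stay behind
```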
>> So, let me pop back up, you know, before we have to wrap. With Altiscale as part of the SAP portfolio, have the two companies sort of gone to customers with a more, with more transformational options, that, you know, you'll sell together? >> Yeah, we have. In fact, Altiscale actually is no longer called Altiscale, right? We're part of a portfolio of products, you know, known as the SAP Cloud Platform. So, you know, under the cloud platform we're the big data services. The SAP Cloud Platform is all about business transformation. And business innovation. And so, we bring to that portfolio the ability to now bring the types of data sources that I've just discussed, you know, to bear on these transformative efforts. And so, you know, we fit into some momentum SAP already has, right, to help companies drive change. >> Okay. So, along those lines, which might be, I mean, we know the financial services has done a lot of work with, and I guess telcos as well, what are some of the other verticals that look like they're primed to fall, you know, with this type of transformational network? >> So you mentioned one, which I kind of call manufacturing, right, and there tends to be two kind of different use cases there. One of them I call kind of the shop floor thing. Where you're collecting a lot of sensor data, you know, out of a manufacturing facility with the goal of increasing yield. So you've got the shop floor. And then you've got the, I think, more commonly discussed measuring stuff out in the field. You've got a product, you know, out in the field. Bringing the telemetry back. Doing things like predictive maintenance. So, I think manufacturing is a big sector ready to go for big data. And healthcare is another one. You know, people pulling together electronic medical records, you know trying to combine that with clinical outcomes, and I think the big focus there is to drive towards, kind of, outcome-based models, even on the payment side.
And big data is really valuable to drive and assess, you know, kind of outcomes in an aggregate way. >> Okay. We're going to have to leave it on that note. But we will tune back in at I guess Sapphire or TechEd, whichever of the SAP shows is coming up next to get an update. >> Sapphire's next. Then TechEd. >> Okay. With that, this is George Gilbert, and Raymie Stata. We will be back in few moments with another segment. We're here at Big Data Silicon Valley. Running in conjunction with Strata + Hadoop World. Stay tuned, we'll be right back.

Published Date : Mar 15 2017

Yuanhao Sun, Transwarp Technology - BigData SV 2017 - #BigDataSV - #theCUBE


 

>> Announcer: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. (upbeat percussion music) >> Okay, welcome back everyone. Live here in Silicon Valley, San Jose, is the Big Data SV, Big Data Silicon Valley in conjunction with Strata Hadoop, this is theCUBE's exclusive coverage. Over the next two days, we've got wall-to-wall interviews with thought leaders, experts breaking down the future of big data, future of analytics, future of the cloud. I'm John Furrier with my co-host George Gilbert with Wikibon. Our next guest is Yuanhao Sun, who's the co-founder and CTO of Transwarp Technologies. Welcome to theCUBE. You were on theCUBE previously, 166 days ago, I noticed. But now you've got some news. So let's get the news out of the way. What are you guys announcing here, this week? >> Yes, so we are announcing 5.0, the latest version of Transwarp Hub. In this version, we would call it probably a revolutionary product, because first, we embedded Kubernetes in our product, so we allow people to isolate different kinds of workloads using Docker and containers, and we also provide a scheduler to better support mixed workloads. And the second is, we are building a set of tools that allow people to build their warehouse, and then migrate from an existing or traditional data warehouse to Hadoop. And we are also providing people the capability to build a data mart, actually. It allows you to interactively query data. So we built a column store, in memory and on SSD, and we completely rewrote the whole SQL engine. It is a very tiny SQL engine that allows people to query data very quickly. So today that tiny SQL engine is about five to ten times faster than Spark 2.0. And we also allow people to build cubes on top of Hadoop. And then, once the cube is built, the SQL performance, like the TPC-H performance, is about 100 times faster than existing databases, or existing Spark 2.0. So it's super-fast.
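The cube speedup Yuanhao describes comes from pre-aggregating measures by the dimensions queries group on, so a query touches one entry per group instead of rescanning the fact table. A minimal sketch of the idea in plain Python (illustrative only, not Transwarp's actual implementation):

```python
from collections import defaultdict

# Toy fact table: (region, product, revenue) rows.
rows = [
    ("east", "widget", 100.0),
    ("east", "gadget", 250.0),
    ("west", "widget", 80.0),
    ("west", "widget", 120.0),
]

def build_cube(rows):
    """Materialize SUM(revenue) by (region, product) once, ahead of queries."""
    cube = defaultdict(float)
    for region, product, revenue in rows:
        cube[(region, product)] += revenue
    return dict(cube)

cube = build_cube(rows)

def query_cube(cube, region):
    """Answer 'revenue per product in a region' from the cube, touching one
    entry per distinct group instead of rescanning the whole fact table."""
    return {p: v for (r, p), v in cube.items() if r == region}

print(query_cube(cube, "west"))  # {'widget': 200.0}
```

The trade-off is the classic one: the cube must be rebuilt or incrementally maintained as data changes, in exchange for group-by queries that no longer depend on fact-table size.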
And actually, we found a Paralect customer, so they replaced their data warehouse with our software to build a data mart. And we already migrated, say, 100 reports from their data warehouse to our product. So the results are very promising. And the third is, we are providing tools for people to build machine learning pipelines, and we are leveraging TensorFlow, MXNet, and also Spark, for people to visualize the pipeline and to build data mining workflows. So this is kind of like a data science toolset; it's very easy for people to use. >> John: Okay, so take a minute to explain, 'cus that was great, you got the performance there, that's the news out of the way. Take a minute to explain Transwarp, your value proposition, and when people engage you as a customer. >> Yuanhao: Yeah so, people choose our product, and the major reason is our compatibility with Oracle, DB2, and Teradata SQL syntax, because you know, they have built a lot of applications on those databases, so when they migrate to Hadoop, they don't want to rewrite the whole program, so our SQL compatibility is a big advantage to them. So this is the first one. And we also support full ACID and distributed transactions on Hadoop, so that a lot of applications can be migrated to our product with few modifications or without any changes. So this is our first advantage. The second is that we are providing probably the best streaming engine, which is actually derived from Spark. So we apply this technology to IOT applications. You know, in IOT pretty soon they need very low latency, but they also need very complicated models on top of streams. So that's why we are providing full SQL support and machine learning support on top of streaming events. And we are also using event-driven technology to reduce the latency to five to ten milliseconds. So this is the second reason people choose our product. And then today we are announcing 5.0, and I think people will find more reasons to choose our product.
>> So you have the SQL compatibility, you have the tooling, and now you have the performance. So kind of the triple threat there. So what's the customer saying, when you go out and talk with your customers, what's the view of the current landscape for customers? What are they solving right now, what are the key challenges and pain points that customers have today? >> We have customers in more than 12 vertical segments, and in different verticals they have different pain points, actually. Take one example: in financial services, the main pain point for them is to migrate existing legacy applications to Hadoop. You know, they have accumulated a lot of data, and the performance is very bad using legacy databases, so they need high performance Hadoop and Spark to speed up the performance, like reports. But in another vertical, like in logistics and transportation and IOT, the pain point is to find a very low latency streaming engine. At the same time, they need a very complicated programming model to write their applications. Another example: in the public sector, they actually need a very complicated, large scale search engine. They need to build analytical capability on top of the search engine, so they can search the results and analyze them at the same time. >> George: Yuanhao, as always, whenever we get to interview you on theCube, you toss out these gems, sort of like, you know, diamonds, like big rocks that under millions of years, and incredible pressure, have been squeezed down into these incredibly valuable, kind of, you know, valuable, sort of minerals with lots of goodness in them, so I need you to unpack that diamond back into something that we can make sense out of, or I should say, that's more accessible. You've done something that none of the Hadoop Distro guys have managed to do, which is to build databases that are not just decision support, but can handle OLTP, can handle operational applications.
You've done the streaming, you've done what even Databricks can't do without even trying any of the other stuff, which is getting the streaming down to an event at a time. Let's step back from all these amazing things, and tell us what was the secret sauce that let you build a platform this advanced? >> So actually, we are driven by our customers, and we do see the trends: people are looking for better solutions. You know, there is a lot of pain in setting up a Hadoop cluster to use the Hadoop technology. So that's why we found it's very meaningful and also very necessary for us to build a SQL database on top of Hadoop. Quite a lot of customers on the financial services side ask us to provide ACID, so that transactions can be put on top of Hadoop, because they have to guarantee the consistency of their data. Otherwise they cannot use the technology. >> At the risk of interrupting, maybe you can tell us why others have built the analytic databases on top of Hadoop, to give the familiar SQL access, and obviously have a desire also to have transactions next to it, so you can inform a transaction decision with the analytics. One of the questions is, how did you combine the two capabilities? I mean it only took Oracle like 40 years. >> Right, so. Actually, our transaction capability is only for analytics, you know, so this OLTP capability is not for short term transactional applications; it's for data warehouse kinds of workloads. >> George: Okay, so when you're ingesting. >> Yes, when you're ingesting, when you modify your data in batch, you have to guarantee the consistency. So that's the OLTP capability. But we are also building another distributed storage and distributed database, and we are providing that with OLTP capability. That means you can do concurrent transactions on that database, but we are still developing that software right now. Today our product provides the distributed transaction capability for people to actually build their warehouse.
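The batch-consistency requirement he's describing — a batch of modifications either lands completely or not at all — can be illustrated with any transactional store. A toy sketch using SQLite (purely illustrative; Transwarp's engine is of course not SQLite):

```python
import sqlite3

# A continuously-refreshed dimension table. Without a transaction, a failed
# batch load could leave half the customer records updated and half stale.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    tier TEXT CHECK (tier IN ('bronze', 'silver', 'gold')))""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "bronze"), (2, "bronze")])
conn.commit()

def load_batch(conn, updates):
    """Apply a whole batch of updates atomically: all visible, or none."""
    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            for cust_id, tier in updates:
                conn.execute("UPDATE customers SET tier = ? WHERE id = ?",
                             (tier, cust_id))
        return True
    except sqlite3.Error:
        return False  # the rollback already restored the previous snapshot

assert load_batch(conn, [(1, "gold"), (2, "gold")])            # clean batch lands
assert not load_batch(conn, [(1, "silver"), (2, "platinum")])  # bad batch rolls back
print(conn.execute("SELECT tier FROM customers ORDER BY id").fetchall())
# [('gold',), ('gold',)]
```

The second batch fails the CHECK constraint on its second row, and the already-applied first row is rolled back with it — which is exactly why a warehouse that is loaded continuously still wants transactions.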
You know, quite a lot of people believe data warehouses do not need transaction capability, but we found a lot of people modify their data in the data warehouse. You know, they are loading their data continuously into the data warehouse, like the CRM tables, customer information; these can change over time. So every day people need to update or change the data; that's why we have to provide transaction capability in the data warehouse. >> George: Okay, and then so then well tell us also, 'cus the streaming problem is, you know, we're told that roughly two thirds of Spark deployments use streaming as a workload. And the biggest knock on Spark is that it can't process one event at a time, you got to do a little batch. Tell us some of the use cases that can take advantage of doing one event at a time, and how you solved that problem? >> Yuanhao: Yeah, so the first use case we encountered is the anti-fraud, or fraud detection, application in FSI. So whenever you swipe your credit card, the bank needs to tell you if the transaction is a fraud or not in a few milliseconds. But if you are using Spark streaming, it will usually take 500 milliseconds, so the latency is too high for such kind of application. And that's why we have to provide per-event, that means event-driven, processing to detect the fraud, so that we can interrupt the transaction in a few milliseconds. So that's one kind of application. The other can come from IOT applications. We already put our streaming framework in a large manufacturing factory. They have to detect the malfunction of their equipment in a very short time, otherwise it may explode. So if you are using Spark streaming, probably when you submit your application, it will take you hundreds of milliseconds, and when you finish your detection, it usually takes a few seconds, so that will be too long for such kind of application. And that's why we need a low latency streaming engine. But you can say it is okay to use Storm or Flink, right?
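The per-event fraud check he describes — decide on each swipe, inside the request path, rather than waiting for a micro-batch — can be sketched as a stateful function invoked once per event. A toy rule (a real deployment would use far richer models; the threshold and window here are made up for illustration):

```python
def make_fraud_detector(max_swipes=3, window_s=60.0):
    """Flag a card when it is swiped more than `max_swipes` times within a
    sliding window of `window_s` seconds; state persists across events."""
    history = {}  # card_id -> recent swipe timestamps

    def on_swipe(card_id, ts):
        recent = [t for t in history.get(card_id, []) if ts - t < window_s]
        recent.append(ts)
        history[card_id] = recent
        return len(recent) > max_swipes  # decision made inline, per event

    return on_swipe

detect = make_fraud_detector()
swipes = [("card1", 0.0), ("card1", 1.0), ("card1", 2.0), ("card1", 3.0)]
print([detect(c, t) for c, t in swipes])  # [False, False, False, True]
```

Because the decision is returned from the same call that ingests the event, the answer is available within the swipe's round trip — the property a micro-batch engine, which must wait for the batch boundary, cannot give you.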
And the problem is, we found, they need a very complicated programming model: they are going to solve equations on the streaming events, they need to do FFT transformations. And they are also asking to run some linear regression or some neural network on top of events. So that's why we have to provide a SQL interface, and we are also embedding CEP capability into our streaming engine, so that you can use patterns to match the events and to send alerts. >> George: So, SQL to get a set of events and maybe join some in the complex event processing, CEP, to say, does this fit a pattern I'm looking for? >> Yuanhao: Yes. >> Okay, and so, and then with the lightweight OLTP, that and any other new projects you're looking at, tell us perhaps the new use cases it would be appropriate for. >> Yuanhao: Yeah, so that's our future product, actually. We are going to solve the problem of large scale OLTP transactions. You know, in China there is such a large population; in the public sector or in banks, they need to build highly scalable transaction systems, so that they can support very highly concurrent transactions at the same time. That's why we are building such kind of technology. You know, in the past, people just divided transactions across multiple databases, like multiple Oracle instances or multiple mySQL instances. But the problem is: if the application is simple, you can very easily divide transactions over multiple instances of databases. But if the application is very complicated, especially when the ISV already wrote the applications based on Oracle or a traditional database, they already depend on the transaction systems. So that's why we have to build the same kind of transaction systems, but ones that can scale to hundreds of nodes, and can scale to millions of transactions per second. >> George: On the transactional stuff? >> Yuanhao: Yes.
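When a transaction spans those divided database instances, the classic coordination protocol is two-phase commit, which is also where the extra latency of distributed OLTP comes from. A toy coordinator (illustrative only; real implementations add write-ahead logging, timeouts, and crash recovery):

```python
class Node:
    """Participant that votes in phase 1 and applies the outcome in phase 2."""
    def __init__(self, healthy=True):
        self.healthy, self.log = healthy, []
    def prepare(self, txn):
        return self.healthy          # vote yes only if able to commit
    def commit(self, txn):
        self.log.append(("commit", txn))
    def abort(self, txn):
        self.log.append(("abort", txn))

def two_phase_commit(participants, txn):
    """Phase 1: ask every node to prepare; phase 2: commit only if all voted
    yes, otherwise abort everywhere. Each phase is a round trip to every
    participant, which is the latency price of cross-node consistency."""
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(txn)
        return "committed"
    for p in participants:
        p.abort(txn)
    return "aborted"

print(two_phase_commit([Node(), Node()], "t1"))               # committed
print(two_phase_commit([Node(), Node(healthy=False)], "t2"))  # aborted
```

A single unhealthy participant forces every node to abort, so no shard ever exposes a half-applied transaction — the consistency guarantee that makes the extra round trips worthwhile.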
>> Just correct me if I'm wrong, I know we're running out of time, but I thought Oracle only scales out when you're doing decision support work, not when you're doing OLTP; not that it, that it can only, that it can maybe stretch to ten nodes or something like that. Am I mistaken? >> Yuanhao: Yes, they can scale to 16, or at most 32 nodes. >> George: For transactional work? >> For transactional work, but that's the theoretical limit. But you know, like Google F1 and Google Spanner, they can scale to hundreds of nodes. But you know, the latency is higher than Oracle, because you have to use a distributed protocol to communicate with multiple nodes, so the latency is higher. >> On Google? >> Yes. >> On Google. The latency is higher on the Google? >> 'Cus it has to go like all the way to Europe and back. >> Oracle or Google latency, you said? >> Google, because if you are using the two phase commit protocol, you have to broadcast your request to multiple nodes, and then wait for the feedback, so that means you have a much higher latency, but it's necessary to maintain the consistency. So in distributed OLTP databases, the latency is usually higher, but the concurrency is also much higher, and scalability is much better. >> George: So that's a problem you've stretched beyond what Oracle's done. >> Yuanhao: Yes, so because customers can tolerate the higher latency, but they need to scale to millions of transactions per second, that's why we have to build a distributed database. >> George: Okay, for this reason we're going to have to have you back for like maybe five or ten consecutive segments, you know, maybe starting tomorrow. >> We're going to have to get you back for sure. Final question for you: What are you excited about, from a technology, in the landscape, as you look at open source, you're working with Spark, you mentioned Kubernetes, you have micro services, all the cloud.
What are you most excited about right now in terms of new technology that's going to help simplify and scale, with low latency, the databases, the software? 'Cus you got IOT, you got autonomous vehicles, you have all this data. What are you excited about? >> So actually, this technology, we already solve these problems, actually. But I think the most exciting thing is we found... There are two trends. The first trend is: we found it's very exciting to see more computation frameworks coming out, like the AI frameworks, like TensorFlow and MXNet, Torch, and tons of such machine learning frameworks are coming out. They are solving different kinds of problems, like facial recognition from video and images, like human computer interaction using voice, using audio. So it's very exciting, I think. And also, we found it's very exciting that we are embedding these, we are combining these technologies together; so that's why we are using containers, you know. We didn't use YARN, because it cannot support TensorFlow or other frameworks. But you know, if you are using containers and if you have a good scheduler, you can schedule any kind of computation framework. So we found it's very interesting to have these new frameworks, and we can combine them together to solve different kinds of problems. >> John: Thanks so much for coming onto theCube, it's an operating system world we're living in now, it's a great time to be a technologist. Certainly the opportunities are out there, and we're breaking it down here inside theCube, live in Silicon Valley, with the best tech executives, best thought leaders and experts here inside theCube. I'm John Furrier with George Gilbert. We'll be right back with more after this short break. (upbeat percussive music)

Published Date : Mar 14 2017


Yaron Haviv | BigData SV 2017


 

>> Announcer: Live from San Jose, California, it's the CUBE, covering Big Data Silicon Valley 2017. (upbeat synthesizer music) >> Live with the CUBE coverage of Big Data Silicon Valley or Big Data SV, #BigDataSV, in conjunction with Strata + Hadoop. I'm John Furrier with the CUBE and my co-host George Gilbert, analyst at Wikibon. I'm excited to have our next guest, Yaron Haviv, who's the founder and CTO of iguazio, just wrote a post up on SiliconANGLE, check it out. Welcome to the CUBE. >> Thanks, John. >> Great to see you. You're in a guest blog this week on SiliconANGLE, and always great on Twitter, 'cause Dave Vellante always liked to bring you into the contentious conversations. >> Yaron: I like the controversial ones, yes. (laughter) >> And you add a lot of good color on that. So let's just get right into it. So your company's doing some really innovative things. We were just talking before we came on camera here, about some of the amazing performance improvements you guys have on many different levels. But first take a step back, and let's talk about what this continuous analytics platform is, because it's unique, it's different, and it's got impact. Take a minute to explain. >> Sure, so first a few words on iguazio. We're developing a data platform which is unified, so basically it can ingest data through many different APIs, and it's more like a cloud service. It is for on-prem and edge locations and co-location, but it's managed more like a cloud platform, so a very similar experience to Amazon. >> John: It's software? >> It's software. We do integrate a lot with hardware in order to achieve our performance, which is really about 10 to 100 times faster than what exists today. We've talked to a lot of customers, and what we really want to focus on with customers is solving business problems, because I think a lot of the Hadoop camp started with more solving IT problems.
So IT is going kicking tires, and eventually failing, based on your statistics and Gartner statistics. So what we really wanted to solve is big business problems. We figured out that this notion of pipeline architecture, where you ingest data, and then curate it, and fix it, et cetera, was very good for the early days of Hadoop; if you think about how Hadoop started, it was page ranking from Google. There was no time sensitivity. You could take days to calculate it and recalibrate your search engine. Nowadays, everyone is looking for real time insights. So there is sensory data from (mumbles), there's stock data from exchanges, there is fraud data from banks, and you need to act very quickly. So this notion of, and I can give you examples from customers, this notion of taking data, creating Parquet files and log files, and storing them in S3, and then taking Redshift and analyzing them, and then maybe a few hours later having an insight, this is not going to work. And what you need to fix is, you have to put some structure into the data. Because if you need to update a single record, you cannot just create a huge file of 10 gigabytes and then analyze it. So what we did is, basically, a mechanism where you ingest data. As you ingest the data, you can run multiple different processes on the same thing. And you can also serve the data immediately, okay? And two examples that we demonstrate here at the show: one is video surveillance, a very nice movie-style example, where you, basically, ingest pictures through the S3 API, the object API, you analyze the picture to detect faces, to detect scenery, to extract geolocation from pictures and all that, all those through different processes. TensorFlow doing one, serverless functions that we have doing other simpler tasks. And at the same time, you can have dashboards that just show everything. And you can have Spark, that basically does queries of where was this guy last seen?
Or who was he with, you know, or think about the Boston Bomber example. You could just do it in real time. Because you don't need this notion of a pipeline. And this solves very hard business problems for some of the customers we work with. >> So that's the key innovation, there's no pipelining. And what's the secret sauce? >> So first, our system does about a couple of million transactions per second. And we are a multi-modal database. So, basically, you can ingest data as a stream, and exactly the same data could be read by Spark as a table. So you could, basically, issue a query on the same data, give me everything that has a certain pattern or something, and it could also be served immediately through RESTful APIs to a dashboard running AngularJS or something like that. So that's the secret sauce: by having this integration, and this unique data model, it allows all those things to work together. There are other aspects, like we have transactional semantics. One of the challenges is how do you make sure that a bunch of processes don't collide when they update the same data. So first you need very low granularity, 'cause each one may update a different field. Like this example that I gave with GeoData: the serverless function that does the GeoData extraction only updates the GeoData fields within the records. And maybe TensorFlow updates information about the image in a different location in the record or, potentially, a different record. So you have to have that, along with transaction safety, along with security. We have very tight security at the field level, identity level. So that's re-thinking the entire architecture. And I think what many of the companies you'll see at the show, they'll say, okay, Hadoop is a given, let's build some sort of convenience tools around it, let's do some scripting, let's do automation. But the underlying thing, I won't use dirty words, but it is not well-equipped for the new challenges of real time.
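The multi-modal idea Yaron describes — one ingest path, with the same records readable as a stream, queryable as a table, or served by key — can be sketched in a few lines. This is a toy in-memory stand-in for the concept, nothing like iguazio's actual engine:

```python
class MultiModalStore:
    """Toy version of the idea: one ingest path, and the same records are
    readable as an ordered stream, as a queryable table, and by key."""
    def __init__(self):
        self._records = []          # arrival order -> stream view
        self._by_key = {}           # primary key -> record view

    def ingest(self, key, record):
        self._records.append((key, record))
        self._by_key[key] = record  # last write wins, like a K/V update

    def read_stream(self, offset=0):
        return self._records[offset:]        # consume like a message queue

    def query(self, predicate):
        return [r for r in self._by_key.values() if predicate(r)]  # table scan

    def get(self, key):
        return self._by_key.get(key)         # serve a dashboard lookup

store = MultiModalStore()
store.ingest("cam1", {"faces": 2})
store.ingest("cam2", {"faces": 0})
print(len(store.read_stream()), store.get("cam1"), store.query(lambda r: r["faces"] > 0))
# 2 {'faces': 2} [{'faces': 2}]
```

The point of the sketch is that nothing is copied between a "streaming system", a "table store", and a "serving layer": one write makes the record visible through all three access patterns at once, which is what removes the pipeline.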
We basically restructured everything, we took the notions of cloud-native architectures, we took the notions of Flash and latest Flash technologies, a lot of parallelism on CPUs. We didn't take anything for granted on the underlying architecture. >> So when you found the company, take a personal story here. What was the itch you were scratching, why did you get into this? Obviously, you have a huge tech advantage, which is, will double-down with the research piece and George will have some questions. What got you going with the company? You got a unique approach, people would love to do away with the pipeline, that sounds great. And the performance, you said about 100x. So how did you get here? (laughs) Tell the story. >> So if you know my background, I ran all the data center activities in Mellanox, and you know Mellanox, I know Kevin was here. And my role was to take Mellanox technology, which is 100 gig networking and silicon, and fit it into the different applications. So I worked with SAP HANA, I worked with Teradata, I worked on Oracle Exadata, I work with all the cloud service providers on building their own object storage and NoSQL and other solutions. I also owned all the open source activities around Hadoop and Saf and all those projects, and my role was to fix many of those. If a customer says I don't need 100 gig, it's too fast for me, how do I? And my role was to convince him that yes, I can open up all the bottleneck all the way up to your stack so you can leverage those new technologies. And for that we basically sowed inefficiencies in those stacks. >> So you had a good purview of the marketplace. >> Yaron: Yes. >> You had open source on one hand, and then all the-- >> All the storage players, >> vendors, network. >> all the database players and all the cloud service providers were my customers. So you're a very unique point where you see the trajectory of cloud. 
Doing things totally different, and sometimes I see the trajectory of enterprise storage, SAN, NAS, you know, all Flash, all that, legacy technologies where cloud providers are all about object, key value, NoSQL. And you're trying to convince those guys that maybe they were going the wrong way. But it's pretty hard. >> Are they going the wrong way? >> I think they are going the wrong way. Everyone, for example, is running to do NVMe over Fabric now that's the new fashion. Okay, I did the first implementation of NVMe over Fabric, in my team at Mellanox. And I really loved it, at that time, but databases cannot run on top of storage area networks. Because there are serialization problems. Okay, if you use a storage area network, that mean that every node in the cluster have to go and serialize an operation against the shared media. And that's not how Google and Amazon works. >> There's a lot more databases out there too, and a lot more data sources. You've got the Edge. >> Yeah, but all the new databases, all the modern databases, they basically shared the data across the different nodes so there are no serialization problems. So that's why Oracle doesn't scale, or scale to 10 nodes at best, with a lot of RDMA as a back plane, to allow that. And that's why Amazon can scale to a thousand nodes, or Google-- >> That's the horizontally-scalable piece that's happening. >> Yeah, because, basically, the distribution has to move into the higher layers of the data, and not the lower layers of the data. And that's really the trajectory where the traditional legacy storage and system vendors are going, and we sort of followed the way the cloud guys went, just with our knowledge of the infrastructure, we sort of did it better than what the cloud guys did. 'Cause the cloud guys focused more on the higher levels of the implementation, the algorithms, the Paxos, and all that. Their implementation is not that efficient. And we did both sides extremely efficient. 
>> How about the Edge? 'Cause Edge is now part of cloud, and you got cloud has got the compute, all the benefits, you were saying, and still they have their own consumption opportunities and challenges that everyone else does. But Edge is now exploding. The combination of those things coming together, at the intersection of that is deep learning, machine learning, which is powering the AI hype. So how is the Edge factoring into your plan and overall architectures for the cloud? >> Yeah, so I wrote a bunch of posts that are not published yet about the Edge, But my analysis along with your analysis and Pierre Levin's analysis, is that cloud have to start distribute more. Because if you're looking at the trends. Five gig, 5G Wi-Fi in wireless networking is going to be gigabit traffic. Gigabit to the homes, they're going to buy Google, 70 bucks a month. It's going to push a lot more bend with the Edge. On the same time, a cloud provider, is in order to lower costs and deal with energy problems they're going to rural areas. The traditional way we solve cloud problems was to put CDNs, so every time you download a picture or video, you got to a CDN. When you go to Netflix, you don't really go to Amazon, you got to a Netflix pop, one of 250 locations. The new work loads are different because they're no longer pictures that need to be cashed. First, there are a lot of data going up. Sensory data, upload files, et cetera. Data is becoming a lot more structured. Censored data is structured. All this car information will be structured. And you want to (mumbles) digest or summarize the data. So you need technologies like machine learning, NNI and all those things. You need something which is like CDNs. Just mini version of cloud that sits somewhere in between the Edge and the cloud. And this is our approach. And now because we can string grab the mini cloud, the mini Amazon in a way more dense approach, then this is a play that we're going to take. 
We have a very good partnership with Equinox. Which has 170 something locations with very good relations. >> So you're, essentially, going to disrupt the CDN. It's something that I've been writing about and tweeting about. CDNs were based on the old Yahoo days. Cashing images, you mentioned, give me 1999 back, please. That's old school, today's standards. So it's a whole new architecture because of how things are stored. >> You have to be a lot more distributive. >> What is the architecture? >> In our innovation, we have two layers of innovation. One is on the lower layers of, we, actually, have three main innovations. One is on the lower layers of what we discussed. The other one is the security layer, where we classify everything. Layer seven at 100 gig graphic rates. And the third one is all this notion of distributed system. We can, actually, run multiple systems in multiple locations and manage them as one logical entity through high level semantics, high level policies. >> Okay, so when we take the CUBE global, we're going to have you guys on every pop. This is a legit question. >> No it's going to take time for us. We're not going to do everything in one day and we're starting with the local problems. >> Yeah but this is digital transmissions. Stay with me for a second. Stay with this scenario. So video like Netflix is, pretty much, one dimension, it's video. They use CDNs now but when you start thinking in different content types. So, I'm going to have a video with, maybe, just CGI overlayed or social graph data coming in from tweets at the same time with Instagram pictures. I might be accessing multiple data everywhere to watch a movie or something. That would require beyond a CDN thinking. >> And you have to run continuous analytics because it can not afford batch. It can not afford a pipeline. Because you ingest picture data, you may need to add some subtext with the data and feed it, directly, to the consumer. 
So you have to move to those two elements of moving more stuff into the Edge and running continuous analytics versus a batch pipeline. >> So you think, based on that scenario I just said, that there's going to be an opportunity for somebody to take over the media landscape for sure? >> Yeah, I think if you're also looking at the statistics. I saw a nice article. I told George about it. Analyzing the Intel chip distribution. What you see is that there is 30% growth in Intel's chips going into the Cloud, which is faster than what most analysts anticipate in terms of cloud growth. That means, actually, that cloud is going to cannibalize Enterprise faster than most think. Enterprise is shrinking about 7%. There is another place which is growing. It's Telcos. It's not growing like cloud, but part of it is because of this move towards the Edge and the move of Telcos buying white boxes. >> And 5G and access over the top too. >> Yeah but that's server chips. >> Okay. >> There's going to be more and more computation in the different Telco locations. >> John: Oh you're talking about compute, okay. >> This is an opportunity that we can capitalize on if we run fast enough. >> It sounds as though, because you've implemented these industry standard APIs that come from, largely, the open source ecosystem, that you can propagate those to areas on the network that the vendors who are behind those APIs can't, necessarily, reach. Into the Telcos, towards the Edge. And, I assume, part of that is because of the density and the simplicity. So, essentially, your footprint's smaller in terms of hardware and the operational simplicity is greater. Is that a fair assessment? >> Yes and also, we support a lot of Amazon compatible APIs which are RESTful, typically, HTTP based. Very convenient to work with in a cloud environment. 
Another thing is, because we're taking all the state on ourselves, the different forms of state, whether it's a message queue or a table or an object, et cetera, that makes the computation layer very simple. So one of the things that we are, also, demonstrating is the integration we have with Kubernetes that, basically, now simplifies Kubernetes. Cause you don't have to build all those different data services for cloud native infrastructure. You just run Kubernetes. We're the volume driver, we're the database, we're the message queues, we're everything underneath Kubernetes and then, you just run Spark or TensorFlow or a serverless function as a Kubernetes micro service. That allows you now, elastically, to increase the number of Spark jobs that you need or, maybe, you have another tenant. You just spin up a Spark job. YARN has some of those attributes but YARN is very limited, very confined to the Hadoop Ecosystem. TensorFlow is not a Hadoop player and a bunch of those new tools are not Hadoop players, and everyone is now adopting a new way of doing streaming and they just call it serverless. Serverless and streaming are very similar technologies. The advantage of serverless is all this pre-packaging and all this automation of the CICD. The continuous integration, the continuous delivery. So we're thinking, in order to simplify the developer and operations aspects, we're trying to integrate more and more with a cloud native approach around CICD and integration with Kubernetes and cloud native technologies. >> Would it be fair to say that from a developer or admin point of view, you're pushing out from the cloud towards the Edge faster than the existing implementations, say, the Apache Ecosystem or the AWS Ecosystem, where AWS has something on the edge. I forgot whether it's Snowball or Greengrass or whatever. Where they at least get the lambda function. >> They do, by the way, and it's interesting to see. 
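The Kubernetes integration Yaron described a moment ago comes down to stateless compute pods, with the state pushed into the data layer, so a tenant's Spark-style job is just a Job object you can scale. A minimal sketch of what such a manifest might look like, built as a plain dict rather than submitted to a real cluster; the image name and labels are hypothetical:

```python
def spark_job_manifest(tenant, parallelism):
    """Build a Kubernetes Job manifest (as a plain dict) for a
    Spark-style worker. Because state lives in the data platform, the
    pod stays stateless and can be scaled per tenant just by changing
    `parallelism`.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"spark-{tenant}", "labels": {"tenant": tenant}},
        "spec": {
            "parallelism": parallelism,
            "template": {
                "spec": {
                    "containers": [{
                        "name": "worker",
                        # Hypothetical image; a real setup would point at
                        # an actual Spark or TensorFlow worker image.
                        "image": "example.com/spark-worker:latest",
                    }],
                    "restartPolicy": "Never",
                },
            },
        },
    }

manifest = spark_job_manifest("tenant-a", parallelism=4)
```

Spinning up another tenant is just another call with a different name, which is the elasticity being described.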
One of the things is they allowed lambda functions in their CDNs, which goes in the direction I mentioned, just with minimal functionality. Another thing is they have those boxes where they have a single VM and they can run lambda functions as well. But I think their ability to run computation is very limited, and also, their focus is on shipping the boxes through mail and we want it to be always connected. >> Our final question for you, just to get your thoughts. Great stuff, by the way. This is very informative. Maybe we should do a follow up on Skype in our studio for the Silicon Valley Friday show. Google Next was interesting. They're serious about the Enterprise but you can see that they're not yet there. What is the Enterprise readiness from your perspective? Cause Google has the tech and they try to flaunt the tech. We're great, we're Google, look at us, therefore, you should buy us. It's not that easy in the Enterprise. How would you size up the different players? Because they're not all like Amazon, although Amazon is winning. You got Amazon, Azure and Google. Your thoughts on the cloud players. >> The way we attack Enterprise, we don't attack it from an Enterprise perspective or IT perspective, we take it from a business use case perspective. Especially, because we're small and we have to run fast. You need to identify a real critical business problem. We're working with stock exchanges and they have a lot of issues around monitoring the daily trade activities in real time. If you compare what we do with them on this continuous analytics notion to how they work with Excels and Hadoops, it's totally different, and now they can do things which are way different. I think that, like with Hadoop's customers, if Google wants to succeed against Amazon, they have to find a way to approach those business owners and say here's a problem Mr. Customer, here's a business challenge, here's what I'm going to solve. If they're just going to say, you know what? 
My VMs are cheaper than Amazon's, it's not going to be a-- >> Also, they're doing the whole thing they're calling lift and shift, which is code word for rip and replace in the Enterprise. So that's, essentially, I guess, a good opportunity if you can get people to do that, but not everyone's ripping and replacing and lifting and shifting. >> But a lot of Google's advantages are around areas of AI and things like that. So they should try and leverage that. If you think about Amazon's approach to AI, it's fund the university to build a project and then make it theirs, where Google created TensorFlow and created a lot of other IP, and Dataflow and all those solutions, and contributed it to the community. I really love Google's approach of contributing Kubernetes, of contributing TensorFlow. And this way, they're planting the seeds so the new generation that is going to work with Kubernetes and TensorFlow is going to say, "You know what?" "Why would I mess with this thing on (mumbles) just go and. >> Regular cloud, do multi-cloud. >> Right to the cloud. But I think a lot of the criticism about Google is that they're too research oriented. They don't know how to monetize and approach the-- >> Enterprise is just a whole different drum beat and I think that's my only complaint with them, they've got to get that knowledge and/or buy companies. Have a quick final point on Spanner, or any analysis of Spanner, which went pretty quickly from paper to product. >> So before we started iguazio, I studied Spanner quite a bit. All the publication was there, and all the other things like Spanner. Spanner has an underlying layer called Colossus. And our data layer is very similar to how Colossus works. So we're very familiar. We took a lot of concepts from Spanner on our platform. >> And you like Spanner, it's legit? >> Yes, again. >> Cause you copied it. (laughs) >> Yaron: We haven't copied-- >> You borrowed some best practices. 
>> I think I cited about 300 research papers before we did the architecture. But we, basically, took the best of each one of them. Cause there's still a lot of issues. Most of those technologies, by the way, are designed for mechanical disks and we can talk about it in a different-- >> And you have Flash. Alright, Yaron, we've gone over time here. Great segment. We're here, live in Silicon Valley, breaking it down, getting under the hood. Looking at 10X, 100X performance advantages. Keep an eye on iguazio, they're looking like they've got some great products. Check them out. This is the CUBE. I'm John Furrier with George Gilbert. We'll be back with more after this short break. (upbeat synthesizer music)

Published Date : Mar 14 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
George Gilbert | PERSON | 0.99+
George | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
Telcos | ORGANIZATION | 0.99+
Yaron Haviv | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Equinox | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Mellanox | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Telco | ORGANIZATION | 0.99+
Kevin | PERSON | 0.99+
Dave Alante | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Yaron | PERSON | 0.99+
Silicon Valley | LOCATION | 0.99+
Pierre Levin | PERSON | 0.99+
100 gig | QUANTITY | 0.99+
AngularJS | TITLE | 0.99+
San Jose, California | LOCATION | 0.99+
30% | QUANTITY | 0.99+
John Furrier | PERSON | 0.99+
One | QUANTITY | 0.99+
two examples | QUANTITY | 0.99+
First | QUANTITY | 0.99+
third one | QUANTITY | 0.99+
Skype | ORGANIZATION | 0.99+
one day | QUANTITY | 0.99+
Netflix | ORGANIZATION | 0.99+
10 gigabyte | QUANTITY | 0.99+
Teradata | ORGANIZATION | 0.99+
two elements | QUANTITY | 0.99+
CUBE | ORGANIZATION | 0.99+
Spanner | TITLE | 0.99+
Oracle | ORGANIZATION | 0.99+
S3 | TITLE | 0.99+
first | QUANTITY | 0.99+
1999 | DATE | 0.98+
two layers | QUANTITY | 0.98+
Excel | TITLE | 0.98+
both sides | QUANTITY | 0.98+
Spark | TITLE | 0.98+
Five gig | QUANTITY | 0.98+
Kubernetes | TITLE | 0.98+
Paxos | ORGANIZATION | 0.98+
Intel | ORGANIZATION | 0.98+
100X | QUANTITY | 0.98+
Azure | ORGANIZATION | 0.98+
Colossus | TITLE | 0.98+
about 7% | QUANTITY | 0.98+
Yahoo | ORGANIZATION | 0.98+
Hadoop | TITLE | 0.97+
Boston Bomber | ORGANIZATION | 0.97+

Arun Murthy, Hortonworks - Spark Summit East 2017 - #SparkSummit - #theCUBE


 

>> [Announcer] Live, from Boston, Massachusetts, it's the Cube, covering Spark Summit East 2017, brought to you by Databricks. Now, your hosts, Dave Alante and George Gilbert. >> Welcome back to snowy Boston everybody, this is The Cube, the leader in live tech coverage. Arun Murthy is here, he's the founder and vice president of engineering at Hortonworks, father of YARN, can I call you that, godfather of YARN, is that fair, or? (laughs) Anyway. He's so, so modest. Welcome back to the Cube, it's great to see you. >> Pleasure to have you. >> Coming off the big keynote, (laughs) you ended the session this morning, so that was great. Glad you made it in to Boston, and uh, lot of talk about security and governance, you know we've been talking about that for years, it feels like it's truly starting to come into the mainstream, Arun, so. >> Well I think it's just a reflection of what customers are doing with the tech now. Three, four years ago, a lot of it was pilots, a lot of it was, you know, people playing with the tech. But increasingly, it's about, you know, people actually applying stuff in production, having data, systems of record, running workloads both on prem and on the cloud; cloud is sort of becoming more and more real at mainstream enterprises. So a lot of it means, as you take any of the examples today, any interesting app will have some sort of real time data feed, it's probably coming out from a cell phone or sensor, which means that data is actually, in most cases, not coming on prem, it's actually getting collected in a local cloud somewhere, it's just more cost effective, why would you put up 25 data centers if you don't have to, right? So then you've got to connect that data, production data you have or customer data you have or data you might have purchased, and then join them up, run some interesting analytics, do geo-based real time threat detection, cyber security. 
A lot of it means that you need a common way to secure data, govern it, and that's where we see the action. I think it's a really good sign for the market and for the community that people are pushing on these dimensions, because it means that people are actually using it for real production workloads. >> Well in the early days of Hadoop you really didn't talk that much about cloud. >> Yeah. >> You know, and now, >> Absolutely. >> It's like, you know, duh, cloud. >> Yeah. >> It's everywhere, and of course the whole hybrid cloud thing comes into play, what are you seeing there, what are things you can do in a hybrid, you know, or on prem that you can't do in a public cloud, and what's the dynamic look like? >> Well, it's definitely not an either or, right? So what we're seeing is increasingly interesting apps need data which are born in the cloud and they'll stay in the cloud, but they also need transactional data which stays on prem, you might have an EDW for example, right? >> Right. >> There's not a lot of, you know, people want to solve business problems and not just move data from one place to another, right? Or back from one place to another, so it's not interesting to move an EDW to the cloud, and similarly it's not interesting to bring your IOT data or sensor data back on-prem, right? Just makes sense. So naturally what happens is, you know, at Hortonworks we talk of a kind of modern app or a modern data app, which means a modern data app has to span, has to sort of, you know, it can process both on-prem data and cloud data. >> Yeah, you talked about that in your keynote years ago. Furrier said that data is the new development kit. And now you're seeing the apps are just so dang rich, >> Exactly, exactly. >> And they have to span >> Absolutely. >> physical locations, >> Yeah. 
>> But then this whole thing of IOT comes up, we've been having a conversation on The Cube, last several Cubes, of, okay, how much stays out, how much stays in, there's a lot of debate about that, there's reasons not to bring it in, but you talked today about how some of the important stuff will come back. >> Yeah. >> So the way this all is going to be, you know, there's a lot of data that should be born in the cloud and stay there, the IOT data, but then what will happen increasingly is, key summaries of the data will move back and forth, so key summaries of your EDW will move to the cloud, sometimes key summaries of your IOT data, you know, you want to do some sort of historical training in analytics, that will come back on-prem, so I think there's a bi-directional data movement, but it just won't be all the data, right? It'll be key interesting summaries of the data but not all of it. >> And a lot of times, people say well it doesn't matter where it lives, cloud should be an operating model, not a place where you put data or applications, and while that's true and we would agree with that, from a customer standpoint it matters in terms of performance and latency issues and cost and regulation, >> And security and governance. >> Yeah. >> Absolutely. >> You need to think those things through. >> Exactly, so I mean, so that's what we're focused on, to make sure that you have a common security and governance model regardless of where data is, so you can think of it as, infrastructure you own and infrastructure you lease. >> Right. >> Right? Now, the details matter of course, when you go to the cloud you use S3 for example or ADLS from Microsoft, but you've got to make sure that there's a common sort of security and governance layer on top of it, in front of it, as an example one of the things that, you know, in the open source community, Ranger's a really sort of key project right now from a security authorization and authentication standpoint. 
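A policy of the sort Ranger expresses — this user can read these files, that user only these columns — can be sketched as plain JSON-style data. The field names below are illustrative, not the exact Ranger REST schema; this is a sketch of the policy shape, not a client for a real Ranger instance.

```python
def ranger_style_policy(path, user, columns=None):
    """Build an authorization policy shaped loosely like an Apache
    Ranger policy: one rule granting `user` read access to `path`,
    optionally restricted to specific columns. Field names are
    illustrative, not the real Ranger schema.
    """
    policy = {
        "name": f"read-{user}-{path.strip('/').replace('/', '-')}",
        "resources": {"path": path},
        "policyItems": [{"users": [user], "accesses": [{"type": "read"}]}],
    }
    if columns:
        policy["resources"]["columns"] = columns
    return policy

# "Dave can access these files, George can access these columns."
file_policy = ranger_style_policy("/data/sales", "dave")
column_policy = ranger_style_policy("/data/hr", "george", columns=["name", "dept"])
```

The point of the consistent model discussed above is that the same policy shape applies whether the path lives on-prem or in a cloud object store.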
We've done a lot of work with our friends at Microsoft to make sure you can actually now manage data in WASB, which is their object store, natively with Ranger, so you can set a policy that says only Dave can access these files, you know, George can access these columns, that sort of stuff is natively done on the Microsoft platform thanks to the relationship we have with them. >> Right. >> So that's actually really interesting for the open source communities. So you've talked about sort of commodity storage at the bottom layer and even if they're different sort of interfaces and implementations, it's still commodity storage, and now what's really helpful to customers is that they have a common security model, >> Exactly. >> Authorization, authentication, >> Authentication, lineage, provenance, >> Oh okay. >> You want to make sure all of these are common across sources. >> But you've mentioned some of the different data patterns, like the stuff that might be streaming in on the cloud, what, assuming you're not putting it into just a file system or an object store, and you want to sort of merge it with >> Yeah. >> Historical data, so what are some of the data stores other than the file system, in other words, newfangled databases to manage this sort of interaction? >> So I think what you're saying is, we certainly have the raw data, the raw data is going to land in whatever cloud native storage, >> Yeah. >> It's going to be Amazon, WASB, ADLS, Google Storage. But then increasingly you want, so now the patterns change, so you have raw data, you have some sort of an ETL process, what's interesting in the cloud is that even the processed data, if you take the unstructured raw data and structure it, that structured data also needs to live on the cloud platform, right? The reason that's important is because A, it's cheaper to use the native platform rather than set up your own database on top of it. 
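The raw data landing in whichever cloud native storage is addressed by the table path's URI scheme — this is how engines like Spark and Hive pick the right filesystem connector without the job code changing. A toy sketch of that dispatch; the scheme names mirror real Hadoop connectors (s3a, wasb, gs), but the function and table are my own illustration:

```python
from urllib.parse import urlparse

# Scheme-to-backend table; real engines resolve these through pluggable
# filesystem connectors, so the same job runs against any of them.
BACKENDS = {
    "s3a": "Amazon S3",
    "wasb": "Azure Blob (WASB)",
    "gs": "Google Cloud Storage",
    "hdfs": "On-prem HDFS",
}

def resolve_backend(uri):
    """Pick the storage backend from the path's URI scheme."""
    scheme = urlparse(uri).scheme
    return BACKENDS.get(scheme, "unknown")

backend = resolve_backend("s3a://bucket/iot/summary_last_24h/")
# -> "Amazon S3"
```

The same query against `wasb://...` or `gs://...` paths resolves to the other stores, which is the "natively working on the cloud provider's storage platform" point in practice.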
The other one is you also want to take advantage of all the native services that the cloud storage provides, so for example, linking your application. So automatically, data in WASB, you know, if you can set up a policy and easily say this structured data table that I have, which is a summary of all the IOT activity in the last 24 hours, you can, using the cloud provider's technologies, actually make it show up easily in Europe, like you don't have to do any work, right? So increasingly what we at Hortonworks focus a lot on is to make sure that all of the compute engines, whether it's Spark or Hive or, you know, or MapReduce, it doesn't really matter, they're all natively working on the cloud provider's storage platform. >> [George] Okay. >> Right, so, >> Okay. >> That's a really key consideration for us. >> And the follow up to that, you know, there's a bit of a misconception that Spark replaces Hadoop, but it actually can be a processing, a compute engine for, >> Yeah. >> That can complement or replace some of the compute engines in Hadoop, help us frame, how you talk about it with your customers. >> For us it's really simple, like in the past, the only option you had on Hadoop to do any computation was MapReduce, that was, I started working on MapReduce 11 years ago, so as you can imagine, it's a pretty good run for any technology, right? Spark is definitely the interesting sort of engine for sort of the, anything from machine learning to ETL for data on top of Hadoop. But again, what we focus a lot on is to make sure that every time we bring in, so right now, when we started on HDP, the first HDP had about nine open source projects, literally just nine. Today, the last one we shipped was 2.5, HDP 2.5 had about 27 I think, like it's a huge sort of explosion, right? But the problem with that is not just that we have 27 projects, the problem is that you've got to make sure each of the 27 works with all the 26 others. >> It's a QA nightmare. 
>> Exactly. So that integration is really key, so same thing with Spark, we want to make sure you have security and YARN (mumbles), like you saw in the demo today, you can now run Spark SQL but also make sure you get low level (mumbles) masking, all of the enterprise capabilities that you need, and I was at a financial services firm three or four weeks ago in Chicago. Today, to do the equivalent of what I showed today on the demo, they need literally, they have a classic EDW, and they have to maintain anywhere between 1500 to 2500 views of the same database, that's a nightmare as you can imagine. Now the fact that you can do this on the raw data, whether it's using Hive or Spark or Pig or MapReduce, it doesn't really matter, it's really key, and that's the thing we push to make sure things like YARN security work across all the stacks, all the open source tech. >> So that makes life better, a simplification use case if you will, >> Yeah. >> What are some of the other use cases that you're seeing things like Spark enable? >> Machine learning is a really big one. Increasingly, every product is going to have some, people call it, machine learning and AI and deep learning, there's a lot of techniques out there, but the key part is you want to build a predictive model, in the past (mumbles) everybody wants to build a model and score what's happening in the real world against the model, but equally important, make sure the model gets updated as more data comes in, and the model's scoring improves over time. So that's something we see all over, so for example, even within our own product, it's not just us enabling this for the customer, for example at Hortonworks we have a product called SmartSense which allows you to optimize how people use Hadoop. Where the, what are the opportunities for you to explore deficiencies within your own Hadoop system, whether it's Spark or Hive, right? So we now put machine learning into SmartSense. 
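The loop Arun describes — build a predictive model, then keep updating it as more data comes in — can be made concrete with a toy online learner. This is an illustrative sketch, not SmartSense's actual algorithm; it just shows the update-as-data-arrives shape.

```python
class OnlineMean:
    """Tiny online model: predict the running mean of what we've seen.

    Each new observation refines the model in O(1) -- no retraining
    pass over historical data, which is what makes 'feed it more data'
    cheap compared to offline re-tuning.
    """
    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n  # Welford-style running mean

    def predict(self):
        return self.mean

model = OnlineMean()
for reading in [10.0, 12.0, 11.0, 13.0]:
    model.update(reading)
# model.predict() -> 11.5
```

The same structure scales to real estimators: the model is a small piece of state plus an incremental update rule, so scoring and learning can live in the same pipeline.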
And show you that customers who are running queries like you are running, Mr. Customer X, other customers like you are tuning Hadoop this way, they're running this sort of config, they're using these sorts of features in Hadoop. That allows us to actually make the product itself better all the way down the pipe. >> So you're improving the scoring algorithm or you're sort of replacing it with something better? >> What we're doing there is just helping them optimize their Hadoop deploys. >> Yep. >> Right? You know, configuration and tuning and kernel settings and network settings, we do that automatically with SmartSense. >> But the customer, you talked about scoring and trying to, >> Yeah. >> They're tuning that, improving that and increasing the probability of its accuracy, or is it? >> It's both. >> Okay. >> So the thing is what they do is, you initially come with a hypothesis, you have some amount of data, right? I'm a big believer that over time, you're better off spending more on getting more data into the system than on tuning that algorithm, frankly, right? >> Interesting, okay. >> Right, so you know, for example, you know, talk to any of the big guys at Facebook because they'll do the same, what they'll say is it's much better to spend your time getting 10x data into the system and improving the model rather than spending 10x the time improving the model itself on day one. >> Yeah, but that's a key choice, because you got to >> Exactly. >> Spend money on doing either, >> One of them. >> And you're saying go for the data. >> Go for the data. >> At least now. >> Yeah, go for data, what happens is the good part of that is it's not just the model, what you've got to really get through is the entire end to end flow. >> Yeah. 
>> All the way from data aggregation to ingestion to collection to scoring, all those aspects, you're better off sort of walking through the paces, like building the entire end to end product, rather than spending time in a silo trying to make a lot of change. >> We've talked to a lot of machine learning tool vendors, application vendors, and it seems like we got to the point with Big Data where we put it in a repository, then we started doing better at curating it and understanding it, then starting to do a little bit of exploration with business intelligence, but with machine learning, we don't have something that does this end to end, you know, from acquiring the data, building the model to operationalizing it, where are we on that, who should we look to for that? >> It's definitely very early, I mean if you look at, even the EDW space, for example, what is EDW? EDW is ingestion, ETL, and then sort of fast query layer, OLAP, BI, on and on and on, right? So that's the full EDW flow. I don't think, as a market, we have that end to end sort of industrialized design concept yet, it's really early in this space, not only for us but as an overall industry; it's going to take time, but a lot of people are ahead, you know, the Googles of the world are ahead, and over time a lot of people will catch up. >> We got to go, I wish we had more time, I had so many other questions for you but I know time is tight in our schedule, so thanks so much Arun, >> Appreciate it. >> For coming on, appreciate it, alright, keep right there everybody, we'll be back with our next guest, it's The Cube, we're live from Spark Summit East in Boston, right back. (upbeat music)

Published Date : Feb 9 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
George Gilbert | PERSON | 0.99+
Dave Alante | PERSON | 0.99+
Arun Murthy | PERSON | 0.99+
Europe | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
10x | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
Chicago | LOCATION | 0.99+
Amazon | ORGANIZATION | 0.99+
George | PERSON | 0.99+
Arun | PERSON | 0.99+
Wasabi | ORGANIZATION | 0.99+
25 data centers | QUANTITY | 0.99+
Today | DATE | 0.99+
Hadoop | TITLE | 0.99+
Wasabi | LOCATION | 0.99+
YARN | ORGANIZATION | 0.99+
Facebook | ORGANIZATION | 0.99+
ADLS | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Horton Works | ORGANIZATION | 0.99+
today | DATE | 0.99+
Data Breaks | ORGANIZATION | 0.99+
1500 | QUANTITY | 0.98+
SmartSense | TITLE | 0.98+
S3 | TITLE | 0.98+
Boston, Massachusetts | LOCATION | 0.98+
One | QUANTITY | 0.98+
27 projects | QUANTITY | 0.98+
three | DATE | 0.98+
Google | ORGANIZATION | 0.98+
Furio | PERSON | 0.98+
Spark | TITLE | 0.98+
2500 views | QUANTITY | 0.98+
first | QUANTITY | 0.97+
Spark Summit East | LOCATION | 0.97+
both | QUANTITY | 0.97+
Spark SQL | TITLE | 0.97+
Google Storage | ORGANIZATION | 0.97+
26 | QUANTITY | 0.96+
Ranger | ORGANIZATION | 0.96+
four weeks ago | DATE | 0.95+
one | QUANTITY | 0.94+
each | QUANTITY | 0.94+
four years ago | DATE | 0.94+
11 years ago | DATE | 0.93+
27 work | QUANTITY | 0.9+
MapReduce | TITLE | 0.89+
Hive | TITLE | 0.89+
this morning | DATE | 0.88+
EDW | TITLE | 0.88+
about nine open source | QUANTITY | 0.88+
day one | QUANTITY | 0.87+
nine | QUANTITY | 0.86+
years | DATE | 0.84+
Olap | TITLE | 0.83+
Cube | ORGANIZATION | 0.81+
a lot of data | QUANTITY | 0.8+

Joel Horwitz, IBM & David Richards, WANdisco - Hadoop Summit 2016 San Jose - #theCUBE


 

>> Narrator: From San Jose, California, in the heart of Silicon Valley, it's theCUBE. Covering Hadoop Summit 2016. Brought to you by Hortonworks. Here's your host, John Furrier. >> Welcome back everyone. We are here live in Silicon Valley at Hadoop Summit 2016, actually San Jose. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. Our next guest, David Richards, CEO of WANdisco. And Joel Horwitz, strategy and business development, IBM Analytics. Guys, welcome back to theCUBE. Good to see you guys. >> Thank you for having us. >> It's great to be here, John. >> Give us the update on WANdisco. What's the relationship with IBM and WANdisco? 'Cause, you know, I can just almost see it, but I'm not going to predict. Just tell us. >> Okay, so, I think the last time we were on theCUBE, I was sitting with Re-ti-co, who works very closely with Joel. And we began to talk about how our partnership was evolving. And of course, we were negotiating an OEM deal back then, so we really couldn't talk about it very much. But this week, I'm delighted to say that we announced, I think it's called IBM Big Replicate? >> Joel: Big Replicate, yeah. We have a big everything and Replicate's the latest addition. >> So it's going really well. It's OEM'd into IBM's analytics, big data products, and cloud products. >> Yeah, I'm smiling and smirking because we've had so many conversations, David, on theCUBE with you, following your business through the bumpy road or the wild seas of big data. And it's been a really interesting tossing and turning of the industry. I mean, Joel, we've talked about it too. The innovation around Hadoop and then the massive slowdown and realization that cloud is now on top of it. The consumerization of the enterprise created a little shift in the value proposition, and then a massive rush to build enterprise grade, right? And you guys had that enterprise grade piece of it. IBM, certainly you're enterprise grade. 
You have enterprise everywhere. But the ecosystem had to evolve really fast. What happened? Share with the audience this shift. >> So, it's classic product adoption lifecycle and the buying audience has changed over that time continuum. In the very early days when we first started talking more at these events, when we were talking about Hadoop, we all really cared about whether it was Pig and Hive. >> You once had a distribution. That's a throwback. Today's Thursday, we'll do that tomorrow. >> And the buying audience has changed, and consequently, the companies involved in the ecosystem have changed. So where we once used to really care about all of those different components, we don't really care about the machinations below the application layer anymore. Some people do, yes, but by and large, we don't. And that's why cloud for example is so successful because you press a button, and it's there. And that, I think, is where the market is going to very, very quickly. So, it makes perfect sense for a company like WANdisco who've got 20, 30, 40, 50 sales people to move to a company like IBM that have 4 or 5,000 people selling our analytics products. >> Yeah, and so this is an OEM deal. Let's just get that news on the table. So, you're an OEM. IBM's going to OEM their product and brand it IBM, Big Replication? >> Yeah, it's part of our Big Insights Portfolio. We've done a great job at growing this product line over the last few years, with last year talking about how we decoupled all the value-as from the core distribution. So I'm happy to say that we're both part of the ODPI. It's an ODPI-certified distribution. That is Hadoop that we offer today for free. But then we've been adding not just in terms of the data management capabilities, but the partnership here that we're announcing with WANdisco and how we branded it as Big Replicate is squarely aimed at the data management market today. But where we're headed, as David points out, is really much bigger, right? 
We're talking about support for not only distributed storage and data, but we're also talking about a hybrid offering that will get you to the cloud faster. So not only does Big Replicate work with HDFS, it also works with the Swift object store, which as you know is kind of the underlying storage for our cloud offering. So what we're hoping to see from this great partnership is, as you see around you, Hadoop is a great market. But there's a lot more here when you talk about managing data that you need to consider. And I think hybrid is becoming a lot larger of a story than simply distributing your processing and your storage. It's becoming a lot more about okay, how do you offset different regions? How do you think through, there are multiple, I think there's this idea that there's one Hadoop cluster in an enterprise. I think that's factually wrong. I think what we're observing is that there's actually people who are spinning up, you know, multiple Hadoop distributions at the line of business for maybe a campaign or for maybe doing fraud detection, or maybe doing log file, whatever. And managing all those clusters, and they'll have Cloudera. They'll have Hortonworks. They'll have IBM. They'll have all of these different distributions that they're having to deal with. And what we're offering is sanity. It's like give me sanity for how I can actually replicate that data. >> I love the name Big Replicate, fantastic. Big Insights, Big Replicate. And so go to market, you guys are going to have a bigger sales force. It's a nice pop for you guys. I mean, it's a good deal. >> We were just talking before we came on air about sort of a deal flow coming through. It's coming through, this potential deal flow coming through, which has been off the charts. I mean, obviously when you turn on the tap, then suddenly you enable thousands and thousands of sales people to start selling your products. I mean, IBM are doing a great job.
And I think IBM are in a unique position where they own both cloud and on-prem. There are very few companies that own both the on-prem-- >> They're going to need to have that connection for the companies that are going hybrid. So hybrid cloud becomes interesting right now. >> Well, actually, there's a theory that says, okay, and we were just discussing this, the value of data lies in analytics, not in the data itself. It lies in being able to pull out information from that data. Most CIOs-- >> If you can get the data. >> If you can get the data. Let's assume that you've got the data. So then it becomes a question of, >> That's a big assumption. Yes, it is. (laughs) I just had Nancy Handling on about metadata. No, that's an issue. People have data they store but can't do anything with. >> Exactly. And that's part of the problem, because what you actually have to have is CPU slash processing power for an unknown amount of data at any one moment in time. Now, that sounds like an elastic use case, and you can't do elastic on-prem. You can only do elastic in cloud. That means that virtually every distribution will have to be a hybrid distribution. IBM realized this years ago and began to build this hybrid infrastructure. We're going to help them to move data, completely consistent data, between on-prem and cloud, so when you query things in the cloud, you get exactly the same, correct results. >> And also the stability too on that. There's so much potential; as we've discussed in the past, it sounds simple and logical, but to do it enterprise grade is pretty complex. And so it just gives a nice, stable, enterprise grade component. >> I mean, the volumes of data that we're talking about here are just off the charts. >> Give me a use case of a customer that you guys are working with, or has there been any go-to-market activity or an ideal scenario that you guys see as a use case for this partnership?
>> We're already seeing a whole bunch of things come through. >> What's the number one pattern that bubbles up to the top? Use case-wise. >> As Joel pointed out, he doesn't believe that any one company just has one version of Hadoop behind their firewall. They have multiple vendors. >> 100% agree with that. >> So how do you create one, single cluster from all of those? >> John: That's one problem you solved. >> That's of course a very large problem. Second problem that we're seeing in spades is I have to move data to cloud to run analytics applications against it. That's huge. That requires completely guaranteed consistent data between on-prem and cloud. And I think those two use cases alone account for pretty much every single company. >> I think there's even a third here. I think the third is actually, I think frankly there's a lot of inefficiency in managing just HDFS and how many times you have to actually copy data. If I look across, I think the standard right now is having like three copies. And actually, working with Big Replicate and WANdisco, you can actually have more assurances and actually have to make fewer copies across the cluster and actually across multiple clusters. If you think about that, you have three copies of the data sitting in this cluster. Likely, analysts have dragged a bunch of the same data into other clusters, so that's another multiple of three. So there's a huge amount of waste in terms of the same data living across your enterprise. So I think there's a huge cost-savings component to this as well. >> Does this involve anything with Project Atlas at all? You guys are working with, >> Not yet, no. >> That project? It's interesting. We're seeing a lot of opening up of the data, but all they're doing is creating versions of it. And so then it becomes version control of the data. Do you see a master or a centralization of data? Actually, not centralized, pull all the data in one spot, but why replicate it? Do you see that going on?
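Joel's three-copies arithmetic above is easy to put numbers on. A minimal sketch, where the replication factor of 3 matches the HDFS default and the cluster count and dataset size are invented purely for illustration:

```python
# Back-of-the-envelope storage amplification for duplicated Hadoop clusters.
# HDFS's default replication factor is 3; the other figures are hypothetical.

HDFS_REPLICATION = 3      # copies of each block within a single cluster
DUPLICATE_CLUSTERS = 3    # line-of-business clusters holding the same data
DATASET_TB = 10           # logical size of the shared dataset, in TB

raw_tb = DATASET_TB * HDFS_REPLICATION * DUPLICATE_CLUSTERS
amplification = raw_tb / DATASET_TB

print(raw_tb)             # 90 (TB of disk for 10 TB of logical data)
print(amplification)      # 9.0
```

In these terms, the pitch for a replication layer like Big Replicate is shrinking the duplicate-clusters multiplier: one consistently replicated copy of the data instead of per-cluster duplicates.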
I guess I'm not following the trend here. I can't see the mega trend going on. >> It's cloud. >> What's the big trend? >> The big trend is I need an elastic infrastructure. I can't build an elastic infrastructure on-premise. It doesn't make economic sense to build massive redundancy, maybe three or four times the infrastructure I need on premise, when I'm only going to use it maybe 10, 20% of the time. So the mega trend is cloud provides me with a completely economic, elastic infrastructure. In order to take advantage of that, I have to be able to move data, transactional data, data that changes all the time, into that cloud infrastructure and query it. That's the mega trend. It's as simple as that. >> So moving data around at the right time? >> And that's transactional. Anybody can say okay, press pause. Move the data, press play. >> So if I understand this correctly, and just, sorry, I'm a little slow. End of the day today. So instead of staging the data, you're moving data via the analytics engines. Is that what you're getting at? >> You use data that's being transformed. >> I think you're accessing data differently. I think today with Hadoop, you're accessing it maybe through like Flume or through Oozie, where you're building all these data pipelines that you have to manage. And I think that's obnoxious. I think really what you want is to use something like Apache Spark. Obviously, we've made a large investment in that earlier, actually, last year. To me, what I think I'm seeing is people who have very specific use cases. So, they want to do analysis for a particular campaign, and so they may just pull a bunch of data into memory from across their data environment. And that may be on the cloud. It may be from a third-party. It may be from a transactional system. It may be from anywhere. And that may be done in Hadoop. It may not, frankly.
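The pull-into-memory pattern Joel describes can be sketched in plain Python. This is not Spark code; the sources, field names, and campaign metric below are all invented for illustration, standing in for the object store, third-party, and transactional feeds he mentions:

```python
# Hypothetical ad-hoc campaign analysis: join records pulled from several
# sources into memory, then compute one throwaway metric. All data and
# field names are invented for illustration.

clickstream = [                       # e.g. pulled from an object store
    {"user": "a", "clicks": 5},
    {"user": "b", "clicks": 2},
]
transactions = [                      # e.g. pulled from an OLTP system
    {"user": "a", "spend": 120.0},
    {"user": "c", "spend": 40.0},
]

# Index one source by its join key, then enrich the other in memory.
spend_by_user = {t["user"]: t["spend"] for t in transactions}
enriched = [
    {**c, "spend": spend_by_user.get(c["user"], 0.0)} for c in clickstream
]

# Campaign metric: total spend attributable to users who clicked.
attributed = sum(r["spend"] for r in enriched if r["clicks"] > 0)
print(attributed)  # 120.0 (only user "a" appears in both sources)
```

The point is the shape of the work, not the engine: a one-off join across feeds, computed in memory and thrown away, rather than a standing pipeline that has to be managed.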
>> Yeah, this is the great point, and again, one of the themes on the show is, this is a question that's kind of been talked about in the hallways. And I'd love to hear your thoughts on this. There are some people saying that there's really no traction for Hadoop in the cloud. And that customers are saying, you know, it's not about just Hadoop in the cloud. I'm going to put it in S3 or an object store. >> You're right. I think-- >> Yeah, I'm right as in what? >> Every single-- >> There's no traction for Hadoop in the cloud? >> I'll tell you what customers tell us. Customers look at what they actually need from storage, and they compare whatever it is, Hadoop or any on-premise proprietary storage array, and then look at what S3 and Swift and so on offer to them. And if you do a side-by-side comparison, there isn't really a difference between those two things. So I would argue that it's a fact that functionally, storage in cloud gives you all the functionality that any customer would need. And therefore, the relevance of Hadoop in cloud probably isn't there. >> I would add to that. So it really depends on how you define Hadoop. If you define Hadoop by the storage layer, then I would say for sure. Like HDFS versus an object store, that's going to be a difficult one to find some sort of benefit there. But if you look at Hadoop, like I was talking to my friend Blake from Netflix, and I was asking him, so I hear you guys are kind of like replatforming on Spark now. And he was basically telling me, well, sort of. I mean, they've invested a lot in Pig and Hive. So if you think now about Hadoop as this broader ecosystem, and you brought up Atlas, we talk about Ranger and Knox and all the stuff that keeps coming out, there's a lot of people who are still invested in the peripheral ecosystem around Hadoop as that central point. My argument would be that I think there's still going to be a place for distributed computing kind of projects.
And now whether those will continue to interface through YARN and then down to HDFS, or whether that'll be YARN on, say, an object store or something, and those projects will persist on their own. To me that's kind of more of how I think about the larger discussion around Hadoop. I think people have made a lot of investments in terms of that ecosystem around Hadoop, and that's something that they're going to have to think through. >> Yeah. And Hadoop wasn't really designed for cloud. It was designed for commodity servers, deployment with ease and at low cost. It wasn't designed for cloud-based applications. Storage in cloud was designed for storage in cloud. Right, that's what S3, and Swift and so on, were designed specifically to do, and they fulfill most of those functions. But Joel's right, there will be companies that continue to use-- >> What's my whole argument? My whole argument is that why would you want to use Hadoop in the cloud when you can just do that? >> Correct. >> There are object stores out there. There's plenty of great storage opportunities in the cloud. They're mostly shoe-horning Hadoop, and I think that's, anyway. >> There are two classes of customers. There were customers that were born in the cloud, and they're not going to suddenly say, oh you know what, we need to build our own server infrastructure behind our own firewall 'cause they were born in the cloud. >> I'm going to ask you guys this question. You can choose to answer or not. Joel may not want to answer it 'cause he's from IBM and gets his wrist slapped. This is a question I got on DM. Hadoop ecosystem consolidation question. People are mailing in the questions. Now, keep sending me your questions if you don't want your name on it. Hold on, the Hadoop ecosystem. When will this start to happen? What is holding back the M and A? >> So, that's a great question.
First of all, consolidation happens when you sort of reach that tipping point or leveling off, that inflection point where the market levels off, and we've reached market saturation. So there's no more market to go after. And the big guys like IBM and so on come in-- >> Or there was never a market to begin with. (laughs) >> I don't think that's the case, but yes, I see the point. Now, what's stopping that from happening today, and you're a naughty boy by the way for asking this question, is a lot of these companies are still very well funded. So while they still have cash on the balance sheet, of course, it's very, very hard for that to take place. >> You picked up my next question. But that's a good point. The VCs held back in 2009 after the crash of 2008. Sequoia's memo, you know, the good times roll, or RIP Good Times. They stopped funding companies. Companies are getting funded, continually getting funding. Joel. >> So I don't think you can look at this market as like an isolated market, like there's the Hadoop market and then there's a Spark market. And then even there's like an AI or cognitive market. I actually think this is all the same market. Machine learning would not be possible if you didn't have Hadoop, right? I wouldn't say it. It wouldn't have had the resurgence that it has had. Mahout was one of the first machine learning libraries that caught fire, from Ted Dunning and others. And that kind of brought it back to life. And then Spark, I mean if you talk to-- >> John: I wouldn't say it creates it. Incubated. >> Incubated, right. >> And created that Renaissance-like experience. >> Yeah, deep learning. Some of those machine learning algorithms require you to have a distributed kind of framework to work in. And so I would argue that it's less of a consolidation, but it's more of an evolution of people going okay, there's distributed computing.
Do I need to do that on-premise in this Hadoop ecosystem, or can I do that in the cloud, or in a growing Spark ecosystem? But I would argue there's other things happening. >> I would agree with you. I love both areas. My snarky comment, "there was never a market to begin with," what I'm saying there is that the monetization of commanding the hill that everyone's fighting for was just one of many hills in a bigger field of hills. And so, you could be in a cul-de-sac of being your own champion of no paying customers. >> What you have-- >> John: Or a free open-source product. >> Unlike the dotcom era where most of those companies were in the public markets, and you could actually see proper valuations, most of the companies, the unicorns now, most are not public. So the valuations are really difficult to, and the valuation metrics are hard to come by. There are only a few of those companies that are in the public market. >> The cash story's right on. I think to Joel's point, it's easy to pivot in a market that's big and growing. Just 'cause you're in the wrong corner of the market, pivoting or vectoring into the value is easier now than it was 10 years ago. Because, one, if you have a unicorn situation, you have cash in the bank. So they have a good flush of cash. Your runway's so far out, you can still do your thing. If you're a startup, you can get time to value pretty quickly with the cloud. So again, I still think it's very healthy. In my opinion, I kind of think you guys have good analysis on that point. >> I think we're going to see some really cool stuff happen working together, and especially from what I'm seeing from IBM, in the fact that in the IT crowd, there is a behavioral change that's happening that Hadoop opened the door to. That we're starting to see more and more IT professionals walk through. In the sense that, Hadoop has opened the door to not thinking of data as a liability, but actually thinking about data differently as an asset.
And I think this is where this market does have an opportunity to continue to grow as long as we don't get carried away with trying to solve all of the old problems that we solved for on-premise data management. Like if we do that, then we're just, then there will be a consolidation. >> Metadata is a huge issue. I think that's going to be a big deal. And on the M and A, my feeling on the M and A is that, you got to buy something of value, so you either have revenue, which means customers, and/or intellectual property. So, in a market of open source, it comes back down to the valuation question. If you're IBM or Oracle or HP, they can pivot too. And they can be agile. Now slower agile, but you know, they can literally throw some engineers at it. So if there's no customers and IP, they can replicate, >> Exactly. >> That product. >> And we're seeing IBM do that. >> They don't know what they're buying. My whole point is if there's nothing to buy. >> I think it depends on, ultimately it depends on where we see people deriving value, and clearly in WANdisco, there's a huge amount of value that we're seeing our customers derive. So I think it comes down to that, and there is a lot of IP there, and there's a lot of IP in a lot of these companies. I think it's just a matter of widening their view, and I think WANdisco is probably the earliest to do this, frankly, was to recognize that for them to succeed, it couldn't just be about Hadoop. It actually had to expand to talk about cloud and talk about other data environments, right?
Getting 4,000, 5,000-- >> You get a great partner in IBM. They know the enterprise, great stuff. This is theCUBE bringing all the action here at Hadoop Summit. IBM OEM deal with WANdisco all happening right here on theCUBE. Be back with more live coverage after this short break.

Published Date : Jul 1 2016


Rob Bearden, Hortonworks - Executive On-the-Ground #theCUBE


 

>> Voiceover: On the Ground, presented by The Cube. Here's your host John Furrier. (techno music) >> Hello, everyone. Welcome to a special On the Ground executive interview with Rob Bearden, the CEO of Hortonworks. I'm John Furrier with The Cube. Rob, welcome to this On the Ground. >> Thank you. >> So I got to ask you, you're five years old this year, your company Hortonworks, in June, you have Hadoop Summit coming up, what a magical run. You guys went public. Give us a quick update on Hortonworks and what's going on. The five-year birthday, any special plans? >> Well, we're going to actually host the 10-year birthday party of Hadoop, which, you know, started at Yahoo! and the open-source community. So everyone's invited. Hopefully you'll be able to make it as well. We've accomplished a lot in the last five years. We've grown to over 1,000 employees, over 900 customers. This year is our first full year of being a public company, and the Street has us at $265 million in billings. So tremendous progress has happened, and we've seen the entire data architecture begin to re-platform around Hadoop now. >> CEOs across the globe are facing profound challenges, data, cloud, mobile, obviously this digital transformation. What are you seeing out there as you talk to your customers? >> Well, they view that the digital transformation is a massive opportunity for value creation for that enterprise. And they realize that they can really shift their business models from being very reactive post-transaction to actually being able to consolidate all of the new paradigm data with the existing transaction data, and actually get to a very proactive model pre-transaction. And so they understand their customers' patterns. They understand the kinds of things that their customers want to buy before they ever engage in the procurement process.
And they can make better and more compelling offers at better price points and be able to serve their customers better, and that's really the transformation that's happening, and they realize the value of that creation between them and their customer. >> And one of the exciting things about The Cube is we go to all these different industry events, and you were speaking last week at an event where data is at the center of the value proposition around digital transformation, and that's really been the key trend that we've been seeing consistently, that buzzword, digital transformation. What does that mean to you? Because this is coming up over and over again around this digital platform, digital weathers, digital media or digital engagement. It's all around data. What are your thoughts, and what, from your perspective, is digital transformation? >> Well, it's about being able to derive value from your data and be able to take that value back to your customers and your supply chain, and to be able to create a completely new engagement with how you're managing your interaction with your customers and your supply chain, from the data that they're generating and the data that you have about them.
And it's one of the biggest movements I've seen since the '90s and ERP, because it's so transformational to the business model, by being able to transform the data that we have about our collective entity and our collective customer and collective supply chain, and be able to apply predictive and real-time interactions against that data as events and occurrences are happening, and to be able to quickly offer products and services. And the velocity that that creates to modernization and the value creation back is at a pace that's never been able to happen. And they've really understood the importance of doing that, or being disintermediated in their existing spaces. >> You mention ERP, it kind of shows our age, but I'll ask the question. Back in the '90s ERP, CRM, these were processes that were well known, that people automated with technology which was at that time unknown. You had the rise of client-server technology, local area networking, TCP/IP was emerging, so you got some unknown technology stuff happening, but known processes that were being automated, and hence saw that boom. Now you mention today, it's interesting because Peter Burris at Wikibon's thesis says today the processes are unknown and the technology's known, so there's now a new dynamic. It's almost flipped upside-down, where this digital transformation is the exact opposite. IoT is a great use case where all these unknown things are coming into the enterprise that are value opportunities. But the technology's known, so now the challenge is how to use technology, to deploy it, and be agile to capture and automate these future and/or real-time unknown processes. Your thoughts on that premise. >> The answers are buried in the data, is the great news, and so the technology, as you said, is there, and you have these new, unknown processes through Internet of Things, the new paradigm data sets with sensors and clickstream and mobile data.
And the good news is they generate the data, and we can apply technology to the data through AI and machine learning to really make sure that we understand how to transform the value out of that, out of those data sets. >> So how does IT deal with this? 'Cause going back 30 years IT was a clear line of sight, again, automating those known processes. Now you have unknown opportunities, but you have to be in a position for that. Call that cloud, call that DevOps, call that data driven, whatever the metaphor is. People are being agile, be ready for it. How is that different now, and what is the future of data in that paradigm? And how does a customer come to grips and rationalize this notion of, I need a clear line of sight on the value, not knowing what the processes are around the data. What should they be doing? >> Well, we don't know the processes necessarily, per se, but we do know what the data is telling us, because we can bring all that data under management. We can apply the right kind of algorithms, the right kind of tools on it, to give us the outcomes that we want, and have the ability to monetize and unlock that value very quickly. >> Hortonworks' architecture has kind of been redesigned now; at the last Hadoop Summit in Dublin we heard about the platform. Your architecture's going beyond Hadoop, even though it says Hadoop Summit, and Hadoop was the key to big data. Going beyond Hadoop means other things. What does that mean for the customer? Because now they're seeing these challenges. How does Hortonworks describe that, and what value do you bring to those customers?
Being able to bring all the new paradigm data sets to the mobile, the clickstream, the IoT data, and bring that together and be able to really transition from being reactive post-transaction to be able to be predictive and interactive pre-transaction. And that's a very, very powerful value proposition and you create a lot of value doing that, but what's really learned through that process is in the digital transformation journey, that actually the further upstream that we can get to engaging with the data, even if we can get to it at the point of origination at the furthest edge, at the point of center, at the actual time of clickstream and we can engage with that data as those events and occurrences are happening and we can process against those events as their happening, it creates higher levels of value. So from the Hortonworks platform we have the ability to manage data at rest with Hadoop, as well as data in motion with the Hortonworks data flow platform. And our view is that we must be able to engage with all the data all the time. And so we bring the platforms to bring data under management from the point of origination all the way through as it's in motion, and to the point it comes at rest and be able to aggregate those interactions through the entire process. >> It's interesting, you mention real-time, and one of the ideas of Hadoop was it was always going to be a data warehouse killer, 'cause it makes a lot of sense. You can store the data. It's unstructured data and you can blend in structured on top of that and build on top of that. Has that happened? And does real-time kind of change that equation? Because there's still a role for a data warehouse. If someone has an investment are they being modernized? Clear that up for me because I just can't kind of rationalize that yet. Data warehouses are old, the older ones, but they're not going away any time soon from what we're hearing. Your thoughts as Hadoop as the data warehouse killer. 
>> Yeah, well, our strategy from day one has never been to go in and disintermediate any of the existing platforms or any of the existing applications or services. In fact, to the contrary. What we wanted to do and have done from day one is be able to leverage Hadoop as an extension of those data platforms. The DW architecture has limitations to it in terms of how much data pragmatically and economically is really viable to go into the data warehouse. And so our model says let's bring more data under management as an extension to the existing data warehouses and give the existing data warehouses the ability to have a more holistic view of data. Now I think the next generation of evolution is happening right now and the enterprise is saying that's great. We're able to get more value longer from our existing data warehouse and tools investment by bringing more data under management, leveraging a combined architecture of Hadoop and data warehouse. But now they're trying to redefine really what does the data warehouse of the future look like, and it's really about how we make decisions, right? And at what point do we make decisions because in the world of DW today it assumes that data's aggregated post-transaction, right? In the new world of data architecture that's across the IT landscape, it says we want to engage with data from the point it's originated, and we want to be able to process and make decisions as events and as occurrences and as opportunities arise before that transaction potentially ever happens. And so the data warehouse of the future is much different in terms of how and when a decision's made and when that data's processed. And in many cases it's pre-transaction versus post-transaction. >> Well also I would just add, and I want to get your thoughts on this, real-time, 'cause now in the moment at the transaction we now have cloud resources and potentially other resources that could become available. Why even go to the data warehouses? 
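The pre-transaction versus post-transaction contrast Bearden is describing can be sketched generically. This is plain Python, not Hortonworks DataFlow; the events, fields, and offer rule below are invented for illustration:

```python
# Hypothetical contrast between post-transaction batch analysis (data at
# rest) and per-event, pre-transaction processing (data in motion).
# Events, fields, and the offer rule are all invented for illustration.

events = [
    {"user": "a", "action": "view", "price": 900},
    {"user": "a", "action": "view", "price": 900},
    {"user": "b", "action": "view", "price": 30},
]

# Data at rest: aggregate after the events have landed.
def batch_views_per_user(batch):
    counts = {}
    for e in batch:
        counts[e["user"]] = counts.get(e["user"], 0) + 1
    return counts

# Data in motion: decide on each event as it streams in, before any
# purchase transaction has happened.
def on_event(event, seen):
    seen[event["user"]] = seen.get(event["user"], 0) + 1
    # Toy rule: a repeat view of a high-priced item triggers an offer.
    if seen[event["user"]] >= 2 and event["price"] > 500:
        return f"offer for {event['user']}"
    return None

seen = {}
offers = [o for e in events if (o := on_event(e, seen))]
print(batch_views_per_user(events))  # {'a': 2, 'b': 1}
print(offers)                        # ['offer for a']
```

The batch function only sees the pattern after the fact; the per-event handler can act while the customer is still mid-session, which is the decide-before-the-transaction point.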
So how has real-time changed the game? 'Cause data in motion kind of implies real-time, whether it's IoT or some sort of bank transaction or something else. How has real-time changed the game? >> Well, it's at what point can we engage with the customer, but what it really has established is that the data has to be able to be processed whether it be on-prem, in the cloud, or in a hybrid architecture. And we can't be constrained by where the data's processed. We need to be able to take the processing to the data versus having to wait for the data to come to the processing. And I think that's the very powerful part of cloud, on-prem, and software-defined networking, and when you bring all of those platforms together, you get the ability to have a very powerful and elastic processing capability at any point in the life cycle of the data. And we've never been able to put all those pieces together in an economically viable model. >> So I got to ask you, you guys are five years old in June, Hadoop's only 10 years old. Still young, still kind of in the early days, but yet you guys are a public company. How are you guys looking at the growth strategy? 'Cause the trend is for people to go private. You guys went public. You're out in the open. Certainly your competitor Cloudera is private, but people get that they're kind of behind the curtain. Some say public with a $3 billion valuation, but for the most part you're public. So the question is how are you guys going to sustain the growth? What is the growth strategy? What's your innovation strategy?
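The "take the processing to the data" idea above can be caricatured as a placement choice: given where a dataset lives and what each location costs, pick where to run the job. A minimal sketch, with made-up locations and per-GB rates (not real cloud or on-prem prices):

```python
def cheapest_placement(dataset_gb, price_per_gb):
    """Pick the location with the lowest total cost to process a dataset.
    Rates are illustrative only."""
    costs = {loc: dataset_gb * rate for loc, rate in price_per_gb.items()}
    best = min(costs, key=costs.get)
    return best, costs[best]

# Hypothetical per-GB processing rates by location.
prices = {"on-prem": 0.020, "cloud-cold": 0.004, "cloud-hot": 0.023}
loc, cost = cheapest_placement(500, prices)
print(loc, cost)  # cloud-cold 2.0
```

In practice the choice also weighs data-transfer cost and latency, which is why moving the compute to where the data already sits usually wins; the sketch only captures the cost-comparison shape of the decision.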
>> Well if you look at the companies that are going private, those are the companies that are the older platforms, the older technologies, in a very mature market, that have not been able to innovate those core platforms and have sort of reached their maturity cycle, and I think going private gives them the ability to do that innovation, maybe change their licensing model to subscription, and make some of the transformations they need to make. I have no doubt they'll be very successful doing that. Our situation's much different. The modern IT landscape is re-architecting itself across almost every layer. Look at what's happening in the networking layer going to SDN. Certainly in our space with data, it's moving away from just transactional siloed environments to central data architectures and next generation data platforms. And being able to go all the way out to the edge and bring data under management through the entire movement cycle. We're in a market where we're able to innovate rapidly. Not only in terms of the architecture of the data platform, being able to bring batch and real-time applications together simultaneously on a central data set and consolidate all of the data, but also then be able to move out and do the data in motion and be able to control an entire life cycle. There's a tremendous amount of innovation that's going to happen there, and these are significant growth markets. Both the data in motion and the data at rest market. The data at rest market's a $50 billion marketplace. The data in motion market is a $1 trillion TAM. So when you look at the massive opportunity to create value in these high growth markets, and the ability to innovate and create the next generation data platforms, there's a lot of room for growth and a lot of room for scale.
And that's exactly why you should be public when you're going through these large growth markets in a space that's re-platforming, because the CIO wants to understand and have transparent visibility into their platform partners. They want to know how you're doing. Are you executing the plan? Or are you hiding behind a facade of one perception or another? >> Or pivoting or some sort of re-architecture. >> Right, so I think it's very appropriate in a high growth, high innovation market where the IT platforms are going through a re-architecture that you actually are public going through that growth phase. Now it forces discipline around how you operationalize the business and how you run the business, but I think that's very healthy for both the tech and the company. >> Michael Dell told me he wanted to go private mainly because he had to do some work essentially behind the curtain. Didn't want the 90-day shot clock, the demands of Wall Street. Other companies do it because they can't stand alone. They don't have a platform and they're constantly pivoting internally to try to find that groove swing, if you will. You're saying that you guys have your groove swing, and as Dave Vellante always says, always get behind a growing total addressable market, or TAM. You're saying that. Okay, I buy that. So the TAM's growing. What are you guys doing on the platform side that's enabling your customers to re-platform and take advantage of their current data situation as well as the upcoming IoT boom that's being forecasted?
And then making sure that it's a truly enterprise viable, enterprise ready platform to manage mission critical workloads at scale. And those are the areas where we're continuing to innovate: around security, around data governance, around life cycle management, the operations and the management consoles. But then we want to expand the markets that we operate in and be world class and best tech on planet Earth for that data at rest in our core Hadoop business. But we then see the opportunities to go out to the edge and, from the point of origination, truly manage and bring that data under management through its entire life cycle, through the movement process, and create value. And so we want to continue to extend the reach of when we have data under management and the value we bring to the data through its entire life cycle. And then what's next is you have that data in its life cycle. You then move into the modern data applications, and if you look at what we've done with cyber security and some of the offerings that we've engaged in the cyber security space, that was our first entry. And that's proven to be a significant game changer for us and our customers both. >> Cyber security is certainly a big data problem. Also a cloud opportunity with the horsepower you can get with computing. Give us the update. What are you seeing there from a traction standpoint? What's the level of engagement you're having with enterprises outside of the NSA and the big government stuff, which I'm sure are customers you don't have to disclose, but for the most part normal enterprises are constantly planning as if they've already been attacked, and they have different schemes that they're deploying. How are they using your platform for that right now? >> Well, the nature of attacks has changed.
And it's evolved from just trying to find the hole in the firewall or where we get into the gateway, to how we find a way through a back door and just hang out in your network and watch for patterns and watch for the ability to aggregate relationships and then pose as a known entity that you can then cascade in. And in the world of cyber security you have to be able to understand those anomalies and be able to detect those anomalies that sit there and watch for their patterns to change. And as you go through a whole life cycle of data management between a cloud, on-prem, and a hybrid architecture, it opens up many, many opportunities for the bad guys to get in and have very new schemes. And our cyber security models give the ability to really track how those anomalies are attacking, where the patterns are emerging, and to be able to detect that in real-time, and we're seeing the major enterprises shift to these new models, and it's become a very big part of our growth. >> So I got to change gears and ask you about open-source. You've been in open-source really from the beginning, I would call it first generation commercial. But it was not a tier one citizen at that time. It was an alternative to other proprietary platforms, whether you look at the network stack or certainly from software. Now today it's tier one. Still we hear business people kind of like, well, open-source. Why should a business executive care about open-source now? And what would you say to that person who's watching about the benefits of open-source and some of the new models that could help them? >> Well, open-source in general's going to give you a number of things. One, it's going to probably provide the best tech, the most innovation in a space, whether that be at the network layer or whether that be at the middleware layer, the tools layer, or certainly the data layer. And you're going to see more innovation typically happen on those platforms much faster, and you've got transparent visibility into it.
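The "watch for their patterns to change" idea in the cyber answer can be sketched as a rolling z-score check over one telemetry metric. This is a minimal illustration of anomaly detection, not the actual Hortonworks cybersecurity offering, and the numbers are invented:

```python
from collections import deque
import statistics

class AnomalyDetector:
    """Flag values that sit far outside the recent pattern of a metric,
    e.g. logins per minute from one host."""
    def __init__(self, window=20, threshold=3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        anomalous = False
        if len(self.window) >= 5:  # need some history first
            mean = statistics.fmean(self.window)
            stdev = statistics.pstdev(self.window) or 1e-9
            anomalous = abs(value - mean) / stdev > self.threshold
        self.window.append(value)
        return anomalous

det = AnomalyDetector()
baseline = [10, 11, 9, 10, 12, 10, 11, 9]  # normal login rate
flags = [det.observe(v) for v in baseline]
spike = det.observe(500)  # sudden burst, e.g. credential stuffing
print(flags, spike)  # all False, then True
```

Real systems model many correlated signals and entity relationships rather than one metric, but the shape is the same: learn the recent pattern, then score each new event against it as it arrives.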
And it brings an ecosystem with it, and I think that's really one of the fundamental issues that someone should be concerned with: what does the ecosystem around my tech look like? And open-source really draws forward a very big ecosystem in terms of innovators of the tech, but also enablers of the tech and adopters of the tech in terms of incremental applications, incremental tool sets. And what it does, and the benefit to the end customer, is the best tech, the most innovation, and typically operating models that don't generate lock-in for 'em, and it gives them optionality to use the tech in the most appropriate architecture in the best economic model without being locked into a proprietary path where they end up with no optionality. >> So talk about the do-it-yourself mentality. In IT that's always been frowned upon because it's been expensive, time-consuming, yet now with organic open-source and now with cloud, that first generation do-it-yourself, standing up stuff on Amazon, whatnot, is very viable. It fueled shadow IT and a variety of other great things around virtualization, visualization, and so on. Today we're seeing that same pattern swing back to do-it-yourself, which is good for organic innovation but causes some complexities. So I want to get your thoughts on this because this seems to be a common thread on our Cube interviews and at Hadoop Summit and at Big Data SV as part of Big Data Week when we were in town. We heard from customers and we heard the following: It's still complex and the total cost of ownership's still too high. That seems to be the common theme for slowing down the rapid acceleration of Hadoop and its ecosystem in general. One, do you agree with that? And two, if so, what would be the answer to make that go faster? >> Well, I think you're seeing it accelerate.
I think you're seeing the complexities dwindle away through both innovation in the tech and the maturing of the tech, as well as just new tool sets and applications that are leveraging it, that take away any complexity that was there. But what I think has been acknowledged is the value that it creates, and that it's worth the do-it-yourself and bringing together the disparate techs, because of the innovation that it brings, the new architectures and the value that it creates as these platforms move into the different use cases that they're enabling. >> So I got to ask you this question. I know you're not going to like it, and people always say, well John, why does everyone always ask that same question? You guys have a radically different approach than Cloudera. It's the number one question: I get asked about Cloudera, and Cloudera gets asked about Hortonworks. You guys have been battling. They were first. You guys came right after, fast followers, second. With the Yahoo! thing we've been following you guys since day one. Explain the difference between you and Cloudera, because now a couple things have changed over the past few years. One is, Hadoop wasn't the be all end all for big data. There's been a lot of other things, certainly Spark and some other stuff happening, but yet now enterprises are adopting and coexisting with other stuff. So we've seen Cloudera make some pivots. They certainly got some good technology, but they've had some right answers and some wrong answers. How've you guys been managing it? Because you're now public, so we can see all the numbers. We know what the business is doing. But relative to the industry, how are you guys compared to Cloudera? What's the differences? And what are you guys doing differently that makes Hortonworks a better vendor than Cloudera? >> I can't speak to all the Cloudera models and strategies. What I'll tell you is what the foundation of our model and strategy is based on.
When we founded the company we were, as you mentioned, three or four years post Cloudera's founding. We felt like we needed to evolve Hadoop in terms of the architecture, and we didn't want to adopt the batch-oriented architecture. Instead we took the core Hadoop platform and through YARN enabled it to bring a central data architecture together, as well as be able to generate batch, interactive, and real-time applications, leveraging YARN as the data operating system for Hadoop. And then the real strategy behind that was to open up the data sets, open up the different types of use cases, be able to do it on a central data architecture. But then as other processing engines emerged, whether it be Spark as you brought up or some of the other ones that we see coming down the pipe, we can then integrate those engines through YARN onto the central data platform. And we open up the number of opportunities, and that's the core basis. I think that's different than some of the other competitors' technology architectures. >> Looking back now five years, are there moves that you were going to make that others have made, that you look back and say I'm glad we didn't do that given today's landscape? >> What I'm glad we did do is open up to the most use cases and workloads and data sets as possible through YARN, and that's proven to be a very, very fundamental differentiation of our model and strategy from anybody in the Hadoop space certainly.
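The YARN role described here, a single pool of cluster resources from which batch, interactive, and real-time engines all negotiate containers, can be caricatured in a few lines. Real YARN schedulers (capacity, fair) are far more sophisticated; the class and application names below are invented for the sketch:

```python
class ToyResourceManager:
    """One shared pool of vcores; many engines request containers from it."""
    def __init__(self, total_vcores):
        self.free = total_vcores
        self.allocations = {}

    def request(self, app, vcores):
        granted = min(vcores, self.free)  # grant only what is available
        self.free -= granted
        self.allocations[app] = self.allocations.get(app, 0) + granted
        return granted

    def release(self, app):
        self.free += self.allocations.pop(app, 0)

rm = ToyResourceManager(total_vcores=100)
print(rm.request("mapreduce-batch", 60))   # 60
print(rm.request("hive-interactive", 30))  # 30
print(rm.request("storm-streaming", 30))   # 10, pool is short
rm.release("mapreduce-batch")              # batch job finishes
print(rm.free)                             # 60 vcores free again
```

The point of the answer above is exactly this decoupling: engines come and go, but they all draw from one central platform instead of each owning its own silo of data and machines.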
And I'm also very happy that we saw the opportunity about a year ago that it needed to be more than just about data at rest on Hadoop, and that to truly be the next generation data architecture, you've got to be able to provide the platforms for data at rest and data in motion, and our acquisition of Onyara, to be able to get the NiFi technology so that we're truly capturing the data from the point of origination all the way through the movement cycle until it comes at rest, has given us now the ability to do complete life cycle management for an entire data supply chain. And those decisions have proven to be very, very differentiating between us and any of our other competitors, and it's opened up some very, very big markets. More importantly, it's accelerated the time to value that our customers get in the use cases that they're enabling through us. >> How would you talk about the scenario people are describing, of Hadoop not being the end all be all of the industry? At the same time, 'cause big data, as Arun Murthy said on theCUBE in Dublin, is bigger than Hadoop now, but Hadoop has become synonymous with big data generally. Where's the leadership coming from in your mind? Because we're certainly not seeing it on the data warehouse side, 'cause those guys still have the old technology, trying to co-exist and re-platform for the future. So the question is, is Hortonworks viewing Hadoop as still leading generically as a big data industry, or has it become a sidebar of the big data industry? >> Of Hadoop? Hadoop is the platform, and we believe ground zero for big data. But we believe it's bigger than that. It's about all data and being able to manage the entire life cycle of all data, and that starts from the point of origination, until it comes at rest, and be able to continue to drive that entire life cycle. Hadoop certainly is the underpinning of the platform for big data, but it's really got to be about all data.
Data at rest, data in motion, and what you'll see as the next leg in this is the modern data applications that then emerge from that. >> How has the ecosystem in the Hadoop industry changed? I would agree, by the way, that the Hadoop players are leading big data in general in terms of innovation. The ecosystem's been a big part of it. You guys have invested in it. Certainly a lot of developers and open-source. How has the ecosystem changed given the current situation from where it was? And where do you see the ecosystem going? With the re-platforming not everyone can have a platform. There's a ton of guys out there that have tools, that are looking for a home, they're trying to figure out the chessboard on what's going on with the ecosystem. What are your thoughts on the current situation, and how will it evolve in your view? >> Well, I think one of the strongest statements from day one is whether it's EDW or BI or relational, none of the traditional platform players say the way you solve your big data problem is with my platform. They, to a company, have a Hadoop platform strategy of some form to bring all of that huge volume of big data under management, and it fits our model very well in that we're not trying to disintermediate, but extend those platforms by leveraging HDP as an extension of their platform. And what that's done is it's created pull markets. It's brought Hadoop into the enterprise with a very specific value proposition and use case: bringing more data under management for that tool, that application, or that platform. And then the enterprises have realized there's other opportunities beyond that. And new use cases and new data sets we can also gain more leverage from. And that's what's really accelerated-- >> So you see growth in the ecosystem? >> We're actually seeing exponential acceleration of the growth around the ecosystem.
Not only in terms of the existing platforms and tools and applications either adopting Hadoop, but now new start-up companies building completely from scratch applications just for the big data sets. >> Let's talk about startups. We were talking before we sat down about the challenges of being an entrepreneur. You mentioned the exponential acceleration of entrepreneurs coming into the ecosystem. That's a safe harbor right now. It seems to be across the board. And a lot of the big platforms have robust, growing ecosystems. What's the current landscape for startups? I know you're an active investor yourself, and you're involved in a lot of start-up conversations and advisory roles. What's your view of the current landscape right now? Series A, B, C, growth. Stalling? What needs to be in place for these companies to be successful? What are some of the things that you're seeing? >> You have to be surgically focused right now on a very particular problem set, maybe even by industry. And understand how to solve the problem, and have an absolute correlation to a value proposition and a very well defined and clear model of how you're going to go solve that problem, monetize it, and scale. Or you have to have an incredibly well-financed and deep war chest to go after a platform play that's going after a very large TAM, that is enabling a re-platforming at one of the levels in the new IT landscape. >> So laser focus in a stack or vertical, and/or huge cash from Benchmark or other VCs, tier one VCs, to have a differentiator. They have to have some sort of enabler. >> To enable a next generation platform and something that's very transformational as a platform that really evolves the IT stack. >> What strategies would you advise entrepreneurs in terms of either white spaces to attack and/or their orientation to this new data layer? Because if this plays out as we were talking about, you're going to have a horizontal data layer where you need interoperability.
You need to have data in motion, but data-aware. Smart data you integrate into disparate systems. Breaking down the siloed concept. How should an entrepreneur develop or look at that? Is there a certain model you've seen work successfully? Is there a certain open-source group they can jump into? What thoughts would you share? 'Cause this seems to be the toughest nut to crack for entrepreneurs. >> Right now you're seeing a massive shift in the IT data architecture, is one example. You're seeing another massive shift in the network architecture. For example, the SDN, right? You're seeing I think a big shift in the kinds of applications, getting away from application functionality to data-enabled applications. And I think it's important for the entrepreneur to understand where in the landscape do they really want to position? Where do they bring intellectual capital that can be monetized? Some of the areas that I think you'll see emerge very quickly in the next four, six, eight quarters are the new optimization engines, and so things around AI and machine learning. And now that we have all of the data under management through its entire life cycle, how do I now optimize both where that data's processed, in the cloud or on-prem, or as it's in motion? And there's a massive opportunity through software-defined networking to actually come in and now optimize, at the best price point and/or efficiency, where that data's managed, where that data's stored, and let it continue to reap the benefits. Just as Amazon's done in retail, if you like this, you should look at that. Just as Yahoo! did, I'll point out, with Hadoop and its advertising models and strategies of being able to put specific content in front of you. Those kinds of opportunities are now available for the processing and storage of data through the entire life cycle across any architectural strategy. >> Are you seeing data from a developer's standpoint being instrumental in their use cases?
Meaning as I'm developing on top of data platforms like Hortonworks or others, where there's disparate data, what's their interaction? What's their relationship to the data? How are they using it? What do they need to know? Where's the line in terms of their involvement in the data? >> Well, what we're seeing is a very big movement in the developer community: they now want to be able to just let the data tell them where the application service needs to be. Because in the new world of data they understand what the entity relationships are with their customers and the patterns that are happening with their customers. They now can highly optimize when their customers are about to cross over from one event to the other, and what that typically means, and therefore what the intervening action should be to create the best experience with their customer, to create a higher level of service, to be able to create a better packaged price point at a better margin. They also have the ability to understand in real-time, based on how the data trend is flowing, how well their product's performing. Any obstacles or issues that are happening with their product. So they don't want to have application logic where they then run a report three days, three weeks after some event happened. They now are taking the data as those events are happening, and it's telling them what to do, and they're able to prescriptively act on whatever event or circumstance unfolds from that. >> So they want the data now. They want real-time data embedded in the apps as the front-line developer. >> And they want to optimize what that data is doing as it's unfolding through its natural life cycle. >> Let's talk about your customer base and what their expectations are. What questions should a customer or potential customer ask their big data vendor as they look at the future? What are the key questions they should ask?
>> They should really be comparing what is your architectural strategy, first and foremost, for managing data. And what kinds of data can I manage? What are the limitations in your architecture? What workloads and data sets can't I manage? What are the latency issues that your architecture would create for me? What's your business model that's associated with us engaging together? How much of my data's life cycle can you enable? How secure are you making my data? What kind of long tail of visibility and chain of custody can I have around the governance? What kind of governance standards are you applying to the data? How much of my governance standards can you help me automate? How easy is it to operate and how intuitive is it? How big is your ecosystem? What's your road map and your strategy? What's next in your application stack? >> So enterprises are looking at simplicity. They're looking at total cost of ownership. How is big data innovation going to solve that problem? Because with IoT, again, a lot of new stuff's happening really, really fast. How do they get their arms around this simplicity question and this total cost of ownership? How should they be thinking about it? >> Well, what the Hadoop platforms have to do, and the data in motion platforms have to do, is to be able to bring the data under management and bring all of the enterprise services that they have in their existing data platforms, in the areas of security, in the areas of management, in the areas of data governance, so they can truly run mission critical workloads at scale with all the same levels of predictability that they have in isolation, in their existing proprietary platforms. And be able to do it in a way that's very intuitive for their existing platforms to be able to access it, very intuitive for their operations teams to be able to manage it, and very clean and easy for their existing tools and platforms investments to leverage it.
>> On the industry landscape right now, what are you seeing in terms of consolidation? Some are saying we're seeing some consolidation. A lot of companies going private. You're seeing people buckle down. It's almost a dividing line: if your company was born before a certain date, you might have the wrong architecture. Certainly enterprises re-platform, I would agree with that, but as a supplier to customers, you're one of the young guys. You were born in the cloud. You were born in open-source, Hortonworks. Not everyone else is like that, and certainly Oracle's one of the big guys that keeps on doing well. IBM's been around. But they're all changing, as well. And certainly a lot of these growth companies pre-IPO are kind of being sold off. What's your take on the current situation with the bubble, the softening, whatever people are calling it? What's your thoughts? >> I think you see some companies who got caught up, and if we sort of unpack that, the ones who are going private now, those are the companies that have operated in a very mature market space. They were not able to innovate as much as they would probably have liked to, they're probably locked into a proprietary technology and a non-subscription model of some sort. Maybe a perpetual license model. And those are very different models than the enterprise wants to adopt today, and their ability to innovate and grow shrank with the market, forcing them into very constrained environments. And ultimately, they can be great companies. They have great value propositions, but they need to go through transformations that don't include a 90-day shot clock in the public market.
Then in the markets where maybe I was in the B round or the C round, and I was focused on providing a niche offering into one of those mature spaces that's being disintermediated or evolving quickly, because an open-source company has come into the space or that section of the IT stack has morphed into more of a cloud-centric or SaaS-centric or an open-source centric environment, they got cut short. Their market's gone away. Their market shrunk. They can't innovate their way out of it. And they then ultimately have to find a different approach, and they may or may not be able to get the financing to do that. We're in a much different position. >> Certainly the down round. We're seeing down rounds from the high valuations. That's the first sign of trouble. >> That's the first sign. I've gotten three calls this week from companies that are liquidating and have two weeks to find a new home. >> Great, we'll look for some furniture for our new growing SiliconANGLE office. >> I think you'll have some good values. >> You personally, looking back over five years now on this journey, what an incredible run you guys have had, and fun to watch you guys. What's the biggest thing that surprised you and what's the biggest thing that's happened? If you can talk about those two things, 'cause again, a lot's happened. The market's changed significantly. You guys went public. You got a big office here. What surprised you and what was the biggest thing that you think was the catalyst of the current trajectory? >> How quickly the market grew. We saw from day one when we started the company that this was a billion dollar opportunity, and that was the bar for starting whatever we did. We were looking for new opportunities. We had to see a billion dollar opportunity. How quickly we have seen the growth and the formation of the market in general.
And then how quickly some of the new opportunities have opened up, in particular around streaming, Internet of Things, the new paradigm data sets, and how quickly the enterprises have seen the ability to create a next generation data architecture, and the aggressiveness with which they're moving to do that with Hadoop. And then how quickly in the last year it swung to also wanting to bring data in motion under management, as well. >> If you could talk to a customer right here, right now, and they ask you the following question: Rob, look around the corner five years out. Tell me something that someone else can't see that you see, that I should be aware of in my business. And why should I go with Hortonworks? >> It's going to be a table stakes requirement to be able to understand, whether it be your customer or your supply chain, from the point they begin to engage and the first step towards engaging with your product or your service, what they're trying to accomplish, and to be able to interact with them from that first inception point. It's also going to be table stakes to be able to monitor your product in real-time, and be able to understand how well it's performing, down to the component level, so that you can make real-time corrections, improvements, and be able to do that on the fly. The other thing that you're going to see is that it's going to be a table stakes requirement to be able to aggregate the data that's happened in that life cycle and give your customer the ability to monetize the data about them. But you as the enterprise will be responsible for creating anonymity, confidentiality, and security of the data. But you're going to have to be able to provide the data about your customers, and give them, if they choose to monetize the data about them, the ability to do so. >> So if I get that correct, you're basically saying 100% digital. >> Oh, it's by far, within the next five years, absolutely.
If you do not have a full digital model, in most industries you'll be disintermediated. >> Final question. What's the big bet that you're making right now at Hortonworks? That you say we're pinning the company on blank, fill in the blank. >> It's not about big data. It's about all data under management. >> Rob, thanks so much for spending the time here On the Ground. Rob Bearden, CEO of Hortonworks here for an executive On the Ground. I'm John for The Cube. Thanks for watching. (techno music)

Published Date : Jun 24 2016

