Christian Kleinerman, Snowflake | Snowflake Summit 2022

>> Hey everyone. Welcome back to theCUBE's live coverage of Snowflake Summit 22. We are live at Caesars Forum in Vegas. Lisa Martin with Dave Vellante, excited to welcome a VIP fresh from the keynote stage: the SVP of Product at Snowflake, Christian Kleinerman. Christian, thank you so much for joining us on theCUBE today. >> Thank you for having me. Very exciting. >> And thanks for bringing your energy. Loved your keynote. I thought, wow, he is really excited about all of the announcements. Jam-packed, and we didn't even get to see the entire keynote. Talk to us, and the audience, about some of the things going on. Product revenue in Q1 fiscal 23 was $394 million, 85% growth; a lot of momentum at Snowflake, no doubt. >> So I think the punch line is that our innovation is, if anything, gaining speed. We were over-the-moon excited to share many of these projects with customers and partners, because some of these efforts have been going on for multiple years. So, lots of interesting announcements across the board, from making the existing workloads faster, but we also announced some new workloads: getting into cybersecurity, getting into more transactional workloads with Unistore. So we're very excited. >> Well, this is the fourth Summit, but the first time being back since 2019. A tremendous amount has changed for Snowflake in that time: the IPO, the massive growth in customers, the massive growth in customers with over $1 million in ARR. You talked about one of the things that clearly did not slow down during the last two years, which is innovation at Snowflake. >> Yeah, that's for sure. Even though we had a highly in-the-office culture, we did not miss a beat the moment that we said, hey, let's all start doing Zoom-based calls. We did. So, I don't know if you saw the first five minutes of my section in the keynote. >> Yeah.
>> We originally talked about summarizing it, and it was, no, we're going to spend 40 minutes here. So we did a one-minute clip with whatever gets flashed there. So no, the pace of innovation, I think, is second to none, and maybe I'll highlight something that we're very proud of. Snowflake is a single product, a single engine. So if we're making a query performance enhancement, it will help the cybersecurity workload and the high-concurrency, low-latency workload. And eventually we're starting to see some of those enhancements go all the way to Unistore. So we get a lot of leverage out of our investments. >> What's your favorite announcement? >> That's like picking children, of course. I think native applications is the one that looks like, eh, I don't know about it, on the surface, but it has the biggest potential to change everything: to create an entire ecosystem of solutions within a company or across companies. I don't know that we know what's possible. >> Well, I've been saying for a while now that you have this application development stack over here, the database is kind of here, and then you have the analytics and data pipeline stack. Those are separate worlds. We talk about bringing data and AI and machine intelligence into applications. The only way that is actually going to move forward is if you bring those worlds together. This is a good example of that happening within a proprietary framework. It's probably going to happen in open source organically, and you can sort of roll your own. Is that by design, or is it just sort of happening? >> Well, bringing it all into a single platform is obviously by design, because there is so much friction today in making all the pieces work together. Which database do I use for transactions? How do I move data to my analytics system? And how do I keep reference data in sync between the two?
So it's complicated, and our mission was to remove all of this friction from the equation. On open source versus not, the way we think about open sourcing, open formats, or even open APIs is: does it help us deliver the solution that we want for our customers? Does it help us solve their problems? In certain instances it has in the past, and we've open sourced frameworks. We mentioned at the keynote today the integration of Iceberg tables; that's a strong embrace of open technologies, but that does not mean that we won't continue to innovate in our formats. A lot of what you see in the open formats is there because of Snowflake's proprietary innovation. So we have a very clear philosophy around this. >> Well, like any cloud player, you have to bring open source tools in and make them available for your application developers. But take us through an example of Unistore, and specifically how you're embracing transaction data. What's a customer actually going to do? Paint a picture for us. >> I'm going to give you a very simple use case, but I love it because it shows the power of the scenario. Today, when people are ingesting data into Snowflake, you want to do some bookkeeping associated with those loads. So imagine I have, I don't know, a million files. How many of those files have I loaded? Imagine that one of those loads fails. How do you keep your bookkeeping in sync with whether the data made it or not? Today, if you had to do it with a separate transactional database for the bookkeeping and the loading in Snowflake, it is a lot of complexity for you to know what's where. With Unistore, you can just say, I'm going to do the bookkeeping with this new table type, called hybrid tables. The loads are transactional, and all of this is a single transaction. So for anyone that has dealt with inconsistencies in the database world, this is like a godsend. >> Okay.
So my interpretation of that is it's all about what happens when something goes wrong. >> <laugh> Which is a lot of what transactions are about. Yeah, it's what happens when something goes wrong, and "goes wrong" doesn't just mean failures. When you're debiting money from your bank account and don't have enough balance, that counts as going wrong, and the transaction should be aborted. So yes, transactions are all about conflict management, and we're simplifying that in a broader set of use cases. >> And in recovery. So you're in fast recovery. The business impact of what you're doing is to sort of simplify that process. Is that the easy way to boil it down? >> Pretty much everything we do is about simplification. We've seen organizations at large focusing on wrestling infrastructure as opposed to asking, what are the business problems? Frank referenced something that I believe in very much, which is mission alignment. We are working on helping our customers achieve what they've set out to achieve, not giving them more technology so that their goal becomes wrestling the infrastructure. So it's all about ease of use, all about simplification, removal of friction. >> Just so, if I may. So mission alignment: you always hear about technology companies that provide infrastructure or a service, and then the customer takes that and monetizes it pretty much on their own. The big change that I'm discerning from these announcements is that you're talking about directly monetizing and participating in that monetization as a technology partner, but also through the marketplace as well. >> Correct. And I would say in some ways this is not new. This has been happening for the last couple of years with data. If you just saw our industry data cloud launches, like the financial services cloud, it comes with data providers that help you achieve specific outcomes in a specific industry.
>> Mm-hmm. >> What we're doing now is saying it's not just data. Maybe it's some business logic, maybe it's some machine learning, maybe it's some user interface. So I think we're just turning the knob on collaboration, and it's a continuation of what we've been doing. >> Talk a little bit more about mission alignment. I heard Frank Slootman talk about that this morning. I always love it when I hear about cultural alignment with organizations, but as you just said, it's really about enabling your customers to deliver outcomes to their customers. As the SVP of Product, can you talk a little bit about how the customers are influencing the product roadmap, the innovations, and the speed with which things are coming out at Snowflake? >> Yeah, great question. We have several organizations at Snowflake that are organized by vertical, by industry. The major sales organization is organized like that, and the marketplace business development team is organized like that. We have a separate team that provides thought leadership by industry vertical globally. And even within our solution engineering there are verticals. So we have a longitudinal view across all the different functions of what we need to do to achieve a set of use cases in a vertical. And all of those functions are in constant communication with us on where the product is seeing an opportunity or could do better for that vertical. So yeah, I can tell you, and obviously we love it when there's alignment between those, but that's not always the case. You've heard us talk about clean rooms for some time now. Clean rooms are applicable to almost any industry, but they're red hot for media and advertising, with third-party cookie deprecation and all of that. So we get to see through that lens, and our innovation is informed by industries.
>> So we're seeing, obviously, the evolution of Snowflake. In the keynotes today you guys talked about 2019 and even pre-2019. To me, anyway, your first phase was, hey, we've got a simpler EDW. We're going to pick that off and put it in the cloud and make it elastic and separate compute from storage, all that kind of cool stuff. And then during the pandemic, it was really the IPO, but also the data cloud concept; you sort of laid that vision out. And now you're talking about application development, monetization, what I call the super cloud, that layer. Right? >> You termed it best. >> Yes. Can you talk about these announcements and how they fit into that larger vision of where you're going? >> Great question. The notion of the data cloud has not changed one bit. The data cloud thesis is that we want to provide amazing technology for our customers, but also facilitate collaboration and content exchange via our platform. And all that we did today is expand what that content can be. It's not just data or a little helper function; it's entire applications, entire experiences. That sums up the impact of our announcements today. That's the end of it. So it's still about the data cloud. >> So what is impressive to me is that you guys couldn't have a company without the hyperscalers, right? It would be a lot different. So you built on top of that, and now you have your customers building their own super clouds, as I call it. I get a lot of grief for that term, but the big area of criticism I get is, ah, that's just SaaS. And I'm like, no, it's not. Is everybody public who's announcing stuff? I'd better be careful. But you have customers that are actually building services, taking their data, their tooling, their proprietary information, putting it on the Snowflake data cloud, and building their own clouds. Yeah. That's different.
That's not multi-cloud, which is "I can run on a different cloud," and is it SaaS? It feels like it's something new. From your perspective, is it different? >> I love that you called out that running on all clouds is not what we do, right? These days everyone is multi-cloud: you run on a VM or a container and say, "I'm multi-cloud, check." No, we have a single platform that does multi-region and multi-cloud, but also cross-region and cross-cloud, globally. That is the essence of what we're doing. So it is enabling new capabilities. >> I've also said that in many respects the super cloud hides the underlying complexity. You think about things like exploiting Graviton, and a developer doesn't need to worry about that. You're going to worry about that. But at the same time, as you get into the world of application development, some of your developers may want access to some of those cloud primitives. Are you providing both? What's the strategy there? >> Generally not. In some areas we, I would say, bleed through some details that are material. But think of the reality of someone that wants to build a solution. It's really difficult to build an awesome solution in one cloud: hey, you need to do this, what's the latest instance, is Graviton going to help you or not, all of that. Now do it for another one, and then do it for another one. And I can tell you it's really difficult, because we go through that exercise. Snowflake porting to a new cloud is somewhere between one and two years of effort, and not a small number of people, because you're looking at security models and storage models. So that's the value that we give to anyone who wants to build a solution and target customers in all three clouds. >> I mean, people are still going to do it themselves, but they're going to spend a lot more and they're going to lose their focus on what their real business is. And there'll still be that.
I think that DIY market is enormous for you guys, a huge opportunity. >> And there's also the question of what the cost of that analysis and that effort is, and can we amortize it on behalf of all of our customers? We talk about Graviton; we have not talked about the many things that we evaluated that were not better price-performance for our customers. That evaluation happened. That value was delivered by not moving there. >> And when you do it yourself, the curve looks like, okay, hey, we can do it ourselves, we can make it pretty inexpensive, and then the costs are going to decline. But what really happens is like developing a mobile app: you've got to maintain it. And if you don't have the scale and you don't have the engineering resources, the costs are going to continue to go through the roof. >> I love that you compare it to mobile apps. I still don't understand why every company that wants to build an app has to build two. <laugh> >> They got it. Yeah. >> There is no super cloud for the phone. >> Right. >> That's sort of our broad vision. Not yet, not the phone, but the super cloud. >> Yeah, absolutely. You get it. And you look at the ecosystem here. I mean, what a difference, and you've been pointing this out, Lisa, from 2019. A lot of buzz; it's all about innovation. You see this at re:Invent, which is like the Super Bowl, obviously. And it used to be, oh, how is AWS going to compete with Snowflake separating compute from storage? I feel like, in a large way, that's all gone. It's like, okay, how do we raise the whole industry? And that's really where the innovation is. >> We have an amazing partnership with AWS, and they benefit from what we do. Yes, there are some competitive elements, but we're changing so many things and creating so much opportunity that we're more aligned than not. Yeah.
>> Last question for you, continuing on the AWS partnership front: how do partners like AWS fit into the data cloud narrative that you're talking about to customers? >> I would say that other than the one or two teams that are directly competitive, the rest of their teams are part of the data cloud. Our relationship with SageMaker, as an example, is amazing. A lot of what we want to deliver to our customers is choice around machine learning frameworks and tools, and they're part of the data cloud. We're working with them on how you push down computation to avoid getting data out, to reinforce governance. And go look at it: they have a hundred and something teams. So if two teams out of hundreds are the competitive element, we are largely aligned, and they're part of the data cloud. >> Yeah. I mean, your customers consume a lot of compute and storage. >> A lot. Yes. >> From AWS, and also increasingly Azure and Google. I mean, it's pretty amazing times. Christian, I want to ask you about a couple of terms. One term that came up a couple of times today: in Frank's keynote, he said, I'm not going to call it a data mesh, kind of out of respect for the purists, which is cool, I thought. But then you had a customer, GEICO, stand up and say, we're building a data mesh. JPMC is speaking at this event; they're building a data mesh. And I look at things through that prism and say, okay, data mesh is about decentralization; I'd be curious as to whether or not you tick that box. But it's also about building data products, it's about self-service infrastructure, and it's about automated computational governance. You are actually ticking a lot of those boxes. And I guess the big one is, are you building a bigger walled garden?
But I think you would say, no, it's a giant distributed network. What do you say to that? >> The latter, the latter. Yeah, a giant, distributed, open cloud, and open in the sense that we want anyone to plug in. Someone can say, well, but I cannot read your file formats. Sure, you can, with what we announced today, but it's not about that. Our APIs are open. We have REST APIs. We have JDBC and ODBC, probably the most popular interfaces ever. And we want everyone to be part of it. If anything, there are lots of areas that we would not want to go into ourselves, because we want partners and customers to go in there. So no, we're looking at a very broad ecosystem. We win based on the value created on top of the platform. >> Yeah, and that makes total sense to me. I mean, I think the immaculate conception of data mesh might be a purely open source version of Snowflake. I just don't see that happening anytime soon. And so I think you are, and I wrote about this, creating a de facto standard. >> Exactly. And I don't like to get into the terminology: oh, is it a data mesh or not? No, go look at the concepts. People used to say, but Snowflake is not a data lake. Okay, what is a data lake? It's just a pattern. And if you follow the pattern and you can do it, that's fine. Then there's the emotional, quasi-religious overlay of open versus not. I think that's a choice, not necessarily part of the concept. >> It's a moving target. I mean, Unix used to be open, you know. >> I agree. >> Now, the reason why I do think the data mesh conversation is important is because Zhamak Dehghani, when she defined data mesh, pointed out, in my view anyway, that the problem of getting value out of data is that you go through these hyper-specialized teams, and they're blockers in the organization. And I think you in many respects are attacking that. And it's an organizational issue.
>> The insights in the pattern are a hundred percent valid and aligned with what we do, which is that you want some amount of centralization and some amount of decentralization living in harmony. So yeah, I have no problem with the terminology. >> And the governance piece is massive. The picture is becoming much more clear: whatever's in the data cloud is a first-class citizen, right? And you give it all these wonderful benefits. I mean, the interesting thing is what you're doing with Dell and Pure. I asked you about that on the analyst call. It's a start, you know. >> And I said it briefly in the keynote this morning: we're publishing a set of standard conformance tests, so any storage system can plug into the data cloud. >> Yeah. >> And by the way, it's based on S3 APIs, another de facto standard. It's not a standard, but everyone is emulating it, and we're plugging into that. >> Yeah. Nobody's complaining about the S3 API. >> Nobody says, oh, it's not an Apache project, we shouldn't. Who cares? Everyone has standardized on it. That's it. >> Well, we've seen the mistakes of the past with this. I mean, look at Hadoop, right? There was this huge battle between Cloudera and Hortonworks and MapR. Oh, MapR is proprietary. Oh, Hortonworks is purely open. Cloudera is open. They're all gone now. I mean, not gone, but they didn't have it right. They got unfocused. I go back to Frank's book. They were trying to do too much, too many of those zoo animals, and you can't fund it all and be effective. >> For us it's very important. I can give you, I don't know, 20 announcements or 50 announcements from the conference, but they're all going toward a singular goal, and it's this: do not trade off governance of data against the ability to get value out of data. That's everything we do.
>> And that's critical for every company in every industry these days; each has to be a data company to survive, to be competitive, to be able to extract value from data. If data is currency, how do I leverage a tool like Snowflake to extract insights from it that I can act on and create value for my organization? GEICO was on stage this morning. Everyone knows GEICO and their beloved gecko. Is there another customer that you think really articulates the value of the data cloud and, to Dave's point, how Snowflake is becoming that de facto standard data platform? >> Well, we had Goldman Sachs on stage as well today. And he mentioned that people think of Goldman as investment banking and all of that, but no, at the heart of what they do there's a lot of data, and how do they make better decisions? So I think we could run through 20 different examples, because your premise is the most important one: everything is a data problem. If it is not a data problem, you're not collecting the right data and getting the insights that you could be getting. >> These guys are public, right? >> Adobe. >> Yeah, right. Adobe's doing it. I don't know if the other one is public. I don't want to say; I'll have to ask you off camera. But there's the other financial firm building a super cloud, right? <laugh> Yeah, I call it super cloud. So they'll be taking advantage of Unistore to bring different data types in and monetize it. To me, that's the future of data. That's been the holy grail, right? >> We tried to emphasize that this is not a "hey, six months ago we decided to do this." No, this is years in the making, which is why we were so excited to finally share it. Because you don't want to say, three years from now we're going to have something. No, it was, now we have it. We have it in preview, it's working, and it is as close to the holy grail as it gets. >> Yeah. I mean, look, the pressure's on, Christian.
Let's face it. The enterprise data warehouse failed to live up to its promises. Certainly the data lakes failed to deliver. Master data management, Hadoop, all that stuff: there was a lot of hype around it, and a lot of us got really excited, me included. And then customers spent, and they were underwhelmed. So, you know, you've got to deliver. You say it, you've got to do it. >> Correct. And the other thing I would say is that with all of those waves of technology, there was no real better choice. >> Right. They added value. I wouldn't debate that. >> You have to give it a shot. When you've bought 20 different appliances and you have all these silos and someone sells you, hey, Hadoop will unify it, it sounds good. It just didn't do it. >> Yeah. And no debate that it brought some value for those that were sophisticated enough to deploy it. >> I agree. >> Yeah. But this is a whole different ball game. >> Oh, everything we want to do is democratize and simplify. We could go build something that, I don't know, 10 companies in the world could use. That's not the sweet spot. How do we advance the state of value generation in the world? That's the scale that we're talking about: go make it easy and accessible for everyone. >> Governed. >> Governance is an imperative. These days it's law. >> Yes. So yeah, you have to, but that's a really difficult challenge: to create what I'll call automated or computational governance in a federated manner. That's not trivial. >> And that's our thesis in everything we're doing. Take Snowpark, a big announcement today: Python. I've had people tell me, well, but Python should be easy, you just host the Python runtime; you can do it in, like, a week. It took us years. Why? Security. The details matter a lot. And <inaudible> mentioned it: securing that is no easy feat. >> Christian.
Thank you so much for joining Dave and me, bringing your energy from the keynote stage to theCUBE set, and breaking down some of the major announcements that have come out today. There's no doubt that the flywheel of innovation at Snowflake is alive and well and moving quickly. >> Innovation is at an all-time high at Snowflake. Thank you for having me. >> All right. Our pleasure, Christian. For our guest and Dave Vellante, I'm Lisa Martin, here live in Las Vegas at Caesars Forum covering Snowflake Summit 22. We'll be right back with our next guest.

Published Date : Jun 14 2022



Data Power Panel V3

(upbeat music) >> The stampede to cloud and massive VC investments have led to the emergence of a new generation of object store based data lakes. And with them two important trends, actually three important trends. First, a new category that combines data lakes and data warehouses, aka the lakehouse, has emerged as a leading contender to be the data platform of the future. And this novelty touts the ability to address data engineering, data science, and data warehouse workloads on a single shared data platform. The other major trend we've seen is that query engines and broader data fabric virtualization platforms have embraced NextGen data lakes as platforms for SQL-centric business intelligence workloads, reducing, or some would even claim eliminating, the need for separate data warehouses. Pretty bold. However, cloud data warehouses have added complementary technologies to bridge the gaps with lakehouses. And the third is that many, if not most, customers are embracing the so-called data fabric or data mesh architectures. They're looking at data lakes as a fundamental component of their strategies, and they're trying to evolve them to be more capable, hence the interest in the lakehouse, but at the same time, they don't want to, or can't, abandon their data warehouse estate. As such, we see a battle royale brewing between cloud data warehouses and cloud lakehouses. Is it possible to do it all with one cloud-centric analytical data platform? Well, we're going to find out. My name is Dave Vellante and welcome to the data platforms power panel on theCUBE, our next episode in a series where we gather some of the industry's top analysts to talk about one of our favorite topics, data. In today's session, we'll discuss trends, emerging options, and the trade-offs of various approaches, and we'll name names. Joining us today are Sanjeev Mohan, who's the principal at SanjMo, Tony Baer, principal at dbInsight.
And Doug Henschen is the vice president and principal analyst at Constellation Research. Guys, welcome back to theCUBE. Great to see you again. >> Thanks, guys. Thank you. >> Thank you. >> So it's early June and we're gearing up for two major conferences, there's several database conferences, but two in particular that we're very interested in, Snowflake Summit and Databricks Data and AI Summit. Doug, let's start off with you and then Tony and Sanjeev, if you could kindly weigh in. Where did this all start, Doug? The notion of lakehouse. And let's talk about what exactly we mean by lakehouse. Go ahead. >> Yeah, well you nailed it in your intro. One platform to address BI, data science, data engineering, fewer platforms, less cost, less complexity, very compelling. You can credit Databricks for coining the term lakehouse back in 2020, but it's really a much older idea. You can go back to Cloudera introducing their Impala database in 2012. That was a database on top of Hadoop. And indeed in that last decade, by the middle of that last decade, there were several SQL on Hadoop products, open standards like Apache Drill. And at the same time, the database vendors were trying to respond to this interest in machine learning and data science. So they were adding SQL extensions, the likes of Hudi and Vertica were adding SQL extensions to support the data science. But then later in that decade with the shift to cloud and object storage, you saw the vendors shift to this whole cloud and object storage idea. So you have in the database camp Snowflake introduce Snowpark to try to address the data science needs. They introduced that in 2020 and last year they announced support for Python. You also had Oracle, SAP jump on this lakehouse idea last year, supporting both the lake and warehouse, single vendor, not necessarily quite single platform. Google very recently also jumped on the bandwagon.
And then you also mentioned the SQL engine camp, the Dremios, the Ahanas, the Starbursts, really doing two things, a fabric for distributed access to many data sources, but also very firmly planting that idea that you can just have the lake and we'll help you do the BI workloads on that. And then of course, the data lake camp with the Databricks and Clouderas providing warehouse style deployments on top of their lake platforms. >> Okay, thanks, Doug. I'd be remiss, those of you who know me know that I typically write my own intros. This time my colleagues fed me a lot of that material. So thank you. You guys make it easy. But Tony, give us your thoughts on this intro. >> Right. Well, I very much agree with both of you, which may not make for the most exciting television, in terms of that it has been an evolution just like Doug said. I mean, for instance, just to give an example, when Teradata bought Aster Data it was initially seen as a hardware platform play. In the end, it was basically, it was all those Aster functions that made a lot of sort of big data analytics accessible to SQL. (clears throat) And so what I really see, just in a more simple definition or functional definition, the data lakehouse is really an attempt by the data lake folks to make the data lake friendlier territory to the SQL folks, and also to get into friendly territory to all the data stewards, who are basically concerned about the sprawl and the lack of control and governance in the data lake. So it's really kind of a continuation of an ongoing trend. That being said, there's no action without counter action. And of course, at the other end of the spectrum, we also see a lot of the data warehouses starting to add things like in-database machine learning. So they're certainly not surrendering without a fight.
Again, as Doug was mentioning, this has been part of a continual blending of platforms that we've seen over the years, that we first saw in the Hadoop years with SQL on Hadoop and data warehouses starting to reach out to cloud storage, or I should say HDFS, and then with the cloud then going cloud native and therefore trying to break the silos down even further. >> Now, thank you. And Sanjeev, data lakes, when we first heard about them, they were such a compelling name, and then we realized all the problems associated with them. So pick it up from there. What would you add to Doug and Tony? >> I would say, these are excellent points that Doug and Tony have brought to light. The concept of lakehouse was going on, to your point, Dave, a long time ago, long before the term was invented. For example, in Uber, Uber was trying to do a mix of Hadoop and Vertica because what they really needed were transactional capabilities that Hadoop did not have. So they weren't calling it the lakehouse, they were using multiple technologies, but now they're able to collapse it into a single data store that we call lakehouse. Data lakes, excellent at batch processing large volumes of data, but they don't have the real time capabilities such as change data capture, doing inserts and updates. So this is why lakehouse has become so important, because they give us these transactional capabilities. >> Great. So I'm interested, the name is great, lakehouse. The concept is powerful, but I get concerned that there's a lot of marketing hype behind it. So I want to examine that a bit deeper. How mature is the concept of lakehouse? Are there practical examples that really exist in the real world that are driving business results for practitioners? Tony, maybe you could kick that off. >> Well, put it this way. I think what's interesting is that both data lakes and data warehouses each had to extend themselves.
To believe the Databricks hype, it's that this was just a natural extension of the data lake. In point of fact, Databricks had to go outside its core technology of Spark to make the lakehouse possible. And it's a very similar type of thing on the part of the data warehouse folks, in terms of that they've had to go beyond SQL. In the case of Databricks, there have been a number of incremental improvements to Delta Lake, to basically make the table format more performant, for instance. But the other thing, I think the most dramatic change in all that is in their SQL engine, and they had to essentially pretty much abandon Spark SQL because really, in and of itself, Spark SQL is essentially a stop gap solution. And if they wanted to really address that crowd, they had to totally reinvent SQL or at least their SQL engine. And so Databricks SQL is not Spark SQL, it is not Spark, it's basically SQL that's adapted to run in a Spark environment, but the underlying engine is C++, it's not Scala or anything like that. So Databricks had to take a major detour outside of its core platform to do this. So to answer your question, this is not mature because these are all basically kind of, even though the idea of blending platforms has been going on for well over a decade, I would say that the current iteration is still fairly immature. And in the cloud, I could see a further evolution of this because if you think through cloud native architecture where you're essentially abstracting compute from data, there is no reason why, if let's say you are dealing with, say, the same basically data targets, say cloud storage, cloud object storage, that you might not apportion the task to different compute engines.
And so therefore you could have, for instance, let's say you're Google, you could have BigQuery perform basically the types of analytics, the SQL analytics that would be associated with the data warehouse, and you could have BigQuery ML that does some in-database machine learning, but at the same time for another part of the query, which might involve, let's say, some deep learning, just for example, you might go out to, let's say, the serverless Spark service or Dataproc. And there's no reason why Google could not blend all those into a coherent offering that's basically all triggered through microservices. And I just gave Google as an example, you could generalize that with all the other cloud or all the other third party vendors. So I think we're still very early in the game in terms of maturity of data lakehouses. >> Thanks, Tony. So Sanjeev, is this all hype? What are your thoughts? >> It's not hype, but I completely agree. It's not mature yet. Lakehouses still have a lot of work to do, so what I'm now starting to see is that the world is dividing into two camps. On one hand, there are people who don't want to deal with the operational aspects of vast amounts of data. They are the ones who are going for BigQuery, Redshift, Snowflake, Synapse, and so on because they want the platform to handle all the data modeling, access control, performance enhancements, but there are trade-offs. If you go with these platforms, then you are giving up on vendor neutrality. On the other side are those who have engineering skills. They want the independence. In other words, they don't want vendor lock-in. They want to transform their data into any number of use cases, especially data science, machine learning use cases. What they want is agility via open file formats using any compute engine. So why do I say lakehouses are not mature? Well, cloud data warehouses, they provide you an excellent user experience. That is the main reason why Snowflake took off.
If you have thousands of tables, it takes minutes to get them started, uploaded into your warehouse and start experimentation. Table formats are far more resonating with the community than file formats. But once the cost of the cloud data warehouse goes up, then organizations start exploring lakehouses. But the problem is lakehouses still need to do a lot of work on metadata. Apache Hive was a fantastic first attempt at it. Even today Apache Hive is still very strong, but it's all technical metadata and it has so many different restrictions. That's why we see Databricks is investing into something called Unity Catalog. Hopefully we'll hear more about Unity Catalog at the end of the month. But there's a second problem I just want to mention, and that is lack of standards. All these open source vendors, they're running what I call ego projects. You see on LinkedIn, they're constantly battling with each other, but the end user doesn't care. The end user wants a problem to be solved. They want to use Trino, Dremio, Spark from EMR, Databricks, Ahana, Dask, Flink, Athena. But the problem is that we don't have common standards. >> Right. Thanks. So Doug, I worry sometimes. I mean, I look at the space, we've debated for years, best of breed versus the full suite. You see AWS with whatever, 12-plus different data stores and different APIs and primitives. You got Oracle putting everything into its database. It's actually done some interesting things with MySQL HeatWave, so maybe there's proof points there, but Snowflake, really good at data warehouse, simplifying data warehouse. Databricks, really good at making lakehouses actually more functional. Can one platform do it all? >> Well, in a word, you can't be best of breed at all things. I think the upshot, and cogent analysis from Sanjeev there, the vendors coming out of the database tradition, they excel at the SQL.
They're extending it into data science, but when it comes to unstructured data, data science, ML, AI, it's often a compromise. The data lake crowd, the Databricks and such, they've struggled to completely displace the data warehouse. When it really gets to the tough SLAs, they acknowledge that there's still a role for the warehouse. Maybe you can size down the warehouse and offload some of the BI workloads, and maybe some of these SQL engines are good for ad hoc, minimize data movement. But really when you get to the deep service level requirements, the high concurrency, the high query workloads, you end up creating something that's warehouse-like. >> Where do you guys think this market is headed? What's going to take hold? Which projects are going to fade away? You got some things in Apache projects like Hudi and Iceberg, where do they fit, Sanjeev? Do you have any thoughts on that? >> So thank you, Dave. So I feel that table formats are starting to mature. There is a lot of work that's being done. We will not have a single product or single platform. We'll have a mixture. So I see a lot of Apache Iceberg in the news. Apache Iceberg is really innovating. Their focus is on a table format, but then Delta and Apache Hudi are doing a lot of deep engineering work. For example, how do you handle high concurrency when there are multiple writes going on? Do you version your Parquet files or how do you do your upserts, basically? So different focus. At the end of the day, the end user will decide what is the right platform, but we are going to have multiple formats living with us for a long time. >> Doug, is Iceberg in your view something that's going to address some of those gaps in standards that Sanjeev was talking about earlier? >> Yeah, Delta Lake, Hudi, Iceberg, they all address this need for consistency and scalability. Delta Lake, open technically, but open for access. I don't hear about Delta Lake in any world but Databricks; hearing a lot of buzz about Apache Iceberg.
End users want an open, performant standard. And most recently Google embraced Iceberg for its recent BigLake, their stab at supporting both lakes and warehouses on one conjoined platform. >> And Tony, of course, you remember the early days of the sort of big data movement. You had MapR, which was the most closed. You had Hortonworks, the most open. You had Cloudera in between. There was always this kind of contest as to who's the most open. Does that matter? Are we going to see a repeat of that here? >> I think it's spheres of influence, I think, and Doug very much was kind of referring to this. I would call it kind of like the MongoDB syndrome, which is that you have... and I'm talking about MongoDB before they changed their license, an open source project, but very much associated with MongoDB, which basically pretty much controlled most of the contributions, made the decisions. And I think Databricks has the same ironclad hold on Delta Lake, but still the market has pretty much associated Delta Lake as the Databricks open source project. I mean, Iceberg is probably further advanced than Hudi in terms of mind share. And so what I see that breaking down to is essentially, basically, the Databricks open source versus the everything else open source, the community open source. So I see it's a very similar type of breakdown that I see repeating itself here. >> So by the way, Mongo has a conference next week, another data platform, kind of not really relevant to this discussion totally. But in a sense it is, because there's a lot of discussion on earnings calls these last couple of weeks about consumption and who's exposed. Obviously people are concerned about Snowflake's consumption model. Mongo is maybe less exposed because Atlas is prominent in the portfolio, blah, blah, blah. But I wanted to bring up the little bit of controversy that we saw come out of the Snowflake earnings call, where the Evercore analyst asked Frank Slootman about discretionary spend.
And Frank basically said, look, we're not discretionary. We are deeply operationalized. Whereas he kind of poo-pooed the lakehouse or the data lake, et cetera, saying, oh yeah, data scientists will pull files out and play with them. That's really not our business. Do any of you have comments on that? Help us swing through that controversy. Who wants to take that one? >> Let's put it this way. The SQL folks are from Venus and the data scientists are from Mars. So it means it really comes down to that type of perception. The fact is that, traditionally with analytics, it was very SQL oriented and basically the quants were kind of off in their corner, where they're using SAS or where they're using Teradata. It's really a great leveler today, which is that, I mean, basically Python, it's become arguably one of the most popular programming languages, depending on what month you're looking at, at the TIOBE index. And of course, obviously SQL is, as I tell the MongoDB folks, SQL is not going away. You have a large skills base out there. And so basically I see this breaking down to essentially, you're going to have each group that's going to have its own natural preferences for its home turf. And the fact that basically, let's say the Python and Scala folks are using Databricks does not make them any less operational or mission critical than the SQL folks. >> Anybody else want to chime in on that one? >> Yeah, I totally agree with that. Python support in Snowflake is very nascent. With all of Snowpark, all of the things outside of SQL, they're very much relying on partners to make things possible and make data science possible. And it's very early days. I think the bottom line, what we're going to see is each of these camps is going to keep working on doing better at the thing that they don't do today, or they're new to, but they're not going to nail it. They're not going to be best of breed on both sides.
So the SQL centric companies and shops are going to do more data science on their database centric platform. The data science driven companies might be doing more BI on their lakes with those vendors, and the companies that have highly distributed data, they're going to add fabrics, and maybe offload more of their BI onto those engines, like Dremio and Starburst. >> So I've asked you this before, but I'll ask you, Sanjeev. 'Cause Snowflake and Databricks are such great examples 'cause you have the data engineering crowd trying to go into data warehousing and you have the data warehousing guys trying to go into the lake territory. Snowflake has $5 billion on the balance sheet and I've asked you before, I ask you again, doesn't there have to be a semantic layer between these two worlds? Does Snowflake go out and do M&A and maybe buy AtScale or a Datameer? Or is that just sort of a bandaid? What are your thoughts on that, Sanjeev? >> I think the semantic layer is the metadata. The business metadata is extremely important. At the end of the day, the business folks, they'd rather go to the business metadata than have to figure out, for example, like let's say, I want to update somebody's email address and we have a lot of overhead with data residency laws and all that. I want my platform to give me the business metadata so I can write my business logic without having to worry about which database, which location. So having that semantic layer is extremely important. In fact, now we are taking it to the next level. Now we are saying that it's not just a semantic layer, it's all my KPIs, all my calculations. So how can I make those calculations independent of the compute engine, independent of the BI tool, and make them fungible? So more disaggregation of the stack, but it gives us more best of breed products that the customers have to worry about. >> So I want to ask you about the stack, the modern data stack, if you will.
And we always talk about injecting machine intelligence, AI into applications, making them more data driven. But when you look at the application development stack, it's separate, the database tends to be separate from the data and analytics stack. Do those two worlds have to come together in the modern data world? And what does that look like organizationally? >> So organizationally, even technically, I think it is starting to happen. Microservices architecture was a first attempt to bring the application and the data world together, but they are fundamentally different things. For example, if an application crashes, that's horrible, but Kubernetes will self heal and it'll bring the application back up. But if a database crashes and corrupts your data, we have a huge problem. So that's why they have traditionally been two different stacks. They are starting to come together, especially with data ops, for instance, versioning of the way we write business logic. It used to be, business logic was highly embedded into our database of choice, but now we are disaggregating that using GitHub, CI/CD, the whole DevOps tool chain. So data is catching up to the way applications are. >> We also have databases, the trans-analytical databases, that's a little bit of what the story is with MongoDB next week with adding more analytical capabilities. But I think companies that talk about that are always careful to couch it as operational analytics, not the warehouse level workloads. So we're making progress, but I think there's always going to be, or there will long be, a separate analytical data platform. >> Until data mesh takes over. (all laughing) Not opening a can of worms. >> Well, but wait, I know it's out of scope here, but wouldn't data mesh say, hey, do take your best of breed, to Doug's earlier point.
You can't be best of breed at everything, wouldn't data mesh advocate, data lakes, do your data lake thing, data warehouse, do your data warehouse thing, then you're just a node on the mesh. (Tony laughs) Now you need separate data stores and you need separate teams. >> To my point. >> I think, I mean, put it this way. (laughs) Data mesh itself is a logical view of the world. The data mesh is not necessarily on the lake or on the warehouse. I think for me, the fear there is more in terms of the silos of governance that could happen and the siloed views of the world, how we redefine. And that's why, and I want to go back to something that Sanjeev said, which is that it's going to be raising the importance of the semantic layer. Now, that opens a couple of Pandora's boxes here, which is one, does Snowflake dare go into that space or do they risk basically alienating their partner ecosystem, which is a key part of their whole appeal, which is best of breed. They're kind of in the same situation that Informatica was in the early 2000s, when Informatica briefly flirted with analytic applications and realized that was not a good idea, and needed to redouble down on their core, which was data integration. The other thing, though, that raises the importance of, and this is where the best of breed comes in, is the data fabric. My contention is that, whether you employ data mesh practice or not, if you do employ data mesh, you need data fabric. If you deploy data fabric, you don't necessarily need to practice data mesh.
But data fabric at its core, and admittedly it's a category that's still very poorly defined and evolving, but at its core, we're talking about a common metadata backplane, something that we used to talk about with master data management. This would be something that would be more, what I would say basically, mutable, that would be more evolving, basically using, let's say, machine learning, so that we don't have to predefine rules or predefine what the world looks like. But so I think in the long run, what this really means is that whichever way we implement, on whichever physical platform we implement, we need to all be speaking the same metadata language. And I think at the end of the day, regardless of whether it's a lake, warehouse or a lakehouse, we need common metadata. >> Doug, can I come back to something you pointed out? Those talking about bringing analytic and transaction databases together, you had talked about operationalizing those and the caution there. Educate me on MySQL HeatWave. I was surprised when Oracle put so much effort into that, and you may or may not be familiar with it, but a lot of folks have talked about that. Now it's gone nowhere in the market, it has no market share, but we've seen a lot of these benchmarks from Oracle. How real is that, bringing together those two worlds and eliminating ETL? >> Yeah, I have to defer on that one. That's my colleague, Holger Mueller. He wrote the report on that. He's way deep on it and I'm not going to mock him. >> I wonder if that is something, how real that is or if it's just Oracle marketing, anybody have any thoughts on that? >> I'm pretty familiar with HeatWave. It's essentially Oracle doing what, I mean, there's kind of a parallel with what Google's doing with AlloyDB. It's an operational database that will have some embedded analytics. And it's also something which I expect to start seeing with MongoDB.
And I think basically, Doug and Sanjeev were kind of referring to this before, about basically kind of like the operational analytics that are basically embedded within an operational database. The idea here is that the last thing you want to do with an operational database is slow it down. So you're not going to be doing very complex deep learning or anything like that, but you might be doing things like classification, you might be doing some predictives. In other words, we've just concluded a transaction with this customer, but was it less than what we were expecting? What does that mean in terms of, is this customer likely to churn? I think we're going to be seeing a lot of that. And I think that's a lot of what MySQL HeatWave is all about. Whether Oracle has any presence in the market now, it's still a pretty new announcement, but the other thing that kind of goes against Oracle, (laughs) that they have to battle against, is that even though they own MySQL and run the open source project, in terms of the actual commercial implementation it's associated with everybody else. And the popular perception has been that MySQL has been basically kind of like a sidelight for Oracle. And so it's on Oracle's shoulders to prove that they're damn serious about it. >> There's no coincidence that MariaDB was launched the day that Oracle acquired Sun. Sanjeev, I wonder if we could come back to a topic that we discussed earlier, which is this notion of consumption. Obviously Wall Street's very concerned about it. Snowflake dropped prices last week. I've always felt like, hey, the consumption model is the right model. I can dial it down when I need to. Of course, the street freaks out. What are your thoughts on just pricing, the consumption model? What's the right model for companies, for customers? >> Consumption model is here to stay.
What I would like to see, and I think it is an ideal situation and actually plays into the lakehouse concept, is that I have my data in some open format, maybe it's Parquet or CSV or JSON, Avro, and I can bring whatever engine is the best engine for my workloads, bring it on, pay for consumption, and then shut it down. And by the way, that could be Cloudera. We don't talk about Cloudera very much, but it could be one business unit wants to use Athena. Another business unit wants to use some other, Trino let's say, or Dremio. So every business unit is working on the same data set, see, that's critical, but that data set is maybe in their VPC and they bring any compute engine, you pay for the use, shut it down. Then you're getting value and you're only paying for consumption. It's not like, I left a cluster running by mistake, so there have to be guardrails. The reason FinOps is so big is because it's very easy for me to run a Cartesian join in the cloud and get a $10,000 bill. >> This looks like it's been sort of a victim of its own success in some ways, they made it so easy to spin up single node instances, multi node instances. And back in the day when compute was scarce and costly, those database engines optimized every last bit so they could get as much workload as possible out of every instance. Today, it's really easy to spin up a new node, a new multi node cluster. So that freedom has meant many more nodes that aren't necessarily getting that utilization. So Snowflake has been doing a lot to add reporting, monitoring, dashboards around the utilization of all the nodes and multi node instances that have spun up. And meanwhile, we're seeing some of the traditional on-prem databases that are moving into the cloud trying to offer that freedom. And I think they're going to have that same discovery, that the cost surprises are going to follow as they make it easy to spin up new instances.
>> Yeah, a lot of money went into this market over the last decade, separating compute from storage, moving to the cloud. I'm glad you mentioned Cloudera, Sanjeev, 'cause they got it all started, the kind of big data movement. We don't talk about them that much. Sometimes I wonder if it's because when they merged Hortonworks and Cloudera, they dead ended both platforms, but then they did invest in a more modern platform. But what's the future of Cloudera? What are you seeing out there? >> Cloudera has a good product. I have to say the problem in our space is that there are way too many companies, there's way too much noise. We are expecting the end users to parse it out or we're expecting analyst firms to boil it down. So I think marketing becomes a big problem. As far as technology is concerned, I think Cloudera did turn themselves around and Tony, I know you talk to them quite frequently. I think they have had quite a comprehensive offering for a long time actually. They've created Kudu, so they got operational, they have Hadoop, they have an operational data warehouse, they've migrated to the cloud. They are in hybrid multi-cloud environments. A lot of cloud data warehouses are not hybrid. They're only in the cloud. >> Right. I think what Cloudera has done the most successfully has been in the transition to the cloud and the fact that they're giving their customers more on-ramps to it, more hybrid on-ramps. So I give them a lot of credit there. They've also been trying to position themselves as being the most price friendly, in terms of that we will put more guardrails and governors on it. I mean, part of that could be spin. But on the other hand, they don't have the same vested interest in compute cycles as, say, AWS would have with EMR. That being said, yes, Cloudera does it, I think its most powerful appeal so of that, it almost sounds in a way, I don't want to cast them as a legacy system.
But the fact is they do have a huge landed legacy on-prem and still significant potential to land and expand that to the cloud. That being said, even though Cloudera is multifunction, I think it certainly has its strengths and weaknesses. And the fact is that yes, Cloudera has an operational database or an operational data store, kind of like the outgrowth of HBase, but Cloudera is still, basically, primarily known for the deep analytics. The operational database, nobody's going to buy Cloudera or Cloudera Data Platform strictly for the operational database. They may use it as an add-on, just in the same way that a lot of customers have used, let's say, Teradata basically to do some machine learning or, let's say, Snowflake to parse through JSON. Again, it's not an indictment or anything like that, but the fact is obviously they do have their strengths and their weaknesses. I think their greatest opportunity is with their existing base because that base has a lot invested and vested. And the fact is they do have a hybrid path that a lot of the others lack. >> And of course being on the quarterly shot clock was not a good place to be, under the microscope, for Cloudera, and now they at least can refactor the business accordingly. I'm glad you mentioned hybrid too. We saw Snowflake last month did a deal with Dell whereby non-native Snowflake data could access on-prem object store from Dell. They announced a similar thing with Pure Storage. What do you guys make of that? Is that just... How significant will that be? Will customers actually do that? I think they're using either materialized views or external tables. >> There are data rated and residency requirements. There are desires to have these platforms in your own data center.
And finally they capitulated. I mean, Frank Slootman is famous for saying to be very focused, and earlier, not many months ago, they called going on-prem a distraction, but clearly there's enough demand, and certainly government contracts, any company that has data residency requirements, it's a real need. So they finally addressed it. >> Yeah, I'll bet dollars to donuts, there was an EBC session and some big customer said, if you don't do this, we ain't doing business with you. And that was like, okay, we'll do it. >> So Dave, I have to say, earlier on you had brought up this point, how Frank Slootman was poo-pooing data science workloads. On your show, about a year or so ago, he said, we are never going to go on-prem. He burnt that bridge. (Tony laughs) That was on your show. >> I remember exactly the statement because it was interesting. He said, we're never going to do the halfway house. And I think what he meant is we're not going to bring the Snowflake architecture to run on-prem because it defeats the elasticity of the cloud. So this was kind of a capitulation in a way. But I think it still preserves his original intent, sort of, I don't know. >> The point here is that every vendor will poo-poo whatever they don't have until they do have it. >> Yes. >> And then it'd be like, oh, we are all in, we've always been doing this. We have always supported this and now we are doing it better than others. >> Look, it was the same type of shock wave that we felt basically when AWS, at the last moment at one of their re:Invents, said, oh, by the way, we're going to introduce Outposts. And the analyst group is typically pre-briefed about a week or two ahead under NDA and that was not part of it. And when they dropped, they just casually dropped that in the analyst session. It's like, you could have heard the sound of lots of analysts changing their diapers at that point. >> (laughs) I remember that.
And props to Andy Jassy who once, many times actually, told us, never say never when it comes to AWS. So guys, I know we got to run. We got some hard stops. Maybe you could each give us your final thoughts, Doug start us off and then-- >> Sure. Well, we've got the Snowflake Summit coming up. I'll be looking for customers that are really doing data science, that are really employing Python through Snowflake, through Snowpark. And then a couple weeks later, we've got Databricks with their Data and AI Summit in San Francisco. I'll be looking for customers that are really doing considerable BI workloads. Last year I did a market overview of this analytical data platform space, 14 vendors, eight of them claim to support lakehouse, both sides of the camp. For Databricks, the top customer that they could cite was unnamed. It had 32 concurrent users doing 15,000 queries per hour. That's good but it's not up to the most demanding BI SQL workloads. And they acknowledged that and said they need to keep working on that. Snowflake, asked for their biggest data science customer, cited Kabura, 400 terabytes, 8,500 users, 400,000 data engineering jobs per day. I took the data engineering jobs to be probably SQL-centric, ETL-style transformation work. So I want to see the real use of Python, how much Snowpark has grown as a way to support data science. >> Great. Tony. >> Actually of all things. And certainly, I'll also be looking for similar things to what Doug is saying, but I think sort of like, kind of out of left field, I'm interested to see what MongoDB is going to start to say about operational analytics, 'cause I mean, they're into this conquer the world strategy. We can be all things to all people. Okay, if that's the case, what's going to be the case with basically, putting in some inline analytics, what are you going to be doing with your query engine? So that's actually kind of an interesting thing we're looking for next week. >> Great. Sanjeev.
>> So I'll be at MongoDB World, Snowflake and Databricks and very interested in seeing, but since Tony brought up MongoDB, I see that even the databases are shifting tremendously. They are addressing the HTAP use case, both online transactional and analytical. I'm also seeing that these databases started, let's say in the case of MySQL HeatWave, as relational or in MongoDB as document, but now they've added graph, they've added time series, they've added geospatial and they just keep adding more and more data structures and really making these databases multifunctional. So very interesting. >> It gets back to our discussion of best of breed versus all in one. And it's likely Mongo's path, or part of their strategy of course, is through developers. They're very developer focused. So we'll be looking for that. And guys, I'll be there as well. I'm hoping that we maybe have some extra time on theCUBE, so please stop by and we can maybe chat a little bit. Guys, as always, fantastic. Thank you so much, Doug, Tony, Sanjeev, and let's do this again. >> It's been a pleasure. >> All right and thank you for watching. This is Dave Vellante for theCUBE and the excellent analysts. We'll see you next time. (upbeat music)

Published Date : Jun 2 2022


ENTITIES

Entity	Category	Confidence
Doug	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Dave	PERSON	0.99+
Tony	PERSON	0.99+
Uber	ORGANIZATION	0.99+
Frank	PERSON	0.99+
Frank Klutman	PERSON	0.99+
Tony Baers	PERSON	0.99+
Mars	LOCATION	0.99+
Doug Henschen	PERSON	0.99+
2020	DATE	0.99+
AWS	ORGANIZATION	0.99+
Venus	LOCATION	0.99+
Oracle	ORGANIZATION	0.99+
2012	DATE	0.99+
Databricks	ORGANIZATION	0.99+
Dell	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Holger Mueller	PERSON	0.99+
Andy Jassy	PERSON	0.99+
last year	DATE	0.99+
$5 billion	QUANTITY	0.99+
$10,000	QUANTITY	0.99+
14 vendors	QUANTITY	0.99+
Last year	DATE	0.99+
last week	DATE	0.99+
San Francisco	LOCATION	0.99+
SanjMo	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
8,500 users	QUANTITY	0.99+
Sanjeev	PERSON	0.99+
Informatica	ORGANIZATION	0.99+
32 concurrent users	QUANTITY	0.99+
two	QUANTITY	0.99+
Constellation Research	ORGANIZATION	0.99+
Mongo	ORGANIZATION	0.99+
Sanjeev Mohan	PERSON	0.99+
Ahana	ORGANIZATION	0.99+
DaaS	ORGANIZATION	0.99+
EMR	ORGANIZATION	0.99+
32	QUANTITY	0.99+
Atlas	ORGANIZATION	0.99+
Delta	ORGANIZATION	0.99+
Snowflake	ORGANIZATION	0.99+
Python	TITLE	0.99+
each	QUANTITY	0.99+
Athena	ORGANIZATION	0.99+
next week	DATE	0.99+

Ram Venkatesh, Cloudera | AWS re:Invent 2020

>>From around the globe, it's theCUBE with digital coverage of AWS re:Invent 2020, sponsored by Intel, AWS and our community partners. >>Everyone, welcome back to theCUBE's coverage of AWS re:Invent 2020 virtual. This is theCUBE virtual. I'm John Furrier, your host. This year we're not in person. We're doing remote interviews because of the pandemic. The whole event's virtual over three weeks. For this week we'll be having a lot of coverage in and out of what's going on with the news. All that stuff is happening here on theCUBE. Our next guest is a featured segment: Ram Venkatesh, VP of Engineering at Cloudera. Welcome back to theCUBE, CUBE alumni. Last time you were on was 2018 when we had physical events. Great to see you. >>Good to be here. Thank you. >>So, you know, Cloudera obviously merged with Hortonworks. That company has been around for a while, always pioneering this abstraction layer, originally with Hadoop. Now, with data, all those right calls were made. Data is hot and is a big part of re:Invent. That's a big part of the theme, you know, machine learning, AI, edge, data lakes on steroids, higher level services in the cloud. This is the focus of re:Invent. The big conversations. Give us an update on Cloudera's data platform. What's new? >>Absolutely. You are really speaking our language with the whole data lake architecture that you alluded to. Cloudera's mission has always been about, you know, we want to manage all the world's data. What this means for our customers is being able to aggregate data from lots of different sources into central places that we call data lakes, and then apply lots of different types of processing to it to derive business value. With CDP, with Cloudera Data Platform, what we have essentially done is take those same three core tenets around data lakes, multifunction analytics, and data stewardship and management, and add on a bunch of cloud-native capabilities to it.
So fundamentally, I'm talking about things like disaggregated storage and compute, by being able to now not only take advantage of HDFS, but also, at a pretty deep, fundamental level, cloud storage. This is the form factor that's really, really good for our customers to operate from a TCO perspective if you're going to manage hundreds of terabytes of data, like a lot of customers do. The second key piece that we've done with CDP has to do with us embracing containers and Kubernetes in a big way. Our on-prem heritage is around virtual machines and clusters and things of that nature. But in the cloud context, especially in the context of managed Kubernetes services like Amazon EKS, this lets us spin up our traditional workloads, SQL, Spark, machine learning and so on, in the context of these Kubernetes containerized environments, which lets customers spin these up in seconds as opposed to, you know, tens of minutes, and as their processing needs grow and shrink, they can actually scale much, much faster up and down to, you know, make sure that they have the right cost-effective footprint for their compute. >>Go ahead, third piece. >>The third key piece of all of this, right, is to say, along with cloud-native orchestration and cloud-native storage, we've embraced this notion of making sure that you actually have a robust data discovery story around it. So increasingly the data sets that you create on top of a platform like CDP themselves have value in other use cases, and you want to make sure that these data sets are properly replicated, they're properly secured, they're properly governed, so you can go and analyze where the data set came from. Capabilities of security and provenance are increasingly more important to our customers. So with CDP, we have a really good story around that data stewardship aspect, which is increasingly important as you get into the cloud.
And you have these sophisticated sharing scenarios. >>You know, Cloudera has always had, and Hortonworks, both companies had strong technical chops. It's well documented. Certainly theCUBE's been to all the events and covered both companies since the inception 10 years ago of big data. But now we're in cloud. Big data, fast data, little data, all data. This is what the cloud brings. So I want to get your thoughts on the number one focus of problem solving around cloud. Do I migrate, or do I move to the cloud immediately and be born there? Now we know the hyperscalers, the born-in-the-cloud companies like the Dropboxes of the world. They were born in the cloud and all the benefits and goodness came with that. But I'm going to be pivoting. I'm a company in COVID with a growth strategy. Lift and shift, okay, that's over. That's the low-hanging fruit; those use cases are kind of done. Been there, done that. Is it migration or born in the cloud? Take us through your thoughts on what a company does right now. >>I think it's a really good question. If you think of, you know, where our customers are in their own data journey, right? A few years ago, I would say it was about operating infrastructure. That's where their head was at, right? Increasingly, I think for them it's about deriving value from the data assets that they already have. This typically means combining data from different sources: structured data, semi-structured data, transactional data, non-transactional data, event-oriented data, messaging data. They want to bring all of that together and analyze it to make sure that they can actually identify ways to monetize it in ways that they had not thought about when they actually stored the data originally, right? So I think it's this drive towards increasing monetization of data assets that's driving the new use cases on the platform.
Traditionally, it used to be about, you know, SQL analysts, or, if you are like a data scientist, using Apache Spark. So it was sort of this one function that you would focus on with the data. But increasingly, we're seeing these are, you know, collaborative use cases where you want to have a little bit of SQL, a little bit of machine learning, a little bit of, you know, potentially real-time streaming, or even things like Apache Flink that you're going to use to actually analyze the data. In this kind of an environment, we see that the data that's being generated on-prem is extremely relevant to the use case, but given the speed at which they want to deploy the use case, they really want to make sure that they can take advantage of the cloud's agility and infinite capacity to go do that. So really the answer is, it's complicated. It's not so much about, you know, I'm going to move my data platform that I used to run the old way from here to there. It's about, I've got this use case and I've got to stand this up in six weeks, right in the middle of the pandemic, and how do I go do that on the data that has to come from my existing line-of-business systems? I'm not going to move those over, but I want to make sure that I can analyze the data from there in some cohesive way. Does that make sense? >>Totally makes sense. And I think just to kind of bring that back for the folks watching, I remember when CDP was launching, these data platforms, it really was to replace the data warehouses, the old antiquated way of doing things. But it was interesting. It wasn't just about competing in that old category. It was a new category. So, yeah, you had to have some tooling, some SQL, you know, to wrangle data and have some prefabricated, you know, data fenced out somewhere in some warehouse. But the value was the new use cases of data where you never know.
You don't know where it's going to come from until it comes, right, because if you make it addressable, that was the idea of the data platform and data lakes and then having higher level services. So to me, that's, I think, one distinction: kind of a new category coexisting with and disrupting an old category, data warehousing. I always bought into that. You know, there are some technical things, Spark, all these elements and mechanisms underneath. That's just evolution. But in comes cloud, and I want to get your thoughts on this because one of the things that's coming out of all my interviews is speed, speed, speed, deploying at large scale at very high speed. This is the modern application thinking. Okay, to make that work, you've got to have the data fabric underneath. This has always been kind of the dream scenario, so it's kind of playing out. So one, do you believe in that? And two, what is the relationship between Cloudera and AWS? Because I think that kind of interestingly points to this one piece. >>Absolutely. So from my perspective, this is what we call the shared data experience that's central to CDP. The idea is that, you know, data that is generated by the business in one use case is relevant and valid in another use case. That is central to how we see companies leveraging data for the second-order monetization that they're after, right? So I think this is where getting out of a traditional data warehouse-like, data-siloed context, being able to analyze all of the data that you have, is really, really important for many of our customers. For example, many of them increasingly hold what they call data hackathons, right, where they're looking at, can we answer this new question from all the data that we have? That is a type of use case that's really hard to enable unless you have a very cohesive, very homogeneous view of all of your data.
When it comes to the cloud partners, right, increasingly we see that the cloud-native services, especially the core storage, compute and security services, are extremely robust, and they give us, you know, scale that's truly unparalleled in terms of how much data we can address and how quickly we can actually get access to compute on demand when we need it. And we can do all of this with a very, very mature security and governance fabric that you can fit into. So we see that, you know, technologies like S3, for example, have come a long way, and we've been along the journey with Amazon on this over the last seven or eight years. We've both learned how to operate our workloads. When you're running at terabyte scale, right, you really have to pay attention to matters like scale-out and consistency and parallelism and all of these things. These matter significantly, right? And there's a certain maturity curve that you have to go through to get there. The last part of that is that because the TCO is so optimized for the customer to operate this without any ops on their side, they can just start consuming data, even if it's a terabyte of data. So this means that now we have to have the smarts in the processing engines to think about things like caching, for example, very, very differently, because the way you cache data that's in HDFS is very different from how you would do that in the context of S3. Similarly, the way you think about consistency and metadata is very, very different at that layer. But we made sure that we can abstract these differences out at the platform layer so that as an application consumer, you really get the same experience, whether you're running these analytics on-prem or whether you're running them in the cloud.
And that's really central to how I see this space evolving: we want to meet the customer where they are, rather than forcing them to change the way they work because of the platform that they're on. >>So could you take a minute to explain some of the integrations with AWS and some customer examples? Because, you know, first of all, cost is a big concern on everyone's mind because, you know, it's still lower cost and higher value with the cloud anyway. But it could get away from you. So you know, you're constantly at petabytes of scale. There's a lot of data moving around. That's one thing. Two, integration with higher level services. Can you explain how Cloudera integrates with Amazon? What's the relationship? A customer wants to know, hey, you guys are partnering, explain the partnership. And what does it mean for me? >>Absolutely. So the way we look at the partnership, it's really a four layer cake. The lowest layer is the core infrastructure services. We talked about storage and compute and security and IAM and so on and so forth. So that layer is a very robust integration that goes back a few years. The next layer up from that has to do with increasingly, you know, as our customers use analytic experiences from Cloudera, they want to combine that with data that's actually in the AWS compute experiences like Redshift, for example. That's the analytics layer, the Cloudera data warehouse offering and how that interoperates with other services in Amazon that could be relevant. This is where common file formats, open source formats, really help us to make sure that there's a very strong level of interoperability at the analytics layer. The third layer up from that has to do with consumption. Like if you're going to bring an analyst on board, you want to make sure that all of their SQL-like analyst experiences, notebooks, things of that nature, are really strong.
And that's the third layer. The highest layer is really around data sharing. As AWS and technologies like that become more prevalent, customers want to make sure that the data estates that they have in the different clouds actually interoperate. So we provide ways for them to browse and search data, regardless of whether that data is on AWS or on-prem. And so that's the fourth layer in the stack. The vertical slice running through all of these is that we have a really strong business relationship with them, both on the commercial market side as well as in AWS Marketplace. Right? So by having CDP be a part of the AWS Marketplace, this means that if you have an enterprise agreement with Amazon, you can actually pay for CDP through the credits that you actually purchased. This is a very, very tight relationship that's designed again for these large-scale, speeds-and-feeds kind of customers. >>So just to get this right. I love the four layer cake; the icing is the success of CDP. Love that. Birthday candles can be on top too when you're successful. But you're saying that you're going to market with Amazon two ways, marketplace listing and then also jointly with their enterprise field programs. Is that right? Because they have this program, you can bundle it into the blanket POs or PO processes. Is that right? Can you explain that again?
To make sure that from a best practices standpoint, for example, were very well aligned from a cost standpoint, you know what we're telling the customer architecturally is very rather nine. That's that's where I think really the heart of the engineering relationship between the two companies without. >>So if you want Cloudera on Amazon, you just go in. You can click to buy. Or if you got to deal with Amazon in terms of global marketplace deal, which they have been rolling out, I could buy there too, Right? All right, well, run. Thanks for the update and insight. Um, love the four layer cake love gets. See the modernization of the data platform from Cloudera. And congratulations on all the hard work you guys been doing with AWS. >>Thank you so much. Appreciate. >>Okay, good to see you. Okay, I'm John for your hearing. The Cube for Cube virtual for eight of us. Reinvent 2020 virtual. Thanks for watching.

Published Date : Dec 8 2020


ENTITIES

Entity	Category	Confidence
Amazon	ORGANIZATION	0.99+
AWS	ORGANIZATION	0.99+
Ram Venkatesh	PERSON	0.99+
2018	DATE	0.99+
Dropbox	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
John	PERSON	0.99+
Florida	LOCATION	0.99+
Horton	PERSON	0.99+
Brown Venkatesh	PERSON	0.99+
Both companies	QUANTITY	0.99+
Lexus	ORGANIZATION	0.99+
both companies	QUANTITY	0.99+
two companies	QUANTITY	0.99+
eight	QUANTITY	0.99+
tens of minutes	QUANTITY	0.99+
one thing	QUANTITY	0.99+
hundreds of terabytes	QUANTITY	0.98+
this week	DATE	0.98+
three	QUANTITY	0.98+
third layer	QUANTITY	0.98+
aws	ORGANIZATION	0.98+
two ways	QUANTITY	0.98+
this year	DATE	0.98+
US	LOCATION	0.98+
Intel	ORGANIZATION	0.97+
over three weeks	QUANTITY	0.97+
10 years ago	DATE	0.97+
third piece	QUANTITY	0.97+
fourth layer	QUANTITY	0.97+
both	QUANTITY	0.97+
one piece	QUANTITY	0.96+
Clotaire	ORGANIZATION	0.96+
pandemic	EVENT	0.94+
third laye	QUANTITY	0.94+
second key piece	QUANTITY	0.93+
Cube virtual	COMMERCIAL_ITEM	0.92+
TCO	ORGANIZATION	0.91+
second order	QUANTITY	0.9+
four layer	QUANTITY	0.89+
U. S.	LOCATION	0.89+
six weeks	QUANTITY	0.89+
one	QUANTITY	0.88+
Zinn	ORGANIZATION	0.86+
few years ago	DATE	0.86+
last 78 years	DATE	0.85+
one person	QUANTITY	0.84+
terabyte	QUANTITY	0.83+
Cube for	COMMERCIAL_ITEM	0.83+
one function	QUANTITY	0.81+
Apache	ORGANIZATION	0.79+
Cube	COMMERCIAL_ITEM	0.79+
2020	TITLE	0.79+
one distinction	QUANTITY	0.77+
CDP	ORGANIZATION	0.74+
three core tenants	QUANTITY	0.72+
Claudia	PERSON	0.72+
turkey	OTHER	0.71+
reinvent 2020	EVENT	0.67+
S O.	PERSON	0.64+
nine	QUANTITY	0.63+
data	QUANTITY	0.6+
NATO	ORGANIZATION	0.59+
clam	ORGANIZATION	0.59+
VP	PERSON	0.53+

HPE Discover 2020 Analysis | HPE Discover 2020

>>From around the globe, it's theCUBE covering HPE Discover Virtual Experience, brought to you by HPE. >>Welcome back to theCUBE's coverage of HPE Discover 2020, the virtual experience. theCUBE has been virtualized. My name is Dave Vellante. I'm here with Stu Miniman, and our good friend Tim Crawford is here. He's a strategic advisor to CIOs with AVOA. Tim, great to see you. Stu, thanks for coming on. >>Great to see you as well, Dave. >>Yes. So let's unpack what's going on at Discover, Antonio's keynotes, and maybe talk a little bit about the prospects for HPE coming forward in this decade. You know, last decade was not a great one for HP and HPE. I mean, there was a lot of turmoil. There were botched acquisitions. There was breaking up the company and spin-merges and a lot of distractions. And so now the company is really, and you hear this from Antonio, kind of positioning for innovation for the next decade. So I think there's probably a lot of excitement inside the company, but I want to touch on a couple of points and then get you guys' reaction. I guess, you know, to start off, obviously Antonio's talking about COVID and the role that they played in that whole, you know, pandemic and the transition to the isolation economy. But let me start with you, Tim. I mean, what is the sort of posture amongst CIOs that you talk to? How strategic is HPE to the folks that you talk to in your community? >>Well, I think if you look at how CIOs are thinking, especially as we headed into COVID, into coronavirus, and kind of mapping through that crisis, it really came down to: can they get their hands on technology? Can they get people back to work, working from home? Can they do it in a secure fashion, keeping people productive? I mean, there was a lot of blocking and tackling, and even to this day, there's still a fair amount of that taking place.
We really haven't seen the fallout from the cybersecurity impact of expanding our footprint quite yet. But we'll see that, probably in the coming months. There are some initial inklings there. When it comes to HPE specifically, I think it comes back to just making sure that they had the product on hand, that they understood that customers are going through dramatic change. And so all bets are off. You have to kind of step back and say, okay, those plans that I had 60, 90, 120 days ago, those strategies that I may have already started down the path with, those are up for grabs. I need to step back from those and figure out, what do I do now? And I think each company, HPE included, needs to think about how they start to meld themselves to be able to address those changing customer needs. And I think that's where the rubber hits the road: is HPE capable of doing that? And are they making the right changes? And quite frankly, that starts with empathy. And I think we've heard pretty clearly from Antonio that he is sympathetic to the plight of their customers and the world on the whole. >>Yeah, and I think culturally, Tim and Stu, I mean, I think, you know, HPE is kind of getting back to some of its roots, and Antonio has been there for a long time. I think he is very well liked. And I'm sure he's tough, but he's also a very fair individual, and he's got a vision and he's focused. And so, you know, I think again, as they said, looking forward to this decade, I think it could be one that is, you know, one of innovation. Although, you know, look, you look at the stock price, you know, it kind of peaked in November '19. It's obviously down like many stocks, so there's a lot of work to do there. And Stu, we're certainly hearing from HPE this notion of everything as a service. We've talked about GreenLake a lot. What's your sense of their prospects going forward in this, you know, new era?
>>Yeah, I mean, Dave, one of the biggest attacks we've heard on HPE in the last couple of years, you know, the line Michael Dell would use is, you're not going to grow by subtraction. But as a platform company, HPE is much more open, from what I've seen, than the HP that I remember from, you know, 5 to 10 years ago. So you look at their partner ecosystem. It's robust. You know, years ago, it seemed to be, if it didn't come out of HP Labs, it wasn't a product, and the services arm only wanted to sell HP gear. Now, in this software-defined world, working in a cloud environment, they're much more open to finding that innovation and enabling it. So, you know, we talk about GreenLake today. GreenLake's got about 1,000 customers right now, and a big piece of that is the partner portfolio, whether it's VMware, Amazon, Nutanix, or HPE's full stack themselves. They have optionality in there, and that's what we hear from users, that they want flexibility. You know, you look at the cloud providers, it's not, you know, here's a solution. You look at Amazon. There are dozens of databases that you can use from Amazon or on top of Amazon. So HPE, you know, is not a public cloud provider, but it's looking more like that cloud experience. They've done so many acquisitions over the years. Many of them were troubled. They got rid of some of the pieces that they might have overpaid for. But you look at something like CTP to help them in this multi-cloud world. In the networking space, they've got a really cool open source company, the company behind SPIFFE and SPIRE.
And, you know, companies that are looking at containers and Kubernetes, you know, really respond and say, hey, these are projects that are interesting. Oh, who's the company that's driving that? It's HPE. So more open, more of a partner ecosystem. It definitely feels like there's a lot there that I respect and like about HPE. >>Well, I mean, the intent of splitting the company was so that HPE could be more focused, but focused on innovation. The intent was to be the growth company. It hasn't fully played out yet. But Tim, when you think about the conversations that CIOs are having with HPE today versus what they were having with HP, the conglomerate comprising EDS and PCs, I guess, I don't know, in a way, more Dell-like. So certainly Michael Dell's having strategic conversations with CIOs. But you've got to believe that the conversations are more focused today. Is that a good thing, or is the jury still out? >>No, it absolutely is a good thing. And I think one of the things that you have to look at is we're getting back to brass tacks. We're getting back to that focus around business objectives. So no longer is it, hey, who has the coolest tech, and how can we implement that tech, kind of looking across a tech-business spectrum. As a CIO, you have to be squarely focused on what are the business objectives that you are teamed up for, and if you're not, you're on a very short leash and that doesn't end well. And I think the great thing about the HP and HPE split, and I think you almost have to kind of step back for a second, let's talk about leadership, because leadership plays a very significant role, especially for CIOs that are thinking about long-term decisions and strategic partners. I don't think that HPE necessarily had the right leadership in place to carry them into that strategic world. I think Antonio really makes a change there. I mean, they made some really poor decisions post split.
That really didn't bode well for HPE. And frankly, I talked a bit about that. I know it wasn't really popular within HPE, but quite frankly, they needed to hear it. And I think that actually has been heard, and I think they are listening to their customers. One of the big changes is they're getting back into the software business. When you talk about strategic initiatives, you have to get beyond just the hardware and start moving up the proverbial stack, getting closer to those business initiatives. And that is software. >>Yeah, well, Antonio talked about the insights. I mean, something I've said a lot, borrowed from the Mary Meeker conversations, is that data is plentiful; something I've always said is that insights aren't. And so you're right. You've seen a couple of acquisitions: MapR, which they picked up, I think, pretty inexpensively. Kind of interesting, because, remember, HP had an investment in Hortonworks, which, of course, is now Cloudera. And BlueData, Kumar Sreekanti's company, kind of focusing on maybe automating data. They talked about edge-centric, cloud-enabled, data-driven. Nobody's going to argue with those things. But you're right, Tim, I mean, you're talking more software. They kind of jettisoned the software business and now sort of have to rebuild it. And then, of course, do this cloud. What do you make of HPE's cloud play? >>Yeah, well, I mean, Dave, the pieces you were just talking about, MapR and BlueData: where HPE connects them together is AIOps. So, where are we going with infrastructure? There needs to be a lot more automation. We heard a great quote I love from Automation Anywhere, Dave: if you talk about digital transformation without automation, it's hallucination. So HPE is baking that into what they're doing. I fully agree with Tim: software, software, software is where the innovation is. So it can't just be the infrastructure.
How do you have eyes and hooks into the applications? How are you helping customers build those new pieces? And what's the other software that you build around that? So, absolutely, it's an interesting piece, and HPE has got a lot of interesting pieces. You talk about the edge: Aruba is a great asset for that kind of environment. And from a partnership standpoint, Dave, they had John Chambers in the keynote. John, of course, is a longtime partner. He was with Cisco for many years, until Cisco started competing with HP in the server business, but now he's also the chairman of Pensando. HPE is an investor in Pensando, with general availability this month of that solution, and that's going to really help build out that next-generation edge. So it's a chipset that HPE can offer, similar to how we see Amazon build Outposts. So that is a solution both for the enterprise and beyond [inaudible]. >>Yeah, of course, Stu. Of course, HP kind of bought 3Com to add more fuel to that tension. Go ahead, Tim. >>Well, I was going to pick apart some of those pieces, because, you know, an edge is not an edge is not an edge. And I think it's important to highlight some of the advantages that HPE is bringing to the table: where Pensando comes in, where Aruba comes in, and also where [inaudible] comes in. I think there are a number of these components that I want to make sure we don't gloss over, that are really key for HPE in terms of the future. And that is, when you step back and look at how customers are going to have to consume services, how they're going to have to engage with both the edge and the cloud and everything in between, HPE has a great portfolio of hardware. What they haven't necessarily had was the glue, that connective tissue to bring all of that together. And I think that's where things like GreenLake and GreenLake Central are really going to play a role.
And even their newer cloud services are going to play a role. And unlike Outposts, and unlike some of the other private cloud services that are on the market today, they're looking to extend a cloud-like experience all the way to the edge, and that continuity, creating that simplicity, is going to be key for enterprises. I think that's something that shouldn't be understated. It's going to be really important, because in the conversations I'm having, when we're looking at edge to cloud and everything in between, it's: oh my gosh, that's really complicated. You have to figure out how to simplify that, and the only way you're going to do that is if you take it up a layer and start thinking about management tools, start thinking about automation. And as companies start to take data from the edge and analyze it at the edge and at intermediate points on the way to cloud, it's going to be even more important to bring continuity across this entire spectrum. So that's one of the things that I'm really excited about that I'm hearing from Antonio's keynote and others here at HPE Discover. >>Yeah. >>Well, let's stay on that, Stu. Let's stay on that for a second. >>Yeah, I'm interested, Tim, because, you know, it's funny. You think back: HP at one point in time was a leader in management solutions. HP OpenView, in the early days, was really well respected. What I'm hearing from you, when I think about Outposts, is that Amazon hasn't really built management for the edge. All they're doing is extending the cloud piece and putting a piece out at the edge. It feels like we need a management solution that's built from the ground up for this kind of solution. Do I hear you right? You believe HPE has some of those pieces today? >>Well, let's compare and contrast briefly on that. I think Amazon, as well as Google and Microsoft, for that matter.
The way that they are encompassing the edge into their portfolio is interesting, but it's an extension of their core business, their core public cloud services business. Most of the enterprise footprint is not in public cloud. It's at the other end of that spectrum. And so it's about being able to take not just what's happening at the edge. What about your corporate data center? You still have to manage that, and that doesn't fall under the purview of cloud. And so that's why I'm looking at HPE as a way to create that connective tissue between what companies are doing within the corporate data center today, what they're doing at the edge, as well as what they're doing maybe in private cloud and, by extension, public cloud. But let's also remember something else: most of these enterprises are also in a multicloud environment, so they're touching into different public cloud providers for different services. And so now you talk about, how do I manage this across the spectrum of edge to cloud, but then across different public cloud providers? Things get really complicated really fast. And I think the hints of what I'm seeing in software and the new software branding give me a moment of pause to say, wait a second, is HPE really going to head down that path? And if so, that's great, because it is in high demand in the enterprise. >>Well, let's talk about that some more, because I think this really is the big opportunity and where, potentially, the innovation is. So my question is, how much of GreenLake and GreenLake services are really designed for on-prem, to make that edge to on-prem? Now, I want to ask about cloud: how much of that is actually delivering cloud native services on AWS, on Google, on Azure, Alibaba Cloud, et cetera, versus creating a cloud-like experience for on-prem IT and eventually the edge? I'm not clear on that. Do you guys have insight on how much effort is going into those cloud native components in the public cloud?
>>Well, I would say that the first thing is you have to go back to the applications to truly get that cloud native experience. I think HPE is putting the components together for enterprises to be able to capitalize on that cloud-like experience with cloud native apps. But the vast majority of enterprise apps are not cloud native. And so, the way that I'm interpreting GreenLake, and I think there are a lot of questions around GreenLake and how it's consumed by enterprises; there were some initial questions around the branding when it first came out. So, you know, it's not perfect. I think HPE definitely has some work to do to clarify what it is and what it isn't in a way that enterprises can understand. But from what I'm seeing, it looks to be creating a cloud-like experience for enterprises from edge to cloud, but also providing the components so that if you do have applications that are shovel-ready for cloud or are cloud native, you can embrace public cloud as well as private cloud and pull them under the GreenLake umbrella. >>Yeah, ostensibly, Stu, Kubernetes is part of the answer to that, although, as we've talked about, Kubernetes is necessary, containers are necessary, but not sufficient for that experience. And I guess the point I'm getting to is, we've talked about this with Red Hat, certainly with VMware and others: the opportunity to have that experience across clouds, at the edge, on-prem. That's expensive from an R&D standpoint. And so I want to bring that into the discussion. HPE last year spent about $1.8 billion on R&D. Sounds like a lot of money. It's about 6% of its revenues, but it's spread thin. Now, it does R&D through investments, for instance, like Pensando, or other acquisitions. But in terms of organic R&D, it's not at the top of the heap. I mean, obviously guys like Amazon and Google have surpassed them.
I've written about this with regard to IBM, because they, like HPE, spend a lot on dividends and share buybacks, which they have to do to prop up the stock price and placate Wall Street. But it detracts from their ability to fund R&D. Stu, what's your take on that innovation roadmap for the next decade? >>Yeah, I mean, one of the things we've looked at in the last year or so is what we were talking about earlier, that management across these environments, and Kubernetes is a piece of it. So Google laid down Anthos, you've got Microsoft with Azure Arc, VMware with Tanzu. And to Tim's point, it feels like GreenLake fits kind of in that category, but there are pieces that fall outside of it. When I first thought of GreenLake, it was: oh, well, I've got a private cloud stack, like an Azure Stack, which is one of the solutions that they have there. How does that tie into that full solution? So extending that out, moving that brand, I do hear good things from the field, the partners, and customers. GreenLake is well respected, and it feels like that is a big growth area. So it's HPE shifting from being thought of as a box seller to more of that solution and subscription model, and GreenLake is a vehicle for that. And as you pointed out, rightfully so, software is so important. And one thing I'd say is that HPE feels like it's embracing software more than, say, its closest competitor, which is Dell. Dell's statement is always to be the leading infrastructure provider, and the arm of VMware is their software. So, looking at just Dell alone without VMware, HPE has to be that full solution of what Dell and VMware are together. >>Yeah, and VMware is the crown jewel. And of course, HPE doesn't have a VMware, but it does have over 8,000 software engineers. Now I want to ask you about open source.
I mean, I would hope that they're allocating a large portion of those software engineers to open source development: developing tooling at the edge, developing tooling for multicloud, certainly building hooks in from their hardware. But is HPE, Tim, doing enough in open source? >>Well, I don't want to get on the open source bandwagon, and I don't necessarily want to jump off it. I think the important thing here is that there are places where open source makes sense and places where it doesn't, and you have to look at each particular scenario and really ask yourself, does it make sense to address it here? I mean, it's a way to engage your developers and engage your customers in a different mode. What I see from HPE is more of a focus around trying to determine, where can we provide the greatest value for our customers? Which, frankly, is where their focus should be. Whether that shows up in open source software or in commercial products, we'll see how that plays out. But one thing I give HPE props on, one of several things I would say, is that they are getting back to their roots and saying, look, we're an infrastructure company; that is what we do really well. We're not trying to be everything to everyone. So let's figure out what customers are asking for and how we step through that. I think this is actually one of the challenges that Antonio's predecessors had: they tried to jump into all the different areas, you know, cloud, software. And they were really overextending themselves, in ways that they probably should, but they were doing it in ways that really didn't speak to their core, and they weren't connecting those dots. They didn't have that connective tissue they needed. So I do think that, whether it's open source or commercial software, we'll see how that plays out.
But I'm glad to see that they are stepping back and saying, okay, let's be mindful about how we ease into this. >>Well, the reason I bring up open source is because I think it's the mainspring of innovation in the industry, but of course it's very tough to make money on. We've talked a lot about HPE's strengths: its breadth. We haven't talked much about servers, but they're strong in servers. That's fine; we don't need to spend time there. Its culture seems to be getting back to some of its roots. We've touched on some of its weaknesses and maybe gaps. But I want to talk about the opportunities, and there's a huge opportunity at the edge. David Floyer quantified it: he says that TAM is $4 trillion. It's enormous. But here's my question on the edge. Right now, what we're seeing from companies like HPE and Dell is that they're largely taking Intel-based servers, kind of making a new form factor, and putting them out on the edge. Is that the right approach? Will there be an emergence of alternative processors, whether it's Arm, maybe there's some NVIDIA in there, and just a whole new architecture for the edge? Stu, I'll throw it out to you first, then get Tim's thoughts. >>Yeah, so one thing, Dave: HPE does have a long history of partnering with a lot of those solutions. So you see NVIDIA up on stage. When you think about Moonshot and The Machine and some of the other platforms that they've built, they've looked at alternative options. So I know, from a Wikibon standpoint, David Floyer wrote the piece that Arm is a huge opportunity at the edge. And you would think that HPE would be one of the companies that would be fast to embrace that. >>Well, that's why I kind of like Moonshot. I think that was probably ahead of its time. But the whole notion of a very slim form factor that can pop in and pop out different alternative processor architectures is very efficient, potentially, at the edge.
Maybe that's got potential. But do you have any thoughts on this? I mean, I know it's kind of, yeah, a hardware question, but... >>Well, it is a little hardware, but I think you have to come back to the applicability of it. I mean, if you're taking a slimmed-down, ruggedized server and essentially trying to take off all the fancy pieces and just get to the core of it, and calling that your edge, I think you've missed a huge opportunity beyond that. So what happens with the processing that might be in a camera, or in a robot, or in an edge device? These are custom silicon, custom processors, custom demands; you can't pull back to a server for everything. You have to be able to extend it even further. And if I compare and contrast for a minute, I think some of the vendors that are saying, hey, our definition of edge is a laptop, or it is this smaller form factor server, are incredibly limiting themselves. I think there is a great opportunity beyond that, and we'll see more of those crop up, because the reality is that the applicability of how edge gets used is that we do data collection and data analysis in the device, at the device. So whether it's a camera or a robot, there's processing that happens within that device. Now, some of that might come back to an intermediate area, and that intermediate area might be one of these smaller form factor devices, like a server, for example. But it might not be. It might be a custom type of device that's needed in a remote location, and then from there you might get back to that smaller form factor. So you have all of these stages, and data and processing is getting done at each of these stages as more and more resources are made available. Because there are things around AI and ML that you could only do in the cloud; you would not be able to do them even in a smaller form factor at the edge.
But there are some that you can do at the edge and that you need to do at the edge, either for latency reasons or just response time. And so it's important to understand the applicability of this. It's not just as simple as saying, hey, we've got this edge-to-cloud portfolio, and it's great, and we've got the smaller servers. You have to change the vernacular a little bit and look at the applicability of it and what people are actually doing with it. >>I think those are great points. I think you're 100% right on. You are going to be doing AI inferencing at the edge. A lot of data is going to stay at the edge, and I personally think, and again David Floyer has written about this, that it's going to require different architectures. It's not going to be the data center products thrown over to the edge or shrunk down. As you're saying, that's maybe not the right approach, but rather something that's very efficient, very low cost. When you think about autonomous vehicles, they could have, you know, quote-unquote servers in there. They certainly have compute in there. That could be $2,000, $3,000, $4,000, $5,000 worth of value. And I think that's an opportunity. I'd love to see HPE, Dell, and others really invest in R&D, treat this as a new architecture, build that out, and really infuse AI at the edge. Last question, guys; we're running out of time. I'll start with you, Stu: what are the things you're going to watch for HPE as indicators of success of innovation in the coming decade? As we said, last decade was kind of painful for HP and HPE. This decade holds a lot of promise. What are the things you're going to be watching in terms of success indicators? >>So it's something we talked about earlier: how are they helping customers build new things? AWS always focuses on builders. Microsoft talks about it a lot; I've heard Satya Nadella the last few years talk about building those new applications.
So, you know, infrastructure is only there for the data and the applications that live on top of it. And as you mentioned, Dave, there are a number of these acquisitions; HPE has moved up the stack some. So those proof points on new ways of doing business and new ways of building new applications are what I'm looking for from HPE and its robust ecosystem. >>Tim? >>Yeah, and I would just piggyback right on what Stu was saying. This is, you know, going back to the Moonshot goals. I mean, it's about as far away as you can get from where HPE's and HP's roots used to be, in that hardware space. But it's really about changing business outcomes, changing business experiences, and experiences for the customers of their customers. And so, as far toward that as HPE can get... I wouldn't expect them to get all the way there, although in conversations I am having with HPE and with others, it seems like they are thinking about that. But they have to start moving in that direction. And that's actually something where, when you start with the builder conversation, like Microsoft has had, and Amazon has had, Google's had, and even Dell, to some degree, has had, I think you miss the bigger picture. I'm not saying exclude the builder conversation. But you have to put it in the right context, because otherwise you get into this siloed mentality of: right, we have solved one problem, one unique problem, and built this one unique solution. We've got bigger issues to address as enterprises, and that's going to involve a lot of different moving parts. And whether you're a builder or even a hardware manufacturer, you've got to figure out how your piece fits into that bigger picture, and you've got to connect those dots very, very quickly. And that's one of the things I'll be looking for from HPE as well: how they take this new software initiative and really carry it forward. I'm really encouraged by what I'm seeing.
But of course the future could hold something completely different. We thought 2020 would look very different six months ago or a year ago than it does today. >>Well, I want to pick up on that. I would add, and I agree with you, that I'm really going to be looking for innovation. Can HPE get back to its roots? Remember, HP's root is invent; it was in the logo. Can it translate its R&D into innovation? To me, it's all about innovation. And I think CEOs like Antonio Neri, Michael Dell, and Arvind Krishna have a tough, tough position, because on the one hand, they're throwing off cash, and they can continue to bump along and placate Wall Street, give back dividends and share buybacks. And that's fine, and everybody would be kind of happy. But I'll point out that Amazon in 2007 spent less than $1 billion on R&D. Google back then spent about the same amount as HPE spends today. So the point is, if the edge is really such a huge opportunity, this $4 trillion TAM, as David Floyer points out, there's a way in which some of these infrastructure companies could actually pull a kind of mini Microsoft and reinvent themselves in a way that could lead to massive shareholder returns. But it really will take bold vision and a brave leader to actually make that happen. So that's one of the things I'm going to be watching very closely: HPE, invent, turn R&D into dollars. And so, you guys, really appreciate you coming on theCUBE and breaking down in this segment the future of HPE. Be well, and thanks very much. All right, and thank you for watching, everybody. This is Dave Vellante for Tim Crawford and Stu Miniman, with our coverage of the HPE Discover 2020 Virtual Experience. We'll be right back right after this short break.

Published Date : Jun 23 2020

Breaking Analysis: COVID-19 Takeaways & Sector Drilldowns Part II

>>From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. >>Hi, everyone. Welcome to this week's CUBE Insights, powered by ETR. My name is Dave Vellante, and we've been reporting every week, really, on the COVID-19 impact on budgets. Sagar Kadakia is back in with me. Sagar, it's great to see you again. >>Thanks again for having me. >>You're very welcome. Sagar is, of course, the director of research at ETR, our data partner. And man, I mean, you guys have just been digging into the data. Our call, to reiterate: we're down roughly around minus 5% for the year. The thing about what we're doing here, and what we want to stress to the audience, is that that's going to change. The key point is we don't just do a placeholder and update you in December. Every time we get new information, we're going to convey it to you. So let's get right into it. What we want to do today is kind of part two of the takeaways that we did last week. So let's start with the macro. Guys, if you bring up the first chart. Sagar, take us through the top three takeaways, just to reiterate where we're at. >>Yeah, no problem. And look, as you mentioned, what we're doing right now is collecting the pulse of CIOs. And so things change, and we continue to expect them to change in the next few weeks, in the next few months, as things change with it. So I'll give a recap of the survey and then go through some of our top macro takeaways. In mid-March, we launched our Technology Spending Intentions Survey. We had approximately 1,250 CIOs take that survey. They provided their updated 2020 versus 2019 spending intentions, right? So effectively, they first gave us those 2020 versus 2019 spending intentions in January.
And then they went ahead and updated those based on what happened with COVID. And then, in tandem with that, we did this COVID-19 drill-down survey, where we asked CIOs to estimate the budget impact of COVID-19 versus what they originally forecast for the year. And so that leads us to our first takeaway here, where we essentially aggregated the data from all these CIOs in that COVID-19 drill-down survey, and we saw a revision of 900 basis points, down to a decline of 5%. Coming into the year, the consensus was about 4% growth, and now you can see we're down about 5% for the year. And again, that's subject to change, and we're going to re-measure that as we get into June and July and have a couple of months under our belt with COVID-19. The second big takeaway here is the industries that are really indicating those declines in spend: retail/consumer, airlines, financials, telco, and IT services and consulting. Those are the verticals, as we mentioned last week, where we're really seeing some of the largest pullbacks in spend from consumers and businesses. So it makes sense that they are revising their budgets downward the most. And then finally, the last thing we captured, which we spoke about last week as well as a few weeks before that, and which I think has really been playing out over the last week and a half of earnings, is that CIOs are continuing to press the pedal on digital transformation. We saw that with Microsoft, and with ServiceNow last night. Those companies continued to post good numbers, and you see good demand. Where those declines that we just mentioned are coming from is the legacy, the on-premise players. There's such a concentration of loss and deceleration within some of those companies. And we'll get into that more as we go through more slides.
But that's really what we need to focus on: the declines are coming from very select vendors. >>Yeah, and of course we're in earnings season now, and we're paying close attention to that. A lot of people say, I'll just ignore the earnings here, you know, you've got the COVID-19 mulligan. But that's really not right. I mean, obviously you want to look at balance sheets, you want to look at cash flows, but we're also squinting through some of the data. Your point about IT services and consulting is interesting. I saw another research firm put out that services and consulting was going to be okay. Our data says different, and we're watching. For instance, Jim Kavanaugh on IBM's earnings call was very specific about the metrics that they're watching. They're obviously very concerned about pricing and their ability to book business there. We saw the cloud guys announce. Google was up in the strong fifties; the estimate is GCP was even higher, up in the 80% range. Azure, and we'll talk about this, is killing it. I mean, you guys have been all over Microsoft and its presence; high fifties. AWS was solid at around 34% growth from a larger base. But as we've been reporting, downturns have been good to cloud. >>That's right. And I think, based on the data that we've captured, people are really pressing the pedal on cloud and SaaS. With this much remote work, you need to have that structure in place to maintain productivity. >>Okay, let's bring up the next slide. Now, we've been reporting a lot on this sort of next-generation workload. Cloud 1.0 was all about storage and infrastructure as a service, compute. There's obviously some database, but there's a new analytics workload emerging, and it's kind of replacing, or at least disintermediating or disrupting, the traditional EDWs. I've said for years.
EDW has failed to live up to its expectations of 360-degree insights and real-time data, and that's really what we're showing here: some of the traditional EDW guys are getting hit, and some of the emerging guys are looking pretty good. So take us through what we're looking at here, Sagar. >>Yeah, no problem. So we're looking at the database/data warehousing sector, and what you're looking at here is replacement rates. As an example, if you see a vendor with roughly 20% replacement, what that means is one out of five people who took the survey for that particular sector, for that vendor, indicated that they were replacing it. And so you can see here that Teradata, Cloudera, IBM, and Oracle have very elevated and accelerating replacement rates. When we think about this space, you can really see the bifurcation, right? Look how well positioned Microsoft, AWS, Google, Mongo, and Snowflake are: low replacements, right, low, consistent replacements. And then, of course, on the left-hand side of the screen, you're really seeing elevated, accelerating replacements. And so this space goes with that theme that we've been talking about, that we covered last week: bifurcation. When you think about the declines that you're seeing in spend, again, it's very targeted at a lot of these legacy vendors. And again, we're seeing a lot of the next-gen players, like Microsoft and AWS, post very strong data. And so here, looking within database, it's very clear which vendors are well positioned for 2020 and which ones look like they're being ripped out and swapped out in the next few months. >>So this, to me, is really interesting. You've certainly reported on the impact that Snowflake is having on Teradata and on some of IBM's business, the old Netezza business. You can see that here. You know, it's interesting.
During the Hadoop days, Cloudera and Hortonworks, when they realized that they didn't really make money on Hadoop, sort of got into data management and database, and you're seeing that is under pressure. It's kind of interesting to me. Oracle, you know, is still not what we're seeing with Teradata, right, because they've got a stranglehold on the marketplace. That's right, hanging in there, right? But Snowflake with no replacements is very impressive. Mongo, a consistent performer. And then Google, AWS, Microsoft. AWS, of course, with Redshift. They did a one-time license with ParAccel, which was an MPP database. They totally retooled the thing. And now they're sort of, interestingly, copycatting Snowflake, separating compute from storage and doing some other moves. And yet they're really strong partners. So interesting. >>A lot is going on, and even, you know, Redshift, DynamoDB, they all look good. All these AWS products continue to screen very well in the data warehousing space. So yeah, to your point, there's a clear divergence between which products CIOs want to use and which ones they no longer want in their stack. >>Yeah, the database market is much more fragmented now than it used to be with Oracle, Db2 and SQL Server. As you mentioned, you've got a lot of choices. With Amazon, I think I counted, you know, 10 data stores, maybe more: DynamoDB, Aurora, Redshift, on and on and on. So a really interesting space, with a lot of activity in that new workload that I'm talking about, taking analytic databases, bringing data science tooling into that space and really driving these real-time insights that we've been reporting on. So that's quite an exciting space. Let's talk about this whole workflow space, ITSM. ServiceNow just announced, and they've been consistently crushing it. theCUBE has been following them for many, many years, you know, from the early days of Fred Luddy, Frank Slootman, for a short time John Donahoe.
And now Bill McDermott is the CEO, but it's been consistent performance since the IPO. But what are we actually showing here, Sagar? Yeah, bring up that slide. Thank you. >>So our key takeaway on the ITSM, IT workflow space is, look, it's best in breed, which is ServiceNow, or some of the lower-cost providers. Right? There's really no room for middle of the pack. So >>this is an >>interesting chart. And so what you're looking at here, there are a few directives, so I'll walk you through it and then I'll walk through the actual results. We're looking within ServiceNow accounts. And so we're seeing how these companies are doing within, or among, customers that are using ServiceNow today. What you're looking at on the x-axis is essentially shared market share, or shared customers, and then on the y-axis you're seeing essentially the spend velocity of those vendors within ServiceNow's accounts, right? So if a vendor was doing well, you would see them moving up and to the right, right? That means they're having more customer overlap with ServiceNow, and they're also accelerating spend. But you can see, if you look at Zendesk, if you look at BMC, [indiscernible], right, you can see they're either losing market share within ServiceNow accounts or they're losing spend, right? And Zendesk is another example here. And what's actually interesting, and we've had a lot of anecdotal evidence from CIOs, is that, look, they start with ServiceNow, it's best in breed, but a few of them have said, look, it's gotten expensive, and so they would move over to Zendesk. And then they would look at it versus [indiscernible] last year, and we had a few CIOs say, look, it's a quarter of the price of Zendesk, and they moved away from Zendesk subsequently as well, last year. And so it's just interesting that, you know, during these times where CIOs are reducing their budgets, look, it's either best of breed or low cost.
There's really no room in the middle, and so it's actually kind of interesting. In this space, it's an interesting dynamic. Usually it's best of breed or low cost. Rarely do you see both win, and I think that's what makes the space interesting. >>I've been following ServiceNow for a number of years. I'll just make a few comments there. First of all, you know, Workday was the gold standard in enterprise software for the longest time, and I always considered ServiceNow to be kind of part of that, you know, Silicon Valley mafia with Frank Slootman. But what's happened is, you know, Slootman did a masterful job of identifying the total available market and executing with demand, and now, you know, his successors have taken it beyond there. You know, ServiceNow has a market cap that's not quite double, but I mean, I think Workday, last I checked, was in the mid thirties. ServiceNow's market valuation is up in the 60 billion range. I mean, they announced just recently, very interestingly, that they beat expectations. They lowered their guidance relative to consensus, but I think the street knows, first of all, they beat their numbers and they've got that SaaS model, that very predictable model. And I think people are saying, look, they're just leaving meat on the bone so they can continue to beat, because that's been their sort of M.O. these last several years. So you've got to like their positioning. And when you talk to customers, they are pricey. You do hear complaints about that, and they've got a strong lock-in. But generally, my experience is that if people can identify business value and clear productivity, they work through the lock-in. You know, they'll just fight it out in the negotiations with procurement. >>That's right, and two things on that. So ServiceNow and even Salesforce, right, are platform-like types of vendors, right, where you build on them.
And that's what makes them such great companies, right? Even if they have, you know, little nicks and knacks here and there when they report, people see past that, right? They understand they're best of breed. You build your companies on the ServiceNows and the Salesforces of the world. And to the second point, you're exactly right. Businesses want to maintain consistent productivity, and I think that kind of resonates with the theme, right, doubling down on cloud and SaaS. As you have all this remote work, as you have a kind of, you know, questionable macro environment, organizations want to make sure that their employees continue to execute and that they're generating consistent productivity. And using these kinds of best-of-breed tools is the way to go. >>It's interesting you mentioned Salesforce and ServiceNow. For years I've been saying they're on a collision course. We haven't seen it yet, because they're both platforms. I'm still waiting for that to happen. Let's bring up the next card and let's get into networking. We talked a couple of weeks ago about the whole shift from traditional MPLS moving to SD-WAN. And this sort of really lays it out. Take us through the data here, please. >>Yeah, no problem. So we're just looking at a handful of vendors here. Really, we're looking at networking vendors that have the highest adoption rates within cloud accounts. And so what we did was we looked inside of AWS, Azure and GCP, right? We essentially isolated just those customers. And then we asked which networking vendors are seeing the best spend data and the most adoptions within those cloud accounts. And so you can kind of see some themes here, right? SD-WAN, right? You can see Meraki there, VMware NSX. You see some next-gen load balancing, and then there's the CDN side, right?
And so you're seeing a theme here of more next-gen players, and you're not really seeing a lot of the MPLS vendors here, right? They're the ones that have more flattening, decreasing and replacing data. And so the reason for going over this slide is, you know, when you think about the networking space as a whole, this is where adoptions are going. This is where spend is building and expanding, and it ties into what we just talked about. >>Yeah, networking is such a fascinating space to me, because you've got the leader in Cisco, which has held two-thirds of the market for the longest time, despite competitors like Arista, Juniper and others trying to get in there. Of course, NSX and the big Nicira acquisition, you know, kind of potentially disrupted that. But you can see, you know, Cisco, they don't go down without a fight. And let's take a look at the next card, on CDN. You know, this is interesting. You'd think, with all this activity around work from home and remote offices, this is a hot area. But what are we looking at here? >>Yeah, no problem. And that's right, right? You would think. And so we're looking at CDN players here. You would think, with the uptick in traffic, you would see fantastic net scores, right, for all the CDN vendors. So what you're looking at here, and again there are a few lenses on here, so I'll walk the audience through it, is that first we isolated only those individuals that were accelerating their budgets due to work from home, right? So we've had this conversation now for a few weeks about supporting employees working from home. You did see a decent number of organizations, I think it was 20 or 30% of organizations in the survey, that indicated they're actually accelerating spend. So we're looking at those individuals. And then what we're doing is we're seeing how Cloudflare and Akamai are performing within those accounts, right?
And so we're looking at those specific customers, and you can just see, within Cloudflare, in both security and networking, which is more the CDN piece, how consistently elevated the data is, right? This is spend intensity, right, not overall market share. Obviously Akamai, you know, they're the grandfather of CDNs. They have the most market share. And if you look at Akamai, to the right, you can see the spend velocity is not very good. It's actually negative across both sectors. So, you know, we're not saying that, look, there's a changing of the guard that's occurring right now. Cloudflare is still relatively small compared to Akamai. But there's just such a stark contrast here, and again, it kind of goes to what we're talking about in our macro themes, right? CIOs are continuing to invest in next-gen technologies and better technologies, and that is having an impact on some of these legacy and, you know, grandfather providers. >>Well, I mean, I think, again, I've said a number of times, it's ironic that COVID hit coming into a new decade. And you're seeing this throughout the IT stack, where you've got a lot of disruptors and you've got companies with large install bases, a lot of on-prem or a lot of historical legacy. Yeah, and it's very hard for them to show growth. They oftentimes squeeze R&D because they've got to serve Wall Street. And this is the kind of dilemma they're in, and the only good news with Akamai here is that it's less bad: security goes from negative 20% to a negative 8% net score. But wow, what a contrast. But to your point, a much, much smaller base, but still very relevant. We've seen this movie before. Let's wrap with another area that we've talked about, which is virtualization, desktop virtualization, VDI. Again, a beneficiary of the work-from-home pivot. And we're focused here, right, on Fortune 500 net scores. But give us the lowdown on this, Sagar.
>>Yeah, so this is something that, look, I think is pretty obvious to anyone in the market. You're seeing an uptick in spend across the board versus three months ago and a year ago among your desktop virtualization players. There's VDI, right? So that's gonna be your VPN right now. Obviously, they reported pretty good numbers there, so this is an obvious slide, but we wanted to throw it in there just to say, look, you know, these organizations are seeing nice upticks in spend within the virtualization sectors, specifically within the Fortune 500. Again, that's kind of, you know, the work-from-home spend that we're seeing here, >>right? So, I mean, a 100% net score in the Fortune 500 for WorkSpaces is pretty amazing. And I think the shared N on this, the N was actually quite large. It wasn't like single digits. Many dozens. I remember when WorkSpaces first came out, it maybe wasn't ready for prime time. But clearly there's momentum there, and we're seeing this across the board. Sagar, thanks so much for coming in this week. Really appreciate it. We're gonna be in touch with you, with ETR. We're gonna continue to report on this, but Sagar, stay safe. And thanks again. >>Thanks again. Appreciate it. Looking forward to doing another one. >>All right. Thank you, everybody, for watching this CUBE Insights, powered by ETR. This is Dave Vellante for Sagar Kadakia. Remember, all these episodes are available as podcasts. I publish weekly on wikibon.com and also on siliconangle.com. Don't forget etr.plus. Check out all the action there. Thanks for watching, everybody. We'll see you next time.

Published Date : Apr 30 2020

SUMMARY :

It's great to see you really you know, roughly around minus 5% for the year. And so things change on and we continue to expect them to change, you know, A lot of people say I'll just ignore the earnings here, you know, you've got the COVID-19 mulligan, And I think, you know, based on the data that we've captured, So take us through what we're looking at here. and so you can see here, Teradata, So, you know, you've certainly reported on the impact that Snowflake is A lot is going on, and even, you know, Redshift, DynamoDB, they all look good. I think I counted, you know, 10 data stores, maybe more. So our key takeaway on the ITSM, IT workflow space And so it's just interesting that, you know, you know, Workday was the gold standard in enterprise software for the longest time, productivity, and I think that kind of resonates with the theme, It's interesting you mentioned Salesforce and ServiceNow. For years I've been saying they're on a collision And so the reason for going over this slide is, you know, when you think about the networking space as Of course, NSX and the big Nicira acquisition, you know, kind of potentially disrupted that. And so we're looking at CDN players here. You would think, with the uptick in traffic, of the work-from-home pivot. specifically within the Fortune 500. Again, that's kind of, you know, the work-from-home spend that we're seeing. We're gonna be in touch with you, with ETR. Looking forward to doing another one. We'll see you next time.

ENTITIES

Entity	Category	Confidence
Jim Kavanaugh	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave Volante	PERSON	0.99+
Bruce Lukman	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Oracle	ORGANIZATION	0.99+
20	QUANTITY	0.99+
Cisco	ORGANIZATION	0.99+
Bill McDermott	PERSON	0.99+
John Donahoe	PERSON	0.99+
January	DATE	0.99+
December	DATE	0.99+
Amazon	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
Arista	ORGANIZATION	0.99+
100%	QUANTITY	0.99+
5%	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
2020	DATE	0.99+
1250 CIOs	QUANTITY	0.99+
last week	DATE	0.99+
Fred Luddy	PERSON	0.99+
60 billion	QUANTITY	0.99+
Juniper	ORGANIZATION	0.99+
Palo Alto	LOCATION	0.99+
second point	QUANTITY	0.99+
Boston	LOCATION	0.99+
last year	DATE	0.99+
Cloudflare	TITLE	0.99+
NSX	ORGANIZATION	0.99+
mid thirties	DATE	0.99+
Cube Studios	ORGANIZATION	0.99+
one	QUANTITY	0.99+
FBI	ORGANIZATION	0.99+
Zendesk	ORGANIZATION	0.99+
Cloudera	ORGANIZATION	0.99+
10 data stores	QUANTITY	0.99+
June July	DATE	0.99+
five people	QUANTITY	0.99+
two things	QUANTITY	0.99+
Neisseria	ORGANIZATION	0.99+
zendesk	ORGANIZATION	0.99+
8%	QUANTITY	0.99+
both platforms	QUANTITY	0.99+
360 degree	QUANTITY	0.99+
three months ago	DATE	0.99+
900 basis points	QUANTITY	0.98+
today	DATE	0.98+
Rezendes	ORGANIZATION	0.98+
this week	DATE	0.98+
BMC	ORGANIZATION	0.98+
last night	DATE	0.98+
Sadaaki	PERSON	0.98+
a year ago	DATE	0.98+
30%	QUANTITY	0.98+
first	QUANTITY	0.97+
March mid March	DATE	0.97+
about 4%	QUANTITY	0.97+
about 5%	QUANTITY	0.97+
first chart	QUANTITY	0.96+
20%	QUANTITY	0.96+
double	QUANTITY	0.96+
First	QUANTITY	0.95+
Docker Korakia	PERSON	0.95+
ET	ORGANIZATION	0.95+
one time license	QUANTITY	0.95+
2019	DATE	0.95+
Loop	ORGANIZATION	0.94+
around 34%	QUANTITY	0.94+
Andi	PERSON	0.94+
Davis	PERSON	0.94+
Bob	PERSON	0.94+
Sluman	PERSON	0.93+
CDW	ORGANIZATION	0.93+
Park Cell	ORGANIZATION	0.92+
Cube	ORGANIZATION	0.91+
first take	QUANTITY	0.91+

Chandler Hoisington, D2iQ | D2iQ Journey to Cloud Native

>>From San Francisco, it's theCUBE, covering D2iQ. Brought to you by D2iQ. Hey, >>welcome back everybody, Jeff Frick here with theCUBE. We're at D2iQ's headquarters in downtown San Francisco. They used to be Mesosphere, which is what you might know them as, and they rebranded earlier this year. And they're really talking about helping enterprises in their journey to cloud native. And we're really excited to have one of the product guys who's been here, seeing this journey with the customers and helping the company transform. He's Chandler Hoisington. He's the SVP of engineering and product. Chandler, great to see you. Thanks. So, first off, give everyone kind of a background on D2iQ. I think a lot of people knew Mesosphere. You guys were around making noise. What kind of changed in the marketplace to do a rebranding? >>Sure. Yeah, we were obviously Mesosphere in the past, and Mesos, so I think a lot of people watching theCUBE know about Mesos. As we were going along our journey as a company, we noticed that a lot of people were also asking for Kubernetes. So we've actually been working with Kubernetes since, I don't know, 2016, 2017, something like that, for a while now. And as the Kubernetes ecosystem started evolving and maturing more, we also wanted to jump in and take advantage of that. And we started building some products that were specific to Kubernetes. And so we thought, look, you know, it's a little bit confusing for people, Mesos and Kubernetes, and at times those two technologies were seen almost as competitive, even though we didn't always see it that way. The market saw it that way, so we said, look, this is getting too confusing for customers, being called Mesosphere. Let's rebrand around more of what we really do. And we felt like what we do is not just focused around one specific technology.
We felt like we helped customers with more than that, more than just Mesos support, more than just Kubernetes support, and we said, look, let's get a name that shows what we actually do for customers, and that's really helping them take their workloads and put them not just on, you know, an open source platform, but actually take their workloads and bring them into production in an enterprise way that's really ready for day two. And that's why we called it D2iQ. >>And let's unpack the day two, 'cause I think some people are really familiar with the concept of day two, and some people have probably never heard it. But it's a pretty interesting concept, and I think it packs a lot of meaning into a small number of letters. I think you >>can kind of just think about it as if you were writing software, right? I mean, day zero is, okay, we're gonna design it. We're gonna start playing with some ideas. We're gonna pull in different technologies. We're gonna do a POC. We're gonna build our skateboard, so to say. That's kind of your day zero. What do we want? Okay, we're gonna build a data analytics pipeline. We want Spark. We're going to store data in Cassandra. We're gonna use Kafka to pass it around. We're gonna run our containers on top of Kubernetes. That's just kind of your day zero idea. You get it working, you slap it on a cluster. Things are good, right? Day one might be, okay, let's actually do a beta and put it in production in some kind of way. You start getting customers using it. But now, in day two, after all that's done, you're like, wait a second, things are going wrong. Where's our monitoring? We didn't set that up. Where's our logging? Oh, I don't know. Like, >>who do we >>call? Our container runtime, we think, has a bug. Who do we call? Oh, I don't know. What support contract did we cut, right? So those are the things that we want to help customers with. We want to help them in the whole journey of getting to day two.
But once they're there, we want them to be ready for day two, right? And that's what we do. >>I love it, because one of my favorite quotes, I've used it 1,000 times, I'll do a thousand and one, right, is that open source is free like a puppy. Exactly. When you leave, you're not necessarily writing a check to the shelter, but there's a whole lot of other checks you've got to write to take care of it. And I think that's such a key piece for enterprise, right? They need somebody to call when that thing breaks. >>Yeah. I mean, having come from an enterprise company, I was actually a customer of Mesosphere before I joined. Yeah, that's exactly why we were customers. We wanted not only that insurance policy, but someone to partner with us as we started figuring this out, you know? I mean, just picking, you know, what container runtime do I want to use with Kubernetes? That one decision could take months if you're not familiar with it. And you put a couple of your best architects on it: go research containerd, go research CRI-O, go research Docker. Tell me what's the best one we should use with Kubernetes. Whereas if you have a partnership with a company like D2iQ, you can say, look, I trust this company. They're experts at this and they see a lot of this. Let's go with their recommendation. It's >>okay. So you've got your whiteboard. You've got a whole bunch of open source things going on, right? And you've got a whole bunch of initiatives, and the pressure's coming down from on high to get going. You've got containerization and cloud native and hybrid cloud, all this stuff. And then you've got some poor CIO and his team trying to figure it out. You guys have a whole plethora of services around some of these products. So as you try it, and then you've got the journey, right, and you don't start from a standing start. You've gotta go.
So how do you map out the combination of how people progress through their journey? What are the different types of systems that they want to put in place and prioritize, to have some type of logical, successful implementation and rollout of these things from day zero to day one to day two? >>No, it's a great question. I think that's actually how we formed our product strategy. We've been doing this for a while now, and we've gone on this journey with really big, advanced customers like ride-sharing companies and large telcos, customers like that. We've also gone on this journey with smaller, less sophisticated customers like, you know, industrial customers from the Midwest, right? And those are two very, very different customers. But what's similar is they're both going on the same journey, we feel like. They're just at different places. So we wanted to build products that find the customer where they're at in their journey, and the way we see it, really, at the very beginning it's just training, right? So we have a bunch of services around training to help you understand not just Kubernetes, but the whole cloud native ecosystem. So what is all this stuff? How does it work? How does it fit together? How do I just deploy a simple app, right? That's the beginning of it. We also have some products in that area as well, to help people scale their training across the whole organization. So that's really exciting for us. Once that customer has their training down, they're like, okay, look, I need a cluster now, like, I need a distro of sorts. And Kubernetes itself is great, but it needs a lot of pieces to actually get it ready for prime time. And that's where we built a product called Konvoy, to say, okay, here is your enterprise-grade, ready-to-go Kubernetes distro right out of the box. And that product is really what you could use to just fiddle around with Kubernetes.
It's also what you put into production, right? And again, it's been scale tested, security tested and mixed-workload tested. It's everything. So that's kind of our Kubernetes distro. So you've gotten your training. You have your distro, and now you're like, okay, I actually want to run some apps. >>Let me hold you there. Is it open core? Or, you know, there's a lot of conversation about that. The way you've actually >>the way we built Konvoy? It's a great question. The way we built Konvoy, we said, okay, we want to pick the best of breed from each of these. Have you seen the cloud native ecosystem, kind of like >>the eye chart, whatever it is, where they have all the logos and all the different spiral things? It's crazy. It's got thousands of logos, right? And >>we said, look, we're gonna navigate this for you. What's the best container runtime to pick? And it's almost as if we were gonna build this for ourselves using all open source technology. So Konvoy is completely open source. Okay, there's some special sauce that we put in on how to bring these things together and install it. But all the actual components are open source. Okay, so if you're a customer, you're like, okay, I want open source. I don't want to be tied to any specific vendor. I want to run only open source. >>Yeah, I was just thinking in terms of, you know, Hadoop as a reference, right? And you had, you know, the Hortonworks, Cloudera and MapR strategies, which were radically different in the way they actually packaged up Hadoop under the covers. Yeah, >>you can think of it as similar to how Cloudera approached it, possibly, where they had CDH. And they brought in a lot of open source, but they also had a lot of proprietary components in CDH. And what we've tried to get away from is tying someone in to us. I know that sounds counterintuitive from a business perspective, but we don't want customers to feel like, if I go with D2iQ,
I always have to go with D2iQ. I have to drink the Kool-Aid, and I'm never gonna be able to get off. >>Kind of doesn't really go with the open source ethos. Exactly. It's not >>right for our customers, right? A lot of our customers want that optionality, and they don't want to feel locked in. And so when we built Konvoy, we said, look, you know, if we were to start our own company, not an infrastructure company like we are right now, but just a software company building any kind of app, how would we approach it? And that was one of the problems we saw. We don't wanna feel like we're tied into any one vendor. >>Right. Okay, so you've got the training, you've got the products. What's >>next? What's next is, if you think about the journey, what we've found, and this may or may not be totally true, is that one of the first things people like to run on Kubernetes is actually their builds. So CI/CD. And we said, how can we help with this? We looked around the market, and there are a lot of great CI/CD products out there right now. There's GitLab, which is a great partner of ours. It's a great product. There are your older products, like Jenkins. There's a bunch of SaaS products, Travis CI, all these things. But what we wanted, if we were customers of our own products, is something that was native to Kubernetes. And so we started looking at projects like Tekton and Prow, some of these projects, right? And we said, how can we do the same thing we did with Konvoy, where we bring these projects together and make it easy for someone to adopt these Kubernetes-native CI/CD tools? And we did some stuff there that we think is pretty innovative as well. And that's the product we call Dispatch. >>Okay. And you've got more than just products. You've got professional services. That's right. So now >>you need help setting all this up. How do you actually bring your legacy applications to this new platform?
How do you get your legacy builds onto these new build systems? That's where our services come into play and kind of steer you through this whole journey. Lastly, what's next in the journey, though? Those services complement the rest of the product suite really well, right? And we didn't just stop with CI/CD. We said, what is the next type of workload that we want to run here? Okay, so there we looked at things like Red Hat operators, right? And we said, look, Red Hat's doing a really cool thing here with this operator framework. How can we simplify it? We've done a lot of this before with DC/OS, where we built what we called the DC/OS SDK to help people bring advanced, complex workloads onto that platform. And we saw a lot of similarities between operators and our DC/OS SDK. We said, how can we bring some of our understanding and knowledge to that world? And we built this open source product called KUDO. Okay, people are free to go check that out. And that's how we bring in more advanced workloads. So if you think about the journey, back to the journey again, you've got some training, you have your cluster, you put your builds on it. Now you want to run some advanced workloads? That's where KUDO comes in. >>Okay? And then finally, at the end of the trail is 1-800-I-need-help. Well, almost the end of the trail. We're not there yet. There's still one more step, right? On >>the very last one, actually, we said, okay, what's next in this journey? And that's running multiple clusters at the same time. Okay, so that's kind of the scale. That's the end of the journey for us, for our product suite as it stands right now. And that's where we built a product called Kommander. And that's really helping us launch and manage multiple >>Kubernetes clusters at the same time. >>So it's so great that you have the perspective of a customer and you bring that directly in too.
You know what you want because you have just gone through this journey. But I'm just curious, you know, if you put your old hat on, you know, kind of the CIO, your customer. You know, you just talked about the eye chart with Lord knows how many logos. How do you help people even just begin to think about the choices and about the crazy rapid change? I mean, Kubernetes wasn't a thing four years ago. How do you help them stay on top of it, help them, you know, both kind of have an eye to the vision and, you know, make sure they're delivering today, and not just get completely distracted by every bright, shiny object that happens to come along? >>Yeah, no, I think it's really challenging for the buyers. You know, I think, especially as the industry continues to mature, there's a new concept that gets thrown at you all the time. Service mesh. You know, some new, cool way to do monitoring or logging, right? And you almost feel like a dinosaur if you're not right on top of these things. You go to a conference: are you using, you know, BPF yet? What is that? You don't feel right. Exactly. I think, most importantly, what customers want is the ability to move their technology and their platforms as their business has the need. If the need isn't there for the business, and the technology is running well, there shouldn't be a reason to move to a new platform or a new set of technologies. In fact, with DC/OS, with Mesos, we have a lot of happy customers that aren't gonna be moving to Kubernetes, even if they wanted to, anytime soon. DC/OS does some things that Kubernetes currently doesn't do. It may never do them, because the community is just not focused on what DC/OS is solving. And those customers just want to see that we'll continue to support them in the journey that they're on with their business.
And I think that's what's most important: just really understanding our customers, understanding their business, understanding where they want to go, what their goals are, so to say, for their technology platforms, and making sure we're always one step ahead of them. >>That's a good place to be, one step ahead of demand. All right, well, thanks for taking a few minutes and sharing the story. Appreciate it. Okay. Thank you. All right. Thanks, Chandler. I'm Jeff. You're watching theCUBE. We're at D2iQ in downtown San Francisco. Thanks for watching. We'll see you next time.

Published Date : Nov 7 2019


ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
Andi	PERSON	0.99+
Cassandra	PERSON	0.99+
Jeffrey	PERSON	0.99+
San Francisco	LOCATION	0.99+
two	QUANTITY	0.99+
Chandler Hoisington	PERSON	0.99+
1000 times	QUANTITY	0.99+
Chandler	PERSON	0.99+
2001	DATE	0.99+
Mason	ORGANIZATION	0.99+
one	QUANTITY	0.99+
Travis	PERSON	0.99+
both	QUANTITY	0.98+
four years ago	DATE	0.98+
Mesa Sphere	ORGANIZATION	0.98+
thousands of logos	QUANTITY	0.98+
two technologies	QUANTITY	0.97+
Duke	ORGANIZATION	0.97+
today	DATE	0.97+
Day two	QUANTITY	0.96+
day two	QUANTITY	0.96+
Jenkins	PERSON	0.96+
each	QUANTITY	0.95+
16 4017	OTHER	0.95+
SOS	ORGANIZATION	0.95+
Day one	QUANTITY	0.94+
first	QUANTITY	0.94+
Mace ose	ORGANIZATION	0.92+
tonight	DATE	0.92+
Zero idea	QUANTITY	0.92+
Chandler	ORGANIZATION	0.92+
IQ	ORGANIZATION	0.92+
DCS	ORGANIZATION	0.91+
Cloudera	ORGANIZATION	0.9+
one step	QUANTITY	0.9+
one thing	QUANTITY	0.9+
Cube	ORGANIZATION	0.9+
Kubernetes	PERSON	0.89+
Kubernetes	ORGANIZATION	0.88+
Midwest	LOCATION	0.88+
Horton	ORGANIZATION	0.87+
one more step	QUANTITY	0.85+
Eso	ORGANIZATION	0.82+
DCS	TITLE	0.8+
Lee	ORGANIZATION	0.79+
earlier this year	DATE	0.78+
Thio Enterprise	ORGANIZATION	0.78+
C.	TITLE	0.78+
once	QUANTITY	0.78+
1 800	QUANTITY	0.77+
D2iQ	PERSON	0.74+
one specific technology	QUANTITY	0.74+
Convoy	ORGANIZATION	0.73+
Kool Aid	ORGANIZATION	0.7+
D.	ORGANIZATION	0.69+
day	QUANTITY	0.68+
one decision	QUANTITY	0.67+
a second	QUANTITY	0.66+
Kudo	PERSON	0.65+
Maur	ORGANIZATION	0.65+
Lord	PERSON	0.64+
West	ORGANIZATION	0.6+
D2iQ	TITLE	0.59+
May	ORGANIZATION	0.54+
Zero	QUANTITY	0.53+
Day	OTHER	0.53+
zero	QUANTITY	0.52+
C	TITLE	0.52+
Asian	LOCATION	0.5+
tectonic	TITLE	0.5+
d	ORGANIZATION	0.46+
132	QUANTITY	0.43+
Cube	TITLE	0.42+
Maur Maur	PERSON	0.4+
O	ORGANIZATION	0.34+
Cloud Native	TITLE	0.33+

Keith Griffin, Cisco | Cisco Live US 2019

>> Announcer: Live from San Diego, California, it's The Cube. Covering Cisco Live US 2019. Brought to you by Cisco and its ecosystem partners. >> Welcome back to The Cube. Lisa Martin with Stu Miniman. Day three of our coverage of Cisco Live. We're pleased to welcome to theCube Keith Griffin, Principal Engineer, Collaboration, from Cisco. Keith, good morning. Welcome. >> Morning. Thanks for having me. >> So, lots of announcements this morning, or this week, with respect to collaboration, cognitive collaboration. Webex intelligence, a lot of Webex users out there. Walk us through Webex intelligence. >> Keith Griffin: Sure. Well, Webex intelligence and cognitive collaboration bring together a set of underlying AI and machine learning technologies. We loosely break them down into four areas: relationship intelligence, which is where People Insights would sit; computer vision, where we would see our face recognition and name labels for meetings; multi-modal bots and assistants, where we would have our Webex Assistant offer; and audio and speech technologies, where we've got some interesting features like noise detection in meetings, when you've got those annoying dogs barking in the background when you're having your meeting, and also something that we're just about to release, meeting transcription, so that you no longer have to take meeting notes and our intelligence platform will take the notes for you. >> Stu Miniman: All right, so Keith, Lisa and I did Enterprise Connect earlier, and it's amazing some of the things that are happening. You talk about, you know, cloud and AI coming into meetings. Part of me is a little worried.
I worked in telecom back in the '90s, and it feels like in many ways, in the last 20 years, we haven't got beyond the, okay, in the first 10 minutes of the meeting, let's make sure everybody's in, are the right people talking, are the right people muted. I mean, the machines are going to make this really easy for us so that we can stop the human people messing it up, right? >> Exactly, exactly, and one of the things that's interesting about that, well, actually one thing I'll say is that I also came from telecoms in the '90s, I've seen that journey all the way through, and I'm still six to eight minutes late for meetings when I start them, and I'd love to blame the technology, and lots of people do, but let's face it, hands up, we're factors in this as well. We have the most amazing non-cognitive features, like One Button to Push. A single green button, I just have to push that to start the meeting, but guess what I don't do? I don't push the button, 'cause I'm setting up my laptop, or I'm taking my coat off, or I'm generally getting settled in. So the technology assistance at this stage is really good. And what we wanted to do was look at how we can take the friction out of joining a meeting even when we've got such a simple experience, and we found that things like Webex Assistant, where I can just speak to the system, definitely do that. It speeds the access to the meeting, but one of the things we tried out with Webex Assistant, which we're just about to release, is called proactive mode. Now proactive mode is where I don't even have to say, "Okay Webex, join the meeting." It says to me, "Hey Keith, looks like you're ready to start your meeting, will I get it started for you?" I simply say "yes", and while I'm setting up my laptop and taking off my coat, we're right in on getting the meeting going, and that was something we came across during our early field trials, and we saw a huge adoption from customers, so we went right on developing that. It's going to be available soon.
>> One of the things that Stu mentioned, we were at Enterprise Connect a couple of months ago. Amy Chang, one of my favorite keynotes, she's so animated. I know she was on stage yesterday. She announced People Insights a couple of months ago. Let's kind of dig into that as the relationship intelligence. What does that mean, what does that enable, and how is that an enabler of reducing friction? >> Yeah, it's really on multiple levels, I think. There's the before-the-meeting experience and then during the meeting. So one of the things that we found through a survey that we just recently completed was that, I don't want to misquote it, but there was a healthy percentage of people, I'm going to guess at about 40-45%, that spend a significant amount of time before a meeting googling and figuring out who they are meeting with, to try and find out more, to have that connection when they get to the meeting. So what if we could just dynamically do that, and there was no need to go search or spend time ahead of the meeting? So that's one area of friction reduced or removed, so you can go right in there and you've got that personalized briefing for the meeting itself. >> So what do I see? Is this when I'm logging into Webex, or is it before the meeting, and what kind of information about the person I'm talking to does it populate for me?
>> Yeah, so in the meeting itself, on your roster, you can click on a new icon that's beside the participant, and you can find out public profile information about the user that's in the meeting, as well as their corporate directory information, if you're in that organization, and also news about their company. So I would have the latest Cisco news and just a general description of what our company does, and if I'm meeting with somebody else, they see that about me, they see my education background and anything else I choose to offer. And choice is important: the fact that I, as an end user, am in control of that data, I can edit it, I can hide it, I can delete it, I think that's really critical in this era of data privacy and machine learning-centric solutions. So that's how it happens in the meeting itself, and we're also looking more at personalized briefings and how we can bring that forward, and also looking at areas like how we could bring that into the video experience. You don't want to clutter the video experience with all this information, but it would be nice to have something more than even a name label, which is useful to have, maybe a title or role or something like that, so we're looking at bringing that across the entire portfolio. >> All right, so Keith, you brought up data privacy. I want you to talk a little bit about some of the other products outside of just, you know, the base Webex. When you talk about things like facial recognition, where is that today? We know it's a hot-button topic. You know, what are you seeing, and what are the requests you're getting from customers? >> Yeah, we're pretty close to being able to release face recognition for name labels in meetings, and the goal of the feature, as the name suggests, is just simply to put a name label on the user, so you have that more personal connection in the meeting.
We're taking our time with the feature because we want to get the data privacy right from the beginning. It's not something I feel you can add afterwards; you have to have a strong data privacy posture right from the beginning. So the types of steps that we've taken are to make sure that this is a disabled feature, so an IT admin must opt the organization in, and then individual users must also enroll, and that enrollment step does two things: one, it gives a picture so that we can calculate the mathematical representation of the user for that matching, but also it offers the user the opportunity to consent to their face being used in the system, and that's really critical, again back to that point about users being in control of their data. And at any point, they can go back to that and decide, I want to add a new photo, maybe I want to add something like a photo with no glasses or with glasses, or with a beard or without a beard, to make the system more accurate, but they can go in there and have complete control over that, hide their labels, whatever it is they would want to do. >> Keith, just a follow-up on that, maybe give us, you know, what differentiates Webex from some of the other solutions out there when it comes to security and data privacy. There's a lot of new players out there. You know, how does Cisco look at themselves versus the rest of the floor?
>> Yeah, there's a lot of differentiators, probably more than we would have time for today, but if I take face recognition for example, a lot of those user controls are really critical and important. The way that we can leverage the devices as well as the cloud, I think, is a really critical aspect of that. If I think about something like noise detection, which we haven't talked about from a data privacy point of view, we do that in the device or on the Webex client, not streaming to the cloud, and the idea is to reduce that creep factor at every aspect that we possibly can, right? So addressing data privacy mitigation at every single point; there's no single solution, I think, for it. So when you combine the user controls, where you implement the feature, how you implement the feature, and you roll all of that up, it becomes a fairly significant differentiator. I did a session here yesterday where it was exclusively on data privacy, and I couldn't even present my slides. It turned into an interview; I just stood and answered questions for the hour because people are so interested in this. But the feedback that I got was that our posture on data privacy is something that makes the solutions deployable for enterprise customers, and it was great to get that feedback. We've worked hard at it, and we've continued to do that. I think it's something that we actually need to lead with as much as the features themselves. >> So as we look at the Webex platform and all of the expansions that Cisco has done, one of the biggest complaints with collaboration that we all have as workers is this overload of collaboration apps, and switching back and forth between Webex and Slack and email and text and all these things. Talk to us about what you guys have done to mitigate that and make Webex a more broad portfolio that would be a greater facilitator of less friction in collaboration.
>> Well, that's a really interesting area to talk about, because there's two ways that maybe I would look at that. One is that, from a platform point of view, I think it's no longer good enough to just have phenomenal video and phenomenal audio and phenomenal share; we have to make sure that we've got this intelligent and contextual experience that's woven across that. And then that would bring me to the second part, which is invisible AI. It's making sure that these experiences are, you know, the users don't have to do anything to access them, that they just show up, like meeting transcription. So if I go back and look at a meeting recording afterwards, and all of the notes are neatly organized on a panel on the right-hand side, that's AI at work, invisibly, for me, and when I go back to review that, I've got everything I need, but I didn't have to go do something to make that happen. So we're trying a lot to focus on these invisible, cognitive experiences throughout the platform. >> Keith, how about the ecosystem? I mean, Cisco talks a lot about its partners here. I went through the show floor, collaboration's a big space there. Talk a little bit about the expansive ecosystem that Webex has built.
>> Yeah, and one area in particular that has come up in the last month is that we were able to open source our MindMeld platform. So we acquired the company MindMeld two years ago and built Webex Assistant using their phenomenal conversational AI platform, and then took steps in Lorrissa Horton's group to open source that and make it available to developers. And I saw some examples of that yesterday on the show floor, really amazing what people have done, where they've taken Webex Assistant and combined it with bots and assistant technology that they've built on top of the MindMeld data science platform. So I was amazed, because it wasn't so long ago when we did that and they have solutions already. So yeah, really interesting first step, but there's a lot more we can do there. I'd like to see us taking Webex Assistant and offering extensibility beyond just the MindMeld open source. I would like to see us look at a multi-assistant strategy, which we've got, where we could potentially integrate with some of the consumer systems that are out there, consumer assistants in particular. There's a lot that we've done, but I think there's a lot more that can be done in the bots and assistants space. >> When we look at all of this innovation, the waves of innovation that we're riding, you know, we're in the DevNet Zone, Susie Wee talks about the waves of compute, mobile, edge, AI everywhere, but also this demand for connectivity, the expansion of 5G that we're expecting, the adoption of Wi-Fi 6, how are some of those waves influencing how cognitive collaboration at Cisco is being developed?
>> I have never thought about that, but what I would say is that it comes down to one thing, or maybe three things: data, data, data, right? All of those systems produce lots of data. AI and machine learning live on data; it's data and algorithms ultimately, that's what it is. There's tons of algorithms out there, but those areas that you mentioned, those waves, they all produce lots of data, and as long as we can act on those with data privacy in mind and provide compelling features to customers, I think that it opens up just way more opportunities. And what we've done up to this point in cognitive is really first-step type stuff. As amazing as it is, a lot of it is based heavily on supervised machine learning. I think getting to unsupervised learning, reinforcement learning, and acting on those larger data sets is going to bring some really interesting solutions in the future. >> All right, so Keith, look forward a little bit for us. You know, all this machine learning and AI has caused a real growth in the breadth of the portfolio. What's exciting you in the next six to twelve months? What spaces should we be keeping an eye on in your world?
>> One of the areas I've been working most closely on is meeting transcription, and again, it's a tip-of-the-iceberg type solution where we've got the meeting notes, and that's great, but I really want to imagine where we could bring that next. So notes are great, but if I didn't have time to go to the meeting and I didn't have time to listen to the recording, I'm probably not going to have time for thirty pages of notes. But what if I could get insights and actions? What if I could have Webex Assistant help me with that, where I say, "Okay Webex, what actions did I get on the 10pm meeting yesterday that I missed?" That to me is an area that doesn't just personally excite me from a technology point of view, but I think has far-reaching impacts for users, and it's in approximately that time frame. This is not five years away or ten years away; we're getting there really quickly. So that is the one area that I would really pick out right now, because it gives us the baseline to integrate with a lot of the other cognitive offers we have and really go somewhere with that. >> I would love that. You're right, I mean, who has time to listen to a recording, let alone read a transcript? >> Keith: Right. >> So that's something to look forward to in the future. As well, next time you'll have to come back and give us an example of a customer, whether it's a bank or any other type of organization with a large, you know, distributed workforce, and some of the big benefits, all the way up to the business, the top line, that they're getting. So we'll have to look for that next time. >> Keith: Sure, I'd love to do that. >> Keith, thank you so much for joining Stu and me on theCube this morning, we appreciate it. >> Thanks for having me. >> All right, for Stu Miniman, I'm Lisa Martin. You're watching theCube live from Cisco Live, day three. Thanks for watching.

Published Date : Jun 12 2019


ENTITIES

Entity	Category	Confidence
Amy Chain	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Lisa Martin	PERSON	0.99+
Keith Griffin	PERSON	0.99+
Stu Miniman	PERSON	0.99+
Keith	PERSON	0.99+
thirty pages	QUANTITY	0.99+
Webex	ORGANIZATION	0.99+
Susie Weed	PERSON	0.99+
10pm	DATE	0.99+
ten years	QUANTITY	0.99+
two ways	QUANTITY	0.99+
San Diego California	LOCATION	0.99+
Keith Lisa	PERSON	0.99+
six	QUANTITY	0.99+
yesterday	DATE	0.99+
five years	QUANTITY	0.99+
Stu	PERSON	0.99+
two years ago	DATE	0.99+
two things	QUANTITY	0.99+
last month	DATE	0.99+
Lorrissa Horton	PERSON	0.99+
second part	QUANTITY	0.99+
one thing	QUANTITY	0.98+
one	QUANTITY	0.98+
one button	QUANTITY	0.98+
Enterprise Connect	ORGANIZATION	0.98+
iPad	COMMERCIAL_ITEM	0.98+
three things	QUANTITY	0.98+
eight minutes	QUANTITY	0.98+
this week	DATE	0.98+
One	QUANTITY	0.97+
90's	DATE	0.97+
first 10 minutes	QUANTITY	0.97+
today	DATE	0.96+
first step	QUANTITY	0.96+
Day three	QUANTITY	0.96+
MindMeld	ORGANIZATION	0.95+
this morning	DATE	0.94+
one area	QUANTITY	0.92+
couple of months ago	DATE	0.91+
US	LOCATION	0.91+
MindMeld	TITLE	0.89+
2019	DATE	0.89+
twelve months	QUANTITY	0.88+
Slack	TITLE	0.88+
single green button	QUANTITY	0.87+
tons of algorithms	QUANTITY	0.87+
a lot of new players	QUANTITY	0.87+
about 40-45%	QUANTITY	0.86+
Webex	TITLE	0.84+

Dustin Kirkland, Google | CUBEConversation, June 2019

>> From our studios in the heart of Silicon Valley, Palo Alto, California, it is a CUBE Conversation. >> Welcome to this special CUBE Conversation here in Palo Alto, California, at theCUBE studios, at theCUBE headquarters. I'm John Furrier, the host. We're here with Dustin Kirkland, product manager at Google, friend of theCUBE and the community with Kubernetes, been on theCUBE, CUBE alumni. Dustin, welcome to the CUBE Conversation. >> Thanks, John. It's a beautiful studio. I've never been in the studio, been on the show floor a few times, but this is fun. >> Great to have you on, a great opportunity to chat about Kubernetes and what you do as a product manager working at Google. But really, more importantly, this conversation is about the fifth anniversary, the birthday of Kubernetes. Today we're celebrating the fifth birthday of Kubernetes. It's still a toddler. >> Absolutely, still growing. You think about how, you know, Linux has been around for a long time, OpenStack has been around, these other big projects that have been around for, you know, going on decades in Linux's case, and Kubernetes, it's going so fast, but it's only five years old, you know. >> You know, I remember at an OpenStack event in Seattle many, many years ago, that was six years ago, theCUBE's in its 10th year, so many of these look-back moments, this is one of them. I was having a beer with Lew Tucker, JJ. Kismatic was like one of the first companies at the time, didn't make it. But we were talking about OpenStack, like, this Kubernetes thing, this is really hot. This paper, this initiative, this could really be the abstraction layer to kind of bring all this together. Cloud native wasn't part of it at the time, but it was more of an OpenStack trying to move up the stack. And it turned out it ended up happening. Kubernetes then went on to change the landscape of what containers did. Docker
got a lot of credit for pioneering that, got the big VC funding, became a unicorn, and then containers kind of went in a different direction because of Kubernetes. >> Very much so. I mean, the modernization of software infrastructure has been coming for a long time, and Kubernetes sort of brings it all together at this point. But putting software into a container, we've been doing that in different forms for a long time. But once you have a lot of containers, what do you do with that? Right? And that was the problem that Kubernetes solved so eloquently, and has, you know, now for a couple of years, and it just keeps getting better. >> You know, you mentioned modernization. Let's talk about that, because I think the modernization theme is now pretty much prevalent in every vertical. I'll be in D.C. next week for the Amazon Web Services Public Sector Summit, where modernization of governments and nations is being discussed. Education, modernization of IT. We've seen it here. The media business that we're participating in, it's not about where you store the code, it's how you code, how you build. It's a mindset shift. This has been the real revelation around the DevOps movement, infrastructure as code, now called cloud native. Share your thoughts on this modernization mindset, because it really is how you build. >> Yeah, I think the cross-pollination across industries, we see that even just in the word containers, right, and all the imagery around shipping and shipping containers. We've applied these age-old concepts that have been, I won't say perfected, but certainly optimized over decades, actually centuries or millennia, of moving things across water in containers. Right? But we apply that to software and boom, we have this step-function difference in the way that we manage and we orchestrate and administer code.
That's one example of that cross-pollination, and now you're talking about, like, optimizing governments or economies, but being able to maybe then apply other concepts where we've come a long way in computer science. DevOps is a good example, you know, applying DevOps principles to non-computer fields. Just think about that for a second. >> It's mind-blowing. And if you think about also the step function you mentioned, because I think this actually changed a lot of the entrepreneurial landscape as well, and also has shaped open source. And, you know, big news this quarter is MapR is going to shut down, one of the biggest Hadoop players. Cloudera merged with Hortonworks, fired their CEO; the founder Mike Olson has retired, some say forced out. I don't think so. I think it's more of his time. Amr Awadallah is still there. Open source as a business model, you know, can we be the Red Hat for Hadoop? Not really viable, but it's evolving. So open source has been impacted by this step function. There's a business impact. Talk about the dynamics of the step function, both on the business side and on how software's built, specifically open source. >> You know, you and I have been around open source for a long, long time. I think it started when I was in college in the late nineties, and then through my career at IBM. And it's interesting how on the fringe open source was for so long, for so much of my IBM career, and then early time spent on-site at Red Hat. It was something that was different, was weird. It was very much fringe. But now it's mainstream and it's everywhere, and it's so mainstream that it's almost the de facto standard to just start with open source. But you know, there's some other news that's been happening lately that you didn't bring up.
But it's a really touchy aspect of open source right now, and that's on some of the licenses and how those licenses get applied by software, especially databases, when offered as a service in the cloud. That's one of the big problems, I think, that we're working with in open source. >> Summarize the news and what it means. What's happening? What's the news, and what's the real business or technical impact of the licensing? What's the issue? What's the core issue? >> Yeah, so without taking judgment in any way, shape or form on this, the TL;DR on this is that a number of open source databases, most recently CockroachDB, have adopted a different licensing model that is nonstandard from an open source perspective. And from one perspective, they're adopting these different licensing models because other vendors can take that software and offer it as a service, yes, and in some cases, like Amazon, like you said, offer it as a service, and maybe contribute, maybe pay money to the smaller startup or the open source community behind it, but not necessarily. And in some ways it's quite threatening to open source communities and open source companies, and in other cases, quite empowering. And it's going to be interesting to see how that plays out. The tension between open sourcing software and eventually making money off of it is something that we've seen for, you know, at least 25 years. >> And it continues to go on today, and this is, to me, a real fascinating area that I think is going to be super important to keep an eye on, because you want to encourage contribution and openness. At the same time, we look at the scale of just the Linux Foundation's numbers. It's pretty massive in terms of, now, the open source contribution. When you factor in even China and other nations, it's on exponential growth, right? So is it just that open source is the model, not necessarily a business? Yeah.
So this is the big question. No one knows. >> I think we've crossed that, and open source is the model. And this is where, for me as a product manager that's worked around open source, I've spent a lot of time thinking about how to create commercial offerings around open source. I spent 10 years at Canonical, the first half of which as an engineer, the second half of which as a product manager, building commercial services around Ubuntu. And I learned quite a few things that now apply absolutely to Kubernetes, as well as to a number of open source startups that I've advised and kind of given some perspective on maybe some successful and unsuccessful ways to monetize that open source. >> Okay, so let's get back to Kubernetes. I think this is the next-level talk track: Kubernetes has established itself and landed in the industry and has adoption. It's land, adopt, expand. We've seen adoption. Now it's in expansion mode. Where does it go from here? Because you look at the telltale signs, things like service meshes, serverless, you get some interesting trends that are going to support this expansionary stage of Kubernetes. What is your view about the next expansion wave? What comes next? >> Yeah, I think the next stage is really about democratizing Kubernetes for workloads. You know, it's quite obvious when Kubernetes is the right answer at the scale of a Google or a Twitter or Netflix or, you know, some of these massive services, where it is obviously and clearly the best answer to orchestrating containers. Now I think the next question is, how does that same thing that works at that massive scale also work for me as a developer at a very small scale, help me develop my software with my small team of five or 10 people? Do I need Kubernetes if I'm a five- or 10-person startup? Well, I mean, not the original sort of Borg vision of Kubernetes.
It's probably overkill, but actually the tooling has really advanced, and we now have Kubernetes that makes sense on very small scales. You've got things like k3s from Rancher. You've got MicroK8s from my colleagues at Canonical, and other ways of shrinking Kubernetes down to something that fits perhaps on devices, perhaps at the edge, beyond just the traditional data center and into remote locations that need to deploy and manage applications. >> On the Kubernetes clustering, on some of the tech side: you know, we've seen some great tech trends, as mentioned, in Cloudera, Hortonworks, and MapR. Let's take Cloudera and Hortonworks. Remember back in the old days when it was booming? Oh, they were so proud to talk about their clusters. "I stood up all these clusters." And then I would ask them, "Well, what are you doing with it?" "Well, we're storing data, I think." So standing up the cluster kind of became the use case, and they're like, "OK, now let's put some data in it." So the question for you is: Kubernetes is a little bit different. We're seeing real use cases. What are people standing up Kubernetes clusters for, specifically, besides saying "I've done it"? What's the main use case that you're seeing that has real value? >> Yeah, actually, you just jogged my mind to a really funny memory. You know, back in those big data days, I was CEO of a startup. We were encrypting data, and we were helping encrypt healthcare data for healthcare companies. And the number of healthcare companies that I worked with at that time who said they had a big data problem, and they had all of, I don't know, 33 terabytes' worth of data that they needed to encrypt. It was kind of humorous sometimes, like, is that really a big data problem? This fits on a single disk, you know. But yeah, I mean, it's interesting how... >> The hype of the tech was preceding the reality of the needs. So, Kubernetes: "I have a Kubernetes cluster for [blank]." Fill in the blank. What are people saying? >> Yeah, it's largely about the modernization. So I need to modernize my infrastructure. I'm going to adopt the platform, and that's probably not the older Java web, WebSphere-type platform or something like that. I'm investing in hardware, I'm investing in software and middleware, I'm investing in people, and I want all of those things to line up with where industry is going from a software perspective, and that's where Kubernetes is sort of the cornerstone piece of that. Linux, of course; that's pretty well established. >> Is the continuous delivery and integration piece of it, the pipeline, the fit? Are the low-hanging-fruit use cases of Kubernetes just the development process? >> Or it's the operations. It's the operations of: I've now got software that I need to deploy across multiple versions, perhaps multiple sites. I need to handle that upgrade, ideally without downtime, in a way that, as you said, service mesh, meshes together and makes sense. I've got to roll out new certificates. I need to address a security vulnerability. These are all the things that Kubernetes does such a better job at than what people were doing previously, which was a whole lot of for loops, shell scripts, and SSH, pushing tarballs around, maybe debs or RPMs. That is what Kubernetes actually really solves, and does an elegant job of solving, as just a starting point. And that's just the beginning. And, you know, without getting vendor-y here, Anthos is the thing that we at Google have built around Kubernetes that brings it to the enterprise. >> I did a tweet the other day; I called it Anthem. I was just typing too fast. I got a lot of crap on Twitter for that. And with Anthos, multi-cloud has been a big part of where Kubernetes seems to fit. You mentioned some of the licensing changes.
Cloud has been a great resource for a lot of the new web-scale applications from all kinds of companies. Now, with serverless, you're seeing a lot more capabilities. How do you see the next shift with data and state coming in? Because you've got stateless data and you've got stateful data. This has become a conversation point. >> Yeah, I think Kelsey Hightower has said it pretty eloquently, as he usually does, around the serverless movement: it lets developers focus on just their code, and literally just their code, perhaps even just their function, just their piece of code, without having to be an expert on all of the turtles all the way down. That's the big difference about serverless. Having written a couple of those functions, I can really invest my time in the couple of hundred lines of code that matter, and not in choosing a distro, choosing a Kubernetes, choosing, you know, all the stack underneath. I simply choose the platform where I'm going to drop that function, compile it, upload it, and then riff and run on that. >> Fifth anniversary of Kubernetes; we're riffing on Kubernetes. Dustin Kirkland here inside theCUBE, CUBE alumni. You were recently at KubeCon overseas in Europe, in Barcelona. Great city; theCUBE's been there many times. Stu was there covering for us. I couldn't make the trip, unfortunately; I had a couple of daughters graduating, so I didn't make the trip. Sorry, guys. What was the summary? What was the takeaway, the big walk-away from that event? Synthesize the main stories. What were the most important stories being told? Big news, big observations. >> It was a huge event, to start with. It was at the Fira Barcelona. It didn't take over the whole space, but I've been there a number of times for Mobile World Congress. But, you know, this is KubeCon in the same building that hosts all of Mobile World Congress. So I think 8,000 attendees was what we saw. It's quite celebratory.
You know, I think we were doing some pre-fifth-birthday-bash celebrations. Key takeaways: hybrid cloud, multi-cloud. I think that's the world that we've evolved into. You know, there was a lot of tension, I think, in the early days about "must stay on prem" versus "must go to the cloud": there's going to be a winner and a loser, and everything's going to go one direction or another. I think the chips have fallen, and it's pretty obvious now that the world will exist in a very hybrid, multi-cloud state. Ultimately, there's going to be some stuff on prem that doesn't move. There's going to be some stuff better hosted in one or more public clouds. That's the multi-cloud aspect. And there will be stubborn stuff at the edge, in remote locations and vehicles, on oil rigs, at restaurants and stores, and so forth. >> What's most exciting from a trend standpoint? What's getting you excited from what you see on the landscape out there? >> So, tying all of that to Kubernetes: Kubernetes is the thing that basically normalizes all of that. You write your application, put it in a container, and expect Kubernetes to be there to scale that, to operate that, to upgrade that, to migrate that over time. From that perspective, Kubernetes has really ticked all the boxes, and you've got a lot of choices now about which Kubernetes you're going to use, and where. >> Beyond Kubernetes, there's a lot of variety of projects: Kubeflow, you've got service meshes out there, a lot of different projects. What's a dark horse? What's something that sits out there that people should be paying attention to, that you see emerging, that's notable? >> I think it's a combination of two things. One is pretty obvious, and that's AI/ML, which is coming like a freight train and is sort of the next layer of excitement.
I think after Kubernetes becomes boring, which hopefully, if we've done our jobs well, it does: that Kubernetes layer gets settled. It will evolve, but the sort of hockey stick hopefully settles down and it becomes something super stable. The application of machine learning to create artificial intelligence, conclusions and trends from things, that is sort of the next big trend. And then I would say another one, if you really want the dark horse: I think it's around communications. And I think it's around the difference in the way that we communicate with one another across all forms of media, voice, video, chat, writing, how we interact with people, how we interact with our tools, with our software, and in fact, how our software interacts with us, and our software interacts with other software. That communications industry is ripe for some pretty radical disruption. And, you know, there are some organizations doing that. It's early, early days on those changes. >> Final point: you mentioned earlier in our conversation here how DevOps is influencing and impacting non-tech and computer science, really. What did you mean by that? >> Well, I think you brought it up, unexpectedly, in that you were looking at the way some other industries are changing, and I think that cross-pollination is actually quite powerful, when you take and apply a skill and expertise you have outside of your industry, but it adds something new and interesting to your professional environment. That's where you get these provocative operations, these really creative, innovative things that, you know, no one really saw coming. >> DevOps principles applied to other disciplines. Yeah, agility. That's breaking down waterfall-based processes. >> That's one phenomenal example. Imagine that for governments, right? To remove some of the pain that you and I know. I've got to go and renew my license. My birthday's coming up.
I've got to go renew my driver's license. You know how much I'm dreading going to the DMV? >> Root canal, driver's license: one and the same. >> Exactly. How waterfall is that experience? And could we be more agile, more DevOps-y, in some of our government processes? >> The U.S. government's procurement practices are based upon 1990 standards. They still request a manual, a physical manual, for every product. (inaudible) Who does that? >> I know that there are organizations trying to apply some open source principles to government. But I mean, think about, you know, just democracy, and being a little bit more open and transparent in the way that we are in open source code, with the ability to accept patches. I have a side project, a passion for brewing beer, and I love applying open source practices to the industry of brewing. And that's an example of where you use professional work to complement a hobby. >> All right, we've got to bring in some CUBE private label, some CUBE beer. >> If you like sour beer; I'm into sour beer. >> That's okay. (inaudible) Final question for you. Five years from now, Kubernetes will be 10 years old. What's the world going to look like when we wake up five years from now with Kubernetes? >> Yeah, I don't think we're struggling with the Kubernetes layer at that point. I think that's settled science, inasmuch as Linux is pretty settled science. Yes, there's a release, and it comes out with incremental features and bug fixes. I think Kubernetes is settled science. Management of those containers is pretty well settled. Five years from now, I think we end up with software, some software, that's writing software.
And I don't quite mean that in the way that sounds scary, in that we're eliminating developers, but I think we're creating more powerful, more robust software that actually creates that software, and that's all built on top of the really strong, robust systems we have underneath. >> Automation to take the heavy lifting, but the human creation is still key. >> Humans are in the loop. We're many decades away from humans being out of the loop on creative processes. >> Dustin Kirkland, product manager at Google, Kubernetes guru, also CUBE alumni, here in the studio talking about the Kubernetes fifth-year anniversary. Of course, theCUBE was present at creation, during the beginning of the wave of Kubernetes. We love the trend, we love cloud, we love tech. I'm John Furrier, here in Palo Alto. Thanks for watching.

Published Date : Jun 6 2019


ENTITIES

Entity	Category	Confidence
Michael	PERSON	0.99+
Europe	LOCATION	0.99+
Dustin Kirkland	PERSON	0.99+
Barcelona	LOCATION	0.99+
10 years	QUANTITY	0.99+
Seattle	LOCATION	0.99+
Sean	PERSON	0.99+
Palo Alto	LOCATION	0.99+
Dustin	PERSON	0.99+
IBM	ORGANIZATION	0.99+
100 lines	QUANTITY	0.99+
John	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
Lou Tucker	PERSON	0.99+
Google	ORGANIZATION	0.99+
Lenox	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Cooper	PERSON	0.99+
Cooper Netease	PERSON	0.99+
first half	QUANTITY	0.99+
five	QUANTITY	0.99+
Coburg	LOCATION	0.99+
Cooper Netease	ORGANIZATION	0.99+
DMV	ORGANIZATION	0.99+
two	QUANTITY	0.99+
Iraq	LOCATION	0.99+
second half	QUANTITY	0.99+
8,000 attendees	QUANTITY	0.99+
Palo Alto, California	LOCATION	0.99+
10th year	QUANTITY	0.99+
10 people	QUANTITY	0.99+
Rodel	PERSON	0.99+
June 2019	DATE	0.99+
Claudia Horton	PERSON	0.99+
six years ago	DATE	0.99+
33 terabytes	QUANTITY	0.99+
Horton	PERSON	0.99+
two things	QUANTITY	0.99+
Claudia	PERSON	0.99+
1990	DATE	0.99+
fifth anniversary	QUANTITY	0.99+
Burnett	PERSON	0.99+
Eddie	PERSON	0.99+
D. C.	LOCATION	0.99+
One	QUANTITY	0.99+
uber netease	ORGANIZATION	0.98+
Aaron	PERSON	0.98+
Netflix	ORGANIZATION	0.98+
fifth birthday	QUANTITY	0.98+
Next week	DATE	0.98+
Today	DATE	0.98+
single disc	QUANTITY	0.98+
both	QUANTITY	0.97+
Twitter	ORGANIZATION	0.97+
Red Hat	ORGANIZATION	0.97+
Cube Cube	ORGANIZATION	0.97+
five years	QUANTITY	0.97+
one	QUANTITY	0.97+
Kelsey Hightower	PERSON	0.97+
Economical	ORGANIZATION	0.97+
one perspective	QUANTITY	0.97+
Cubans	PERSON	0.96+
U. S. Government	ORGANIZATION	0.96+
many years ago	DATE	0.96+
first	QUANTITY	0.95+
late nineties	DATE	0.95+
one example	QUANTITY	0.95+
J J Kiss Matic	PERSON	0.95+
Cooper Nettie	PERSON	0.94+
50 year anniversary	QUANTITY	0.94+
China	LOCATION	0.93+
Mobile World Congress	EVENT	0.93+
Cuban	OTHER	0.93+
Dave Ops	PERSON	0.93+
10 person	QUANTITY	0.92+
couple	QUANTITY	0.92+
Coburn	ORGANIZATION	0.92+

Pandit Prasad, IBM | DataWorks Summit 2018

>> From San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2018. Brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight, along with my co-host James Kobielus. We're joined by Pandit Prasad. He is in analytics projects, strategy, and management at IBM Analytics. Thanks so much for coming on the show. >> Thanks Rebecca, glad to be here. >> So, why don't you just start out by telling our viewers a little bit about what you do in terms of the Hortonworks relationship and the other parts of your job. >> Sure. As you said, I am in Offering Management, which is also known as Product Management, for IBM, and I manage the big data portfolio from an IBM perspective. I was also working with Hortonworks on developing this relationship, nurturing that relationship, so it's been a year since the Northsys partnership. We announced this partnership exactly last year at the same conference. And now it's been a year, so this year has been a journey of aligning the two portfolios together. Right, so Hortonworks had HDP and HDF. IBM also had similar products, so we have, for example, Big SQL, and Hortonworks has Hive, so how do Hive and Big SQL align together? IBM has Data Science Experience; where does that come into the picture on top of HDP? So it means, before this partnership, if you look into the market, it has been: you sell Hadoop, you sell a SQL engine, you sell data science. So what this year has given us is more of a solution sell. Now with this partnership we go to the customers and say, here is an end-to-end experience for you. You start with Hadoop, you put more analytics on top of it, you then bring Big SQL for complex queries and federation visualization stories, and then finally you put data science on top of it, so it gives you a complete end-to-end solution, the end-to-end experience for getting the value out of the data.
>> Now, IBM a few years back released Watson Data Platform for team data science, with DSX, Data Science Experience, as one of the tools for data scientists. Is Watson Data Platform still the core, I call it DevOps for data science and maybe that's the wrong term, that IBM provides to market, or is there sort of a broader DevOps framework within which IBM goes to market with these tools? >> Sure. Watson Data Platform one year ago was more of a cloud platform and it had many components, and now we are getting a lot of components on to the (mumbles), and Data Science Experience is one part of it, so Data Science Experience... >> So Watson Analytics as well, for subject matter experts and so forth. >> Yes. And again, Watson has a whole suite of side business based offerings; Data Science Experience is more of a particular aspect of the focus, specifically on data science, and that's now been available on prem, and now we are building this on-prem stack, so we have HDP, HDF, Big SQL, Data Science Experience, and we are working towards adding more and more to that portfolio. >> Well, you have a broader reference architecture and a stack of solutions, AI on Power and so forth, for more of the deep learning development. In your relationship with Hortonworks, are they reselling more of those tools into their customer base to supplement and extend what they already resell, DSX, or is that outside of the scope of the relationship? >> No, it is all part of the relationship. These three have been the core of what we announced last year, and then there are other solutions. We have the whole governance solution, right, so again it goes back to the partnership: HDP brings with it Atlas. IBM has a whole suite of governance portfolio, including the governance catalog. How do you expand the story from being a Hadoop-centric story to an enterprise data lake story? And then now we are taking that to the cloud; that's what Truata is all about.
Rob Thomas came out with a blog yesterday morning talking about Truata. If you look at it, it is nothing but a governed data lake hosted offering, if you want to simplify it. That's one way to look at it. It caters to the GDPR requirements as well. >> For GDPR, for the IBM-Hortonworks partnership, is the lead solution for GDPR compliance Hortonworks Data Steward Studio, or is it any number of solutions that IBM already has for data governance and curation, or is it a combination of all of that in terms of what you, as partners, propose to customers for soup-to-nuts GDPR compliance? Give me a sense for... >> It is a combination of all of those, so it has HDP, it has HDF, it has Big SQL, it has Data Science Experience, it has IBM governance catalog, it has IBM data quality, and it has a bunch of security products, like Guardium, and it has some new IBM proprietary components that are very specific towards data (cough drowns out speaker) and how do you deal with the personal data and sensitive personal data as classified by GDPR. I'm supposed to query some high-level information but I'm not allowed to query deep into the personal information, so how do you block those queries, how do you understand those? These are not necessarily part of Data Steward Studio. These are some of the proprietary components that are thrown into the mix by IBM. >> One of the requirements that is not often talked about under GDPR, Ricky of Formworks got into it a little bit in his presentation, was the requirement that if you are using an EU citizen's PII to drive algorithmic outcomes, they have the right to full transparency into the algorithmic decision paths that were taken. I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort.
Is that something that IBM still offers? It was called Watson Curator a few years back. I ask because I'm getting a sense right now that Hortonworks doesn't have a specific solution, not to say that they may not be working on it, that addresses that side of GDPR. Do you know what I'm referring to there? >> I'm not aware of something from the Hortonworks side beyond the Data Steward Studio, which offers basically identification of what some of the... >> Data lineage as opposed to model lineage. It's a subtle distinction. >> It can identify some of the personal information and maybe provide a way to tag it and hence mask it, but the Truata offering is the one that is bringing some new research assets, developed after the GDPR guidelines became clear and they got into how do we cater to those requirements. These are relatively new proprietary components; they are not even being productized, that's why I am calling them proprietary components that are going into this hosting service. >> IBM's got a big portfolio, so I'll understand if you guys are still working out what position. Rebecca, go ahead. >> I just wanted to ask you about this new era of GDPR. The last Hortonworks conference was sort of before it came into effect, and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, in the sense of really understanding the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements? >> They are still trying to understand the requirements and interpret the requirements, coming to terms with what that really means. For example, I met with a customer, and they are a multinational company.
They have data centers across different geos, and they asked me: I have somebody from Asia trying to query the data, so the query should go to Europe, but the query processing should not happen in Asia; the query processing should all happen in Europe, and only the output of the query should be sent back to Asia. You wouldn't have thought in these terms before the GDPR guidance era. >> Right, exceedingly complicated. >> Decoupling storage from processing enables those kinds of fairly complex scenarios for compliance purposes. >> It's not just about the access to data; now you are getting into where the processing happens and where the results are getting displayed, so we are getting... >> Severe penalties for not doing that, so your customers need to keep up. There was an announcement at this show, at DataWorks 2018, of an IBM-Hortonworks solution: IBM Hosted Analytics with Hortonworks. I wonder if you could speak a little bit about that, Pandit, in terms of what's provided. It's a subscription service? If you could tell us what subset of IBM's analytics portfolio is hosted for Hortonworks' customers? >> Sure. As you said, it is a hosted offering. Initially we are starting off with a base offering with three products: it will have HDP, IBM Db2 Big SQL, and DSX, Data Science Experience. Those are the three solutions. Again, as I said, it is hosted on IBM Cloud, so customers have a choice of different configurations, whether it be VMs or bare metal. I should say this is probably the only offering, as of today, that offers a bare metal configuration in the cloud. >> It's geared to data scientists, developers, and machine-learning modelers, who will build the models and train them in IBM Cloud, but in a hosted HDP in IBM Cloud. Is that correct? >> Yeah, I would rephrase that a little bit. There are several different offerings on the cloud today, and we can think about them, as you said, as being for ad-hoc or ephemeral workloads, also geared towards low cost.
You think about this offering as taking your on-prem data center experience directly onto the cloud. It is geared towards very high performance. The hardware and the software are all configured and optimized for providing high performance, not necessarily for ad-hoc or ephemeral workloads. They are capable of handling massive workloads, sticky workloads; it's not meant for "I turn on this massive computing power for a couple of hours and then switch it off," but rather, "I'm going to run these massive workloads as if it is located in my data center." That's number one. It comes with the complete set of HDP. If you think about it, currently in the cloud you have Hive and HBase, the SQL engines and the storage separate; security is optional, governance is optional. This comes with the whole enchilada. It has security and governance all baked in. It provides the option to use Big SQL, because once you get on Hadoop, the next experience is, I want to run complex workloads. I want to run federated queries across Hadoop as well as other data storage. How do I handle those? And then it comes with Data Science Experience, also configured for best performance and integrated together. As a part of this partnership, as I mentioned earlier, we have made progress towards providing this story of an end-to-end solution. The next step is: yeah, I can say that it's an end-to-end solution, but do the products look and feel as if they are one solution? That's what we are getting into, and I have featured some of those integrations. For example, Big SQL, the IBM product: we have been working on baking it very closely with HDP. It can be deployed through Ambari, and it is integrated with Atlas and Ranger for security. We are improving the integrations with Atlas for governance.
>> Say you're building a Spark machine learning model inside DSX on HDP within IH (mumbles), IBM hosting with Hortonworks on HDP 3.0. Can you then containerize that Spark machine learning model and deploy it into an edge scenario? >> Sure. First was Big SQL; the next one was DSX. DSX is integrated with HDP as well. We could run DSX workloads on HDP before, but what we have done now is this: if you want to run the DSX workloads, say I want to run a Python workload, I need to have Python libraries on all the nodes that I want to deploy to. Suppose you are running a big cluster, a 500-node cluster. I need to have Python libraries on all 500 nodes and I need to maintain the versioning of them. If I upgrade the versions, then I need to go and upgrade and make sure all of them are perfectly aligned. >> In this first version, will you be able to build a Spark model and a TensorFlow model and containerize them and deploy them? >> Yes. >> Across a multi-cloud, and orchestrate them with Kubernetes to do all that meshing? Is that a capability now, or planned for the future within this portfolio? >> Yeah, we have that capability demonstrated in the pedestal today, so that is a new integration. We can run what we call a virtual Python environment. DSX can containerize it and run data that's foreclosed in the HDP cluster. Now we are making use of both the data in the cluster and the infrastructure of the cluster itself for running the workloads. >> In terms of the layered stack, is it also incorporating the IBM distributed deep-learning technology that you've recently announced? Which I think is highly differentiated, because deep learning is increasingly becoming a set of capabilities that are spread across a distributed mesh, playing together as if they're one unified application. Is that a capability now in this solution, or will it be in the near future? DDL, distributed deep learning? >> No, we have not yet. >> I know that's on the AI Power platform currently, gotcha.
>> It's what we'll be talking about at next year's conference. >> That's definitely on the roadmap. We are starting with the base configuration of bare metal and VM configurations. Next, depending on how the customers react to it, we're definitely thinking about bare metal with GPUs optimized for TensorFlow workloads. >> Exciting. We'll be tuned in over the coming months and years; I'm sure you guys will have that. >> Pandit, thank you so much for coming on theCUBE. We appreciate it. I'm Rebecca Knight, for James Kobielus. We will have more from theCUBE's live coverage of DataWorks just after this.

Published Date : Jun 19 2018


ENTITIES

Entity	Category	Confidence
Rebecca	PERSON	0.99+
James Kobielus	PERSON	0.99+
Rebecca Knight	PERSON	0.99+
Europe	LOCATION	0.99+
IBM	ORGANIZATION	0.99+
Asia	LOCATION	0.99+
Rob Thomas	PERSON	0.99+
San Jose	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
Pandit	PERSON	0.99+
last year	DATE	0.99+
Python	TITLE	0.99+
yesterday morning	DATE	0.99+
Hortonworks	ORGANIZATION	0.99+
three solutions	QUANTITY	0.99+
Ricky	PERSON	0.99+
Northsys	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
Pandit Prasad	PERSON	0.99+
GDPR	TITLE	0.99+
IBM Analytics	ORGANIZATION	0.99+
first version	QUANTITY	0.99+
both	QUANTITY	0.99+
one year ago	DATE	0.98+
Hortonwork	ORGANIZATION	0.98+
three	QUANTITY	0.98+
today	DATE	0.98+
DSX	TITLE	0.98+
Formworks	ORGANIZATION	0.98+
this year	DATE	0.98+
Atlas	ORGANIZATION	0.98+
first	QUANTITY	0.98+
Granger	ORGANIZATION	0.97+
Gaurdium	ORGANIZATION	0.97+
one	QUANTITY	0.97+
Data Steward Studio	ORGANIZATION	0.97+
two portfolios	QUANTITY	0.97+
Truata	ORGANIZATION	0.96+
DataWorks Summit 2018	EVENT	0.96+
one solution	QUANTITY	0.96+
one way	QUANTITY	0.95+
next year	DATE	0.94+
500 nodes	QUANTITY	0.94+
NTN	ORGANIZATION	0.93+
Watson	TITLE	0.93+
Hortonworks	PERSON	0.93+

Jeff Eckard, IBM | Cisco Live US 2018

>> Live from Orlando Florida, it's theCUBE. Covering Cisco Live 2018, brought to you by Cisco, NetApp, and theCUBE's ecosystem partners. (electronic music flourish) >> Welcome back, I'm Stu Miniman and this is theCUBE's exclusive coverage of Cisco Live 2018 in Orlando Florida. Joining me, my co-host for this segment Dave Vellante sitting in for John Furrier and happy to welcome to the program Jeff Eckard, who's the Vice President of Storage Solutions at IBM. Jeff, thanks so much for joining us. >> Thank you, good to see you guys. >> All right, and 26,000 people here. It'd been many years since I'd been to Cisco Live. There's some things that are same, many of the same faces, but a lot of new jobs, a lot of buzz going on. What's your impression been of the show this week? >> Yeah, it's been an interesting, great show for IBM and our presence, but it's a very large ecosystem of Cisco partners, a lot of their, our joint end users and a lot of focus on multi-cloud. You've consistently heard that as a theme from Cisco as well as IBM since last fall at their partner forum and they've continued it here with a lot of focus on being able to take tools and capabilities and enabling enterprises to manage data where they want to manage it. And it's really interesting, from traditional systems vendors like Cisco, to see that focus particularly around developers. >> It's been fascinating for me to watch. Jeff, you and I have some background in the storage and storage networking piece, specifically, where it was like, OK, where I sit in the stack and I've got a couple of integrations, and we work on our standards here. It's much broader. >> Oh, absolutely. The things that we're working on. We're talking about cloud. There's a lot of software that flows. Data and applications are critically important. Talk a little bit about some of that transformation and how you're seeing the expansion, and-- >> Yeah, no, it's a interesting time. 
If you think about the opportunities and challenges facing all enterprises, data is at the core of digital transformation, digital enhancement, whatever term you wanna use with it. Typically, it's focused on wanting to provide real-time insights so that you make better decisions against threats or opportunities, being able to deliver personalized services to your clients, and then also improving your internal processes and business outcomes. And so data is core for digital transformation, and you kinda see this web of what we're talking about here and then what we're doing with clients as well. >> You know, Jeff, you talk about multi-cloud, you've been in the business for a while, and throughout your career you've tried to help customers simplify their lives, and everybody thought, OK, I'm gonna put stuff in the cloud, it's gonna get simpler, and now you see this spate of clouds, whether it's infrastructure as a service, private clouds, SaaS, and complexity, in some regards, has never been higher, particularly as it relates to the data. >> That's right. >> You've gotta figure out, where do you put this stuff? How do you protect it, what about governance? Even if you think security's better in the cloud, it might be different for every cloud. So how is IBM generally, and your team specifically, approaching simplifying the complexity of this multi-cloud world? >> Sure, so from an IBM perspective, at the top level we approach it with innovative technology and a lot of industry expertise, whether it's in financial services or healthcare. Cloud, and what we do with the public IBM cloud, is really important around the services we provide there, data and AI, and then as you come down from that, modern infrastructure is key because modern infrastructure supports the data. So when you look at it, 80% of enterprises are intending to be multi-cloud. Something like 70% already are, right? Because of what you referenced with the consumption of SaaS.
So, multi-cloud is the de facto operating model for applications and then, therefore, for the data. So from an IBM storage and SDI perspective, we kind of view... There are three primary adoption patterns that we're seeing with our clients. The first is around modernizing traditional applications or workloads, which also drags modern infrastructure, flash-based systems, leveraging more of storage efficiency technologies, like compression and dedupe, being able to protect that data, whether it's in a traditional VMware environment or the emerging containers environment. So, yeah, data's at the core. The partnership that we have with Cisco around VersaStack enables us to support traditional private clouds, whether those are built on the VMware set of tools or now, as last week we announced, the VersaStack for IBM Cloud Private. IBM Cloud Private is an enterprise platform for developers to leverage microservices and containerized IBM middleware services, whether that's WebSphere or MQ or Microservices Builder, as well as a whole catalog of open source technologies and tools to get agility out of the DevOps process and then also layer on analytics on top of that. >> So customers, they're gonna want consistency across all those clouds. So what role do you guys bring? Are you trying to be a platform of platforms, or is that too aspirational? Obviously, you can't have 100% market share, so that's not practical. But to the extent that people adopt your technologies, is that how we should be thinking about it? >> Well, so IBM Cloud Private is an open platform. It's built on Docker runtimes and Kubernetes orchestration. It's open to where you can leverage things like Red Hat OpenShift if you've chosen them for your containers platform, and then we also support the traditional private clouds with VMware. So, there's a whole set of tools in there.
What we're trying to do from a data management perspective is protect it, whether that's backup and recovery, morphing into this new category of secondary data reuse. So, for instance, from a traditional workflow of just doing backup and recovery, we can now take native format copies of the data, whether that's in an Oracle or SQL Server database, et cetera, and take that data to the public cloud, where different personas and use cases can act on that data. So you can spin up a VM from that native format within our tools in the IBM cloud. So that's from a data protection standpoint. On data management, later this year we'll talk more formally about programs that we have around metadata management. That's where you can index and classify, for instance, unstructured or structured data, and act on that in terms of, when was it last accessed? Who should be accessing it? Is it personally identifiable information? Do I wanna run analytics on it? So the metadata management is an opportunity to plug in to broader IBM things, whether it's Watson Data Platform or information governance catalogs, to provide that kind of uber across-cloud infrastructure management. >> And that's a machine sort of intelligence, automation component, at scale, right? >> It could absolutely be used for augmented intelligence, artificial intelligence, some of the machine learning pieces as well. >> Jeff, Jeff, I'm wondering if you could give us a little insight of some of the places that customers are falling down. We were just talking to a systems integrator before you came on and he said, "Well, sometimes I take a virtualized environment "and I move it and it's not really geared "for this modern platform."
Containerization can help in a lot of these environments, so when you talk about the pattern we've seen that works many times is you modernize the platform, and then I can modernize the application, start pulling things apart, start refactoring, start playing with some of these environments because I can't just... Lift and shift can help, but it can't be that's the only move. There's a lot of work that needs to get done, and a lot of time that's underestimated. >> Right, well it's not a panacea, but there is a key tool called Transformation Advisor that is part of the IBM cloud platform. It's intended to assist with the challenge that you just stated, which is, OK, how do I take a traditional workload, determine if it's ready to be containerized, and then start the process of containerization. You can go back to some of the VM migration pieces, too. There's a whole set of tools that enterprises have used. Transformation Advisor is one tooling example of what we can do in the platform. And then we obviously have services through Global Services that can help at a large scale for enterprises to kinda make that step. >> You bring up a good point there, 'cause we always struggle with some of these tool transformations, but if you go back to virtualization it was really some of the organizational things that had to shift. Wonder if you can talk about some of the things that are changing here. This show, we've spent a lot of time talking about Cisco's moving up the stack, network people are much more closer tied to some of those new application development, especially with things like intent-based networking. >> Well, it's a interesting reminder that we get often from clients, 'cause you're really touching at some of the remember the operational steps, things like containerization are interesting new technologies, and there's a lot of advantages to them. 
But just going back a minute, to the heritage with what we've been doing with Cisco around VersaStack, leveraging it on a VMware environment, we hear a lot from customers that their operational practices really are set around VMware and the VMware tooling. So one of the things that we did with IBM Cloud Private is, it can run on top of VMware. So as customers want to take a kind of transitional step towards microservices, they can continue to leverage their operational practices around VMware. So it's important to, it sometimes takes enterprises a little bit longer than you may guess, right, to embrace the new set of things. Our product portfolio and our directions are set where they can leverage some of the operational pieces they already have. >> Well, just for our viewers who may not know, I mean, the recent history of IBM and Cisco is quite interesting. IBM at one point purchased a company called BNT, which got sold as part of the x86 sale to Lenovo. That opened up a huge opportunity for IBM and Cisco to partner because there were very clear swim lanes. And that sorta catalyzed a relationship that from your standpoint, VersaStack was sort of the first instantiation of that relationship. So, take us through, sort of, where you guys are in the partnership and where you see it going. >> Sure, yeah, so VersaStack, for folks who may not be familiar, it's a converged system, right? So it's IBM storage, flash or otherwise, leverages Cisco UCS servers, and then their Nexus and MDS switching. So it's integrated, validated as a single solution to, as the name implies, be very versatile and provide agility and flexibility. And so, through our routes to market, either with distribution or resellers or system integrators, it is a way that we can address platforms that matter to our joint customers. We've talked about IBM Cloud Private. A lot of heritage around VMware and SQL Server and Oracle and a lot of focus around SAP HANA.
So, we typically will partner around which enterprise platforms are we going, and then we also partner, in general, around MDS switching with Cisco, and we'll talk more about that in months to come as we enhance that relationship. >> So, the solutions part of your title, you just mentioned VMware, Oracle, SAP HANA, there may be others. How do you guys approach solutions? Maybe you can talk about that a little bit. >> Yeah, so a solution, at a PetaLogic level, is a successful repeatable outcome. And what we focus on, then, are the integrations that matter. Those could be integrations with IBM tools, like we talked about with IBM Cloud Private. Could be the integrations that we do jointly with Cisco through the validated design process for some of these applications or databases. And so we have teams that do the validation work and figure out how we marry IBM capabilities with ecosystem capabilities. And there's a whole, whether we're automating private clouds or accelerating workloads including the partnership that IBM and Cisco have with Hortonworks. And then in industry context as well, particularly in healthcare and financial services. We'll pick the platforms that really matter and then do the integrations that enable us to take, whether it's our systems or our software or IBM level capabilities to market. >> I wanna come back to this simplicity theme, specifically in the context of data protection. With all this multi-cloud, data protection has become a really hot topic. You guys have dramatically simplified your data protection offering with Spectrum Protect Plus. Talk about data protection, how it's changing from where it used to be just, OK, it's a virtualized world. We kind of understand the challenges of virtual data protection. That has played itself out, and now there's a whole new wave coming. What's your perspective on this?
>> Well, I don't know if the virtual is play, I mean, the virtualized environment is still kind of paying the freight, if you will. >> Yeah, played out in terms of-- >> Yes, no, no, yeah, right. >> We understand what had to change. >> Right. >> And customers have made that change. >> Yeah, and your simplicity point on that is really key. So one of the enhancements that we announced last year at VMworld was Spectrum Protect Plus. So that's an agentless, OVA-based, VM-based backup and recovery tool. And it's very simple to use. The trick is that we've focused its capabilities around secondary data reuse. So I mentioned earlier, that whole workflow has evolved to where the data has increasing value beyond its primary use, right? So backup and recovery, but then we can leverage those native format copies. Spectrum Protect Plus is available either on a bring your own license or a monthly subscription in the IBM cloud, other clouds over time. And so we enable enterprises to not only do the traditional backup and protection, but very simply, move that data to either a secondary or tertiary data center, if that's still a part of their backup architecture, or into the public cloud. And so the simplicity factor comes in, again, that it's agentless. There's a catalog of where all your copies are, and you can reuse that data for whether it's DevOps or DevTest or analytics purposes. >> OK, so that's helpful. So what I'm trying to get to was sort of the enablers, maybe from a technology standpoint, because in the virtualization world, it was all about efficiency because you didn't have the underutilized physical resources anymore. >> Yep, right. >> All the servers utilized 10%. (chuckles) Well, I got rid of a lot of those physical servers, and the one job that needed that power was backup, so I needed a new way to approach it. What I'm hearing is, in this multi-cloud world, it's a focus on simplicity.
I'm inferring from that, a cloud-like experience, maybe some other capabilities that you guys are-- >> Yeah, so. >> Doing away with. >> The containers are a progression. I mean, VMware came around to maximize your CPU and storage utilization. Containers provide yet another level of efficiency on top of that. They bring with them the need for changes in your data protection. And so we, at Think in March, we talked about our directions around container aware data protection and container aware snapshots. Most vendors will use snapshots and then volume level controls of how we've traditionally done backup. We have a progression, and we'll talk more about it later in the year, of how we do snapshots, again, that are container aware. They leverage our tools, such as Spectrum Copy Data Management, Spectrum Protect Plus, integrate with our arrays. But they'll bring the same level of capability that we've had traditionally in a virtualized environment to also support data protection in a container world. >> Well, it's an interesting landscape right now in data protection. >> Oh, it's awesome! There's so many new tools, and it's great to be able, (Dave chuckling) like we talked about earlier, to partner with Cisco around some of this as well. >> Great, Jeff, I wanna give you the final word, as if, for those that couldn't make it to the show, either share key conversation you're having, you're hearing from customers, or a big takeaway from the show that you'd like to share. >> Sure, yeah, we've had a lot of customers come up and wanna know, OK, well, how do you start, right? And we talked about, there are three primary adoption patterns, whether it's modernizing, and typically it will start with modernizing traditional workloads. 70% of private cloud usage is for that particular use case. Well, you can pretty quickly show them, then, the progression to, OK, they wanna be more agile. They wanna go cloud-native. 
From that private cloud infrastructure, you can do that, and then you can have a consistent way that you interact around services in the public cloud. And so that's what we've been talking to clients about. They wanted to know, how do I start with what I have, and then how do I get to this better future? And how do I leverage your tools and capabilities? And so whether that's with IBM systems components or what we do with our partnership with Cisco, we're showing them how we, collectively, can help them on that journey. >> All right, Jeff, I really appreciate all the updates. Dave, thanks so much for joining me for this segment. >> Yeah, thank you. >> We still have a full day here, three days wall-to-wall coverage of theCUBE, Cisco Live 2018. Thanks so much for watching. (techno musical flourish)

Published Date : Jun 13 2018

SUMMARY :

Covering Cisco Live 2018, brought to you by Cisco, and happy to welcome to the program but a lot of new jobs, a lot of buzz going on. and a lot of focus on multi-cloud. and I've got a couple of integrations, There's a lot of software that flows. and then what we're doing with clients as well. and now you see this spate of clouds, You've gotta figure out, where do you put this stuff? and then as you come down from that, So what role do you guys bring? and take that data to the Public Cloud, some of the machine learning pieces as well. a little insight of some of the places that is part of the IBM cloud platform. that had to shift. So one of the things that we did with IBM Cloud Private is, in the partnership and where you see it going. and then we also partner, in general, So, the solutions part of your title, Could be the integrations that we do jointly and now there's a whole new wave coming. kind of paying the freight, if you will. what had to change. And customers have made that change and you can reuse that data for whether it's DevOps because in the virtualization world, and the one job that needed that power was backup, and then volume level controls Well, it's an interesting landscape right now and it's great to be able, (Dave chuckling) or a big takeaway from the show that you'd like to share. and then you can have a consistent way All right, Jeff, I really appreciate all the updates. Thanks so much for watching.

ENTITIES

Entity	Category	Confidence
Jeff	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Jeff Eckard	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Cisco	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Lenovo	ORGANIZATION	0.99+
10%	QUANTITY	0.99+
100%	QUANTITY	0.99+
Stu Miniman	PERSON	0.99+
BNT	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
Orlando Florida	LOCATION	0.99+
70%	QUANTITY	0.99+
Horton Works	ORGANIZATION	0.99+
last year	DATE	0.99+
John Furrier	PERSON	0.99+
March	DATE	0.99+
first	QUANTITY	0.99+
three days	QUANTITY	0.99+
26,000 people	QUANTITY	0.99+
Oracle	ORGANIZATION	0.99+
theCUBE	ORGANIZATION	0.98+
three	QUANTITY	0.98+
last week	DATE	0.98+
NetApp	ORGANIZATION	0.98+
later this year	DATE	0.97+
last fall	DATE	0.97+
this week	DATE	0.97+
one	QUANTITY	0.96+
WebSphere	TITLE	0.96+
uber	ORGANIZATION	0.96+
VersaStack	ORGANIZATION	0.96+
SQL	TITLE	0.96+
SAP HANA	TITLE	0.96+
VMware	TITLE	0.95+
Think	ORGANIZATION	0.94+
Transformation Advisor	TITLE	0.94+
single solution	QUANTITY	0.94+
Red Hat OpenShift	TITLE	0.93+
Cisco Live 2018	EVENT	0.93+
DevTest	TITLE	0.93+
X86	COMMERCIAL_ITEM	0.92+
first instantiation	QUANTITY	0.92+
VMWorld	ORGANIZATION	0.92+
Vice President	PERSON	0.92+
DevOps	TITLE	0.91+
Kubernetes	TITLE	0.91+

Pankaj Sodhi, Accenture | Dataworks Summit EU 2018

>> Narrator: From Berlin, Germany, it's theCUBE. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what DataWorks Summit is all about. We are at DataWorks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with big data. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be here. >> Great, great, so what are you seeing in terms of customers' adoption of Hadoop and so forth, big data platforms, and what kind of use cases are you seeing? GDPR is coming down very quickly, and we saw this poll this morning that John Kreisa, of Hortonworks, did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, any enterprise, period, because it sounds like not everybody in this audience, in fact a sizeable portion, is entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing, in terms of customer readiness, for this new regulation? >> So Jim, I'll answer the question in two ways. One was, just in terms of, you know, the adoption of Hadoop, and then, you know, get into GDPR. So in regards to Hadoop adoption, I think I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use-case-driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach. Deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often sort of, on premise.
Hybrid being, for example, with an EDW and product type AA. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now is to leverage the best of cloud computing and open source technology, while also getting the investment protection that they have from the on-premise deployments as well. So they're in a fairly good position. Another set of customers have done successful pilots looking at either optimization use cases. >> Jim: How so, Hadoop? >> Yes, leveraging Hadoop. Either again from a cost optimization play or potentially advanced analytics capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on-prem, potentially with Hadoop, or on cloud with Hadoop. >> When you say the pipeline in this context, what are you referring to? When I think of pipeline, in fact in our coverage of pipeline, it refers to an end-to-end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely. >> And analytics, so that's what you're saying. >> So all the way from ingestion to curation to consuming the data, through multiple different access spots, so that's the full pipeline. And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. So I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of let's ingest all data in a data lake, it's been driven by a use case mapped to a set of valuable data sets that can be ingested. But what's interesting then is the delivery approach has been to bring together diverse skill sets.
For example, data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge architecture and delivery approach. I think this is where the key ingredient for success is, which is, for me, that the modern sort of Hadoop pipelines need to be iteratively built and deployed, rather than linear and monolithic. So this notion of, I have raw data, let me come up with a minimally curated data set. And then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance, get additional data attributes, I then enhance the pipeline. So this is already starting to challenge organizations' architecture approaches, and how you also deploy into production. And I think that's been one of the key differences between organizations that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world, or are they challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' architecture for Data Steward Studio. Are you seeing the data governors, the data stewards of the world, coming, sitting around the virtual table, challenging this architecture further to evolve? >> I think. >> To enable privacy by default and so forth? >> I think again, you know, the organizations that have been successful have already been looking at privacy by design before GDPR came along. Now one of the reasons a lot of the data lake implementations haven't been as successful is the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So therefore, what we see with business glossaries, and sort of data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. So to actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace.
So in the data marketplace, what you need to have is well-curated data sets. The proper definitions, such as through a business glossary or a data catalog, underpinned by the right user access model, and available for example through a search or APIs. So, GDPR actually is. >> That's not a public marketplace, this is an architectural concept. >> Yes. >> It could be inside, completely inside, the private data center, but it's reusable data, it's both through API, and standard glossaries and metadata and so forth, is that correct? >> Correct, so the data marketplace is reusable, both internally, for example, to unlock access for data scientists who might want to use the data set and then put that into a data lab. It can also be extended, from an API perspective, for a third-party data marketplace for exchanging data with consumers or third parties as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit. Rather than looking at it from a compliance perspective, it's about how can we make data usable to the analysts and the data scientists. So actually focusing on getting the right definitions upfront, and as we curate and publish data, and as we enrich it, what's the next definition that comes of that? And actually have that available before we publish the data. >> That's a fascinating concept. So, the notion of a data steward or a data curator. It sort of sounds like you're blending them. Where the data curator, their job, part of it, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses and possibly being a player in the ongoing identification of the monetizability of data elements, both internally and externally in the (mumbles). Am I describing correctly? >> Pankaj: I think you are, yes. >> Jim: Okay.
>> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy. >> Jim: The chief data officer. >> Yes, chief data officer functions. So rather than imposition of policies and standards, it's about actually trying to unlock business value. So rather than look at it from a compliance perspective, which is very important, but actually flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So for example, if you're able to tag and classify data, and then apply the right kind of protection against it, it actually helps the data scientists to use that data for their models. While that's actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance, which is to discover and inventory and essentially tag all of your data, on a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value to your business of the data. So it needn't be an overhead burden, it can be the foundation for a new business model. >> Absolutely, because I think if you talk about organizations becoming data driven, you have to look at what does the data asset actually mean. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authority of privacy and ability to use the data. So I think GDPR is going to be a very good enabler, so again the small minority of organizations that have been successful have done this. They've had business glossaries, data catalogs, but now with GDPR, that's almost I think going to force the issue. Which I think is a very positive outcome.
>> Now Pankaj, do you see any of your customers taking this concept of curation and so forth to the next step, in terms of there's data assets but then there's data-derived assets, like machine learning models and so forth. Data scientists build and train and deploy these models and algorithms, that's the core of their job. >> Man: Mhmm. >> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers so that they'll also increasingly be required, or want, to essentially catalog the models and identify and curate them for reusability? Possibly monetization opportunities. Is that something that any of your customers are doing or exploring? >> Some of our customers are looking at that as well. So again, initially, exactly, it's an extension of the marketplace. So while one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and prescribe data. >> Jim: Yeah, like pre-trained models. >> Correct. >> Can be golden if they're pre-trained and the core domain for which they're trained doesn't change all that often, they can have a great aftermarket value conceivably if you want to resell that. >> Absolutely, and I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models. And to be able to share that with other folks in the team who can then maybe tweak it for a different algorithm, is a huge, I think, productivity enabler, and we've seen. >> Jim: Yes. >> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop, to actually then deploying the (mumbles).
How can you do that very quickly, and reduce the time from an ideal hypothesis to production. >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion. And we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.

Published Date : Apr 19 2018


ENTITIES

Entity	Category	Confidence
Pankaj	PERSON	0.99+
James Kobielus	PERSON	0.99+
Jim	PERSON	0.99+
London	LOCATION	0.99+
Pankaj Sodhi	PERSON	0.99+
May 25th	DATE	0.99+
Accenture	ORGANIZATION	0.99+
John Chrysler	PERSON	0.99+
Horton Works	ORGANIZATION	0.99+
Silicon Angled Media	ORGANIZATION	0.99+
GDPR	TITLE	0.99+
Berlin, Germany	LOCATION	0.99+
One	QUANTITY	0.98+
both	QUANTITY	0.98+
one aspect	QUANTITY	0.97+
one	QUANTITY	0.97+
Data Works Summit	EVENT	0.96+
two ways	QUANTITY	0.96+
Data Works Summit 2018	EVENT	0.95+
Dataworks Summit EU 2018	EVENT	0.93+
Europe	LOCATION	0.93+
Hadoop	TITLE	0.92+
day two	QUANTITY	0.9+
Hadoob	PERSON	0.87+
2018	EVENT	0.84+
day one	QUANTITY	0.82+
three	QUANTITY	0.79+
first ones	QUANTITY	0.77+
theCUBE	ORGANIZATION	0.76+
Wikbon Team	ORGANIZATION	0.72+
this morning	DATE	0.7+
Hadoob	TITLE	0.7+
GDRP	TITLE	0.55+
categories	QUANTITY	0.54+
Big DSO	ORGANIZATION	0.52+
Hadoob	ORGANIZATION	0.46+

Action Item | The Role of Open Source

>> Hi, I'm Peter Burris, welcome to Wikibon's Action Item. (slow techno music) Once again Wikibon's research team is assembled, centered here in theCUBE Studios in lovely Palo Alto, California, so I've got David Floyer and George Gilbert with me here in the studio, on the line we have Neil Raden and Jim Kobielus, thank you once again for joining us guys. This week we are going to talk about an issue that has been a dominant consideration in the industry, but it's unclear exactly what direction it's going to take, and that is the role that open source is going to play in the next generation of solving problems with technology, or we could say the role that open source will play in future digital transformations. No one can argue whether or not open source has been hugely consequential, as I said it has been, it's been one of the major drivers of not only new approaches to creating value, but also new types of solutions that actually are leading to many of the most successful technology implementations that we've seen ever, that is unlikely to change, but the question is what form will open source take as we move into an era where there's new classes of individuals creating value, like data scientists, where there's new problems that we're trying to solve, like problems that are mainly driven by the role that data, as opposed to code, plays, and where there are new classes of providers, namely service providers as opposed to product or software providers. These issues are going to come together, and have some pretty important changes on how open source behaves over the next few years, what types of challenges it's going to successfully take on, and ultimately how users are going to be able to get value out of it. So to start the conversation off George, let's start by making a quick observation, what has the history of open source been, take us through it kind of quickly.
>> The definition has changed, in its first incarnation it was to fix UNIX fragmentation and the high price of UNIX system servers, meaning the proprietary UNIXes and the proprietary servers they were built on, that actually rather quickly morphed into a second incarnation where it was let's take the Linux stack, Linux, Apache, MySQL, PHP, Python, and substitute that for the old incumbents, which was UNIX, BEA WebLogic, the J2EE server and Oracle Database on an EMC storage device. So that was the collapse of the price of infrastructure, so really quickly then it morphed into something very, very different, which was we had the growth of the giant Internet-scale vendors, and neither on pricing nor on capacity could traditional software serve their needs, so Google didn't quite do open source, but they published papers about what they did, those papers then were implemented. >> Like MapReduce. >> Yeah, MapReduce, Bigtable, Google File System, those became the basis of Hadoop which Yahoo open sourced. There is another incarnation going, that's probably getting near its end of life right now, which is sort of a hybrid, where you might take Kafka which is open source, and put sort of proprietary bits around it for management and things like that, same with Cloudera, this is called the open core model, it's not clear if you can build a big company around it, but the principle is, the principle for most of these is, the value of the software is declining, partly because it's open source, and partly because it's so easy to build new software systems now, and the hard part is helping the customer run the stuff, and that's where some of these vendors are capturing it. >> So let's, David, turn our attention to how that's going to turn into actual money.
So in this first generation of open source, I think up until now, certainly Red Hat, Canonical have made money by packaging and putting forward distributions, that have made a lot of money, IBM has been one of the leaders in contributing open source, and then turning that into a services business, Cloudera, Hortonworks, MapR, some of these other companies have not generated the same type of market presence that a Red Hat or Canonical have put forward, but that doesn't mean there aren't companies out there that have been very successful at appropriating significant returns out of open source software, mainly however they're doing it as George said, as a service, give us some examples. >> I think the key part of open source is providing a win-win environment, so that people are paid to do stuff, and what is happening now a lot is that people are putting stuff into open source in order that it becomes a standard, and also in order that it is maintained by the community as a whole. So those two functions, those two capabilities of being paid by a company often, by IBM or by whoever it is to do something on behalf of that company, so that it becomes a standard, so that it becomes accepted, that is a good business model, in the sense that it's win-win, the developer gets recognition, the person paying for it achieves their business objective of for example getting a standard recognized-- >> A volume. >> Volume, yes. >> So it's a way to get to volume for the technology that you want to build your business around.
>> Yes, what I think is far more difficult in this area is application type software, so where open source has been successful, as George said, is in the stacks themselves, the lower end of the stacks, there are a few, and they usually come from very very successful applications like Word, Microsoft Word, or things like that where they can be copied, and be put into open source, but even there they have around them software from a company, Red Hat or whoever it is, that will make it successful. >> Yes but OpenOffice wasn't that successful, get to the kind of, today we have Amazon, we have some of the hyperscalers that are using that open core model and putting forward some pretty powerful services, is that the new Red Hat, is that the new Canonical? >> The person who's made most money is clearly Amazon, they took open source code and made it robust, and made it in volume, those are the two key things you have to have for success, it's got to be robust, it's got to be in volume, and it's very difficult for the open source community to achieve that on its own, it needs the support of a large company to do that, and it needs the value that that large company is going to get from it, for them to put those resources in. So that has been a very successful model; a lot of people decry it because they're not giving back, and there's an argument-- >> They being Amazon, have not given back quite as much. >> Yes they have relatively very few committers. I think that's more of a problem in the T&Cs of the open source contract, so those should probably be changed, to put more onus on people to give back into the pool. >> So let me stop you, so we have identified one thing that is likely going to have to be evolved as we move forward, to prevent problems, some of the terms and conditions, we try to ensure that there is that quid pro quo, that that win-win exists.
So Jim Kobielus, let me ask you a question, open source has been, as David mentioned, open source has been more successful where there is a clear model, a clear target of what the community is trying to build, it hasn't been quite as successful where it is in fact expected that the open source community is going to start with some of the original designs, so for example, there's an enormous plethora of big data tools, and yet people are starting to ask why isn't big data more successful, and partly it's because putting these tools together is so difficult. So are we going to see the type of artifacts and assets and technologies associated with machine learning, AI, deep learning et cetera, easily lend themselves to an open source treatment, what do you think? >> I think we're going to see open source very much take off in the niches of the deep learning and machine learning AI space, where the target capabilities we've built are fairly well understood by a broad community. Machine learning clearly, we have a fair number of frameworks that are already well established, with respect to the core capabilities that need to be performed from modeling and training, and deployment of statistical models into applications. That's where we see a fair amount of takeoff for TensorFlow, which Google built and open sourced, because the core of deep learning in terms of the algorithm, in terms of the kinds of functions you perform to be able to take data and do feature engineering and algorithm selection are fairly well understood, so those are the kinds of very discrete capabilities for which open source code is becoming standard, but there's many different alternative frameworks for doing that, TensorFlow being one of them, that are jostling for presence in the market. The term is commoditized, more of those core capabilities are being commoditized by the fact that they're well understood and agreed to by a broad community.
So those are the discrete areas we're seeing the open source alternatives become predominant, but when you take a TensorFlow and combine it with a Spark, and with a Hadoop and a Kafka and broader collections of capabilities that are needed for robust infrastructure, those are disparate communities that each have their own participants, committers and so forth, nobody owns that overall stack, there's no equivalent of a LAMP stack where all things to do with deep learning machine learning AI on an open source basis come to the fore. If some group of companies is going to own that broadening stack, that would indicate some degree of maturation for this overall ecosystem, that's not happening yet, we don't see that happening right now. >> So Jim, I want to, my bias, I hate the term commoditization, but I want to unify what you said with something that David said, essentially what we're talking about is the agreement in a collaborative open way around the conventions of how we perform work that compute model which then turns into products and technologies that can in fact be distributed and regarded as a standard, and regarded as a commodity around which trading can take place. But what about the data side of things George, we have got, Jim's articulated I think a pretty good case, that we're going to start seeing some tools in the marketplace, it's going to be interesting to see whether that is just further layering on top of all this craziness that is happening in the big data world, and just adding to it in the ML world, but how does the data fit into this, are we going to see something that looks like open source data in the marketplace? >> Yes, yes, and a modified yes. Let me take those in two pieces.
Just to be slightly technical, hopefully not being too pedantic, software used to mean algorithms and data structures, so in other words the recipe for what to do, and the buckets for where to put the data, that has changed in the data in terms of machine learning, analytic world where the algorithms and data are so tied together, the instances of the data, not the buckets, that the data changed the algorithms, the algorithms change the data, the significance of that is, when we build applications now, it's never done, and so you go, the construct we've been focusing on is the digital twin, more broadly defined than a smart device, but when you go from one vendor and you sort of partially build it, it's an evergreen thing, it's never done, then you go to the next vendor, but you need to be able to backport some core of that to the original vendor, so for all intents and purposes that's open source, but it boils down to actually the original Berkeley license for open source, not the Apache one everyone is using now. And remind me of the other question? >> The other issue is are we going to see datasets become open source like we see code bases and code fragments and algorithms becoming open source? >> Yes this is also, just the way Amazon made infrastructure commoditized and rentable, there are going to be many datasets were they used to be proprietary, like a Google web crawl, and Google knowledge graph of disambiguation people, places and things, some of these things are either becoming open source, or openly accessible by API, so when you put those resources together you're seeing a massive deflation, or a massive shrinkage in the capital intensity of building these sorts of apps. 
>> So Neil, if we take a look at where we are this far, we can see that there is, even though we're moving to a services oriented model, Amazon for example is a company that is able to generate commercial rents out of open source software, Jim has made a pretty compelling case that open source software can be, or will emerge out of the tooling world for some of these new applications, there are going to be some examples of datasets, or at least APIs to datasets that will look more open source like, so it's not inconceivable that we'll see some actual open source data, I think GDPR, and some other regulations, we're still early in the process of figuring out how we're going to turn data into commodity, using Jim's words. But what about the personnel, what about the people? There were reasons why developers moved to open source, some of the soft reasons that motivated them to do things, who they work with, getting the recognition, working on relevant projects, working with relevant technologies, are we going to see a similar set of soft motivators, diffuse into the data scientist world, so that these individuals, the real ones who are creating the real value, are going to have some degree of motivation to participate with each other collaborate with each other in an open source way, what do you think? >> Good question, I think the answer is absolutely true, but it's not unique to data scientists, academics, scientists in molecular biology, civil engineers, they all wannabe recognized by their peers, on some level beyond just their, just what they're doing in their organization, but there is another segment of data scientists that are just guys working for a paycheck, and generating predictive analysis and helping the company along and so forth, and that's what they're going to do. 
The whole open source thing, you remember object programming, you remember JavaBeans, you remember Web Services, we tried to turn developers into librarians, and when they wanted to develop something, you go to GitHub, I go to GitHub right now and I say I'm looking for a utility that can figure out why my face is so pink on this camera, I get 1000 listings of programs, and have no idea which ones work and which ones don't, so I think the whole open source thing is about to explode, it already has, in terms of piece parts. But I think managing in an organization is different, and when I say an organization, there's the Googles and the Amazons and so forth of the world, and then there's everybody else. >> Alright so we've identified an area where we can see some consequence of change where we can anticipate some change will be required to modernize the open source model, the licensing model, we see another one where the open source community's going to have to understand how to move from a product and code to a data and service orientation, can we think of any others? >> There is one other that I'd like to add to that, and that is compliance. You addressed it to some extent, but compliance brings some real-world requirements onto code and data, and you were saying earlier on that one of the options is bringing code and data so that they intermingle and change each other, I wonder whether that when you look at it from a compliance point of view will actually pass muster, because you need from a compliance point of view to prove, for example, in the health service, that it works, and it works the same way every time, and if you've got a set of code and data that doesn't work the same every time, you probably are going to get pushback from the people who regulate health that this is not, you can't do it that way, you'll have to find another way to do it.
But that again is, is at the same each time, so the point I'm making-- >> This is a bigger issue than just open source, this is an issue where the idea of continuous refinement of the code, and the data-- >> Automatic refinement. >> Automatic refinement, could in fact, we're going to have to change some compliance laws, is open source, is it possible the open source community might actually help us understand that problem? >> Absolutely, yes. >> I think that's a good point, I think that's a really interesting point, because you're right George, the idea of a continuous development, is not something that for example Sarbanes-Oxley actually says "Oh yeah, I get this." Sarbanes-Oxley actually is like, yes the data, I acknowledge that this data is right, and I acknowledge the process by which it was created was right, now this is another subject, let's bring this up later, but I think it's relevant here, because in many respects it's a difference between an income statement and balance sheet right? Saying it's good now, is kind of like the income statement, but let's come back to this, because I think it's a bigger issue. You're asserting the open source community in fact may help solve this problem by coming up with new ways of conceiving say versioning of things, and stamping things and what is a distribution, what isn't a distribution, with some of these more tightly bound sets of-- >> What we find normally is that-- >> Jim: I think that we are going to-- >> Peter: Go on Jim.
>> Just to elaborate on what Peter was talking about, that whole theme, I think what we're going to see is more open source governance of models and data, within distributed development environments, using technologies like blockchain as a core enabler for these workflows, for these, as it were, general distributed hyperledgers that indicate the latest and greatest version of a given dataset, or a given model being developed somewhere around some common solution domain, I think those kinds of environments for governance will become critically important, as this pipeline for development and training and deployment of these assets, gets ever more distributed and virtual. >> By the way Jim I actually had a conversation with a very large open source distribution company a few months ago about this very point, and I agree, I think blockchain in fact could become a mechanism by which we track intellectual property, track intellectual contributions, find ways to then monetize those contributions, going back to what you were saying David, and perhaps that becomes something that looks like the basis of a new business model, for how we think about how open source goes after these looser, goosier problems. >> But also to guarantee integrity without going through necessarily a central-- >> Very important, very important because at the end of the day George-- >> It's always hard to find somebody to maintain. >> Right, big companies, one of the big challenges that companies today are having as they do open source is that they want to be able to keep track of their intellectual property, both from a contribution standpoint, but also inside their own business, because they're very, very concerned that the stuff that they're creating that's proprietary to their business in a digital sense, might leave the building, and that's not something a lot of banks for example want to see happen.
>> I want to stick one step into this logic process that I think we haven't yet discussed, which is, we're talking about now how end customers will consume this, but there's still a disconnect in terms of how the open source software vendors or even hybrid ones can get to market with this stuff, because between open source pricing models and pricing levels, we've seen a slow motion price collapse, and the problem is that the new go-to-market motion is actually made up of many motions, which is discover, learn, try, buy, recommend, and within each of those, the motion was different, and you hear it's almost like a reflex, like when your doctor hits you on the knee and your leg kind of bounces, everybody says yeah we do land and expand, and land was to discover, learn, try augmented with inside sales, the recommend and standardize is still traditional enterprise software where someone's got to talk to IT and procurement about fitting into the broader architecture, and infrastructure of the firm, and to do that you still need what has always been called the most expensive migratory workforce in the world, which is an enterprise sales force. >> But I would suggest there's a big move towards standardization of stacks, true private cloud is about having a stack which is well established, and the relationship between all the different piece parts, and the stack itself is the person who is responsible for putting that stack together and maintaining that stack. >> So for a moment pretend that you are a CIO, are you going to buy OpenStack or are you going to buy the VMware stack? >> I'm going to buy the VMware stack. >> Because that's about open source?
>> No, the point I'm saying is that those open source communities or pieces, would then be absorbed into the stack as an OEM supplier as opposed to a direct supplier and I think that's true for all of these stacks, if you look at the stack for example and you have code from NetApp or whatever it is that's in that code and they're contributing it. You need an OEM agreement with that provider, and it doesn't necessarily have to be open source. >> Bottom line is this stuff is still really, really complicated. >> But this model of being an OEM provider is very different from growing an enterprise sales force, you're selling something that goes into the cost of goods sold of your customer, and that cost of goods sold better be less than 15 percent, and preferably less than five percent. >> Your point is if you can't afford a sales force, an OEM agreement is a much better way of doing it. >> You have to get somebody else's sales force to do it for you. So look I'm going to do the Action Item on this, I think that this has been a great conversation again, David, George, Neil, Jim, thanks a lot. So here's the Action Item, nobody argues that open source hasn't been important, and nobody suggests that open source is not going to remain important, what we think based on our conversation today is that open source is going to go through some changes, and those changes will occur as a consequence of new folks that are going to be important to this like data scientists, to some of the new streams of value in the industry, may not have the same motivations that the old developer world had, new types of problems that are inherently more data oriented as opposed to process-oriented, and it's not as clear that the whole concept of data as an artifact, data as a convention, data as standards and commodities, are going to be as easy to define as it was in the code world.
As well as ultimately IT organizations increasingly moving towards an approach that focuses more on the consumption of services, as opposed to the consumption of product, so for these and many other reasons, our expectation is that the open source community is going to go through its own transformation as it tries to support future digital transformations, current and future digital transformations. Now some of the areas that we think are going to be transformed, is we expect that there's going to be some pressure on licensing, we think there's going to be some pressure in how compliance is handled, and we think the open source community may in fact be able to help in that regard, and we think very importantly that there will be some pressure on the open source community trying to rationalize how it conceives of the new compute models, the new design models, because where open source always has been very successful is when we have a target we can collaborate to replicate and replace that target or provide a substitute. I think we can all agree that in 10 years we will be talking about how open source took some time to in fact put forward that TPC stack, as opposed to define the true private cloud stack. So our expectation is that open source is going to remain relevant, we think it's going to go through some consequential changes, and we look forward to working with our clients to help them navigate what some of those changes are, both as committers, and also as consumers. Once again guys, thank you very much for this week's Action Item, this is Peter Burris, and until next week thank you very much for participating on Wikibon's Action Item. (slow techno music)

Published Date : Jan 12 2018


ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Jim Kobielus	PERSON	0.99+
Neil Raden	PERSON	0.99+
David Floyer	PERSON	0.99+
George Gilbert	PERSON	0.99+
George	PERSON	0.99+
Peter Burris	PERSON	0.99+
Jim	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Peter	PERSON	0.99+
Neil	PERSON	0.99+
Amazon	ORGANIZATION	0.99+
Canonical	ORGANIZATION	0.99+
Peter Barris	PERSON	0.99+
Amazons	ORGANIZATION	0.99+
Horton Works	ORGANIZATION	0.99+
Wikibon	ORGANIZATION	0.99+
two pieces	QUANTITY	0.99+
less than five percent	QUANTITY	0.99+
Googles	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Red Hat	TITLE	0.99+
Yahoo	ORGANIZATION	0.99+
NapR	ORGANIZATION	0.99+
Word	TITLE	0.99+
less than 15 percent	QUANTITY	0.99+
Cloudera	ORGANIZATION	0.99+
two functions	QUANTITY	0.99+
two capabilities	QUANTITY	0.99+
next week	DATE	0.99+
PHP	TITLE	0.99+
Python	TITLE	0.99+
MySQL	TITLE	0.99+
second incarnation	QUANTITY	0.99+
first incarnation	QUANTITY	0.99+
10 years	QUANTITY	0.98+
Palo Alto, California	LOCATION	0.98+
This week	DATE	0.98+
GDPR	TITLE	0.98+
two key	QUANTITY	0.98+
Linux	TITLE	0.98+
today	DATE	0.97+
1000 listings	QUANTITY	0.97+
one	QUANTITY	0.97+
UNIX	TITLE	0.97+
this week	DATE	0.96+
Github	ORGANIZATION	0.96+
first generation	QUANTITY	0.96+
Vmware	ORGANIZATION	0.96+
each	QUANTITY	0.95+
Kafka	TITLE	0.95+
one step	QUANTITY	0.94+
each time	QUANTITY	0.93+
JavaBeans	TITLE	0.92+
both	QUANTITY	0.91+
BEA Web Logic	ORGANIZATION	0.91+

Dheeraj Pandey, Nutanix | Nutanix NEXT Nice 2017

>> Narrator: Live, from Nice, France. It's theCUBE. Covering .NEXT Conference 2017 Europe. Brought to you by Nutanix. (techno music) >> Welcome back, I'm Stu Miniman and this is SiliconANGLE Media's production of theCUBE. Happy to welcome back to the program, CEO and Founder of Nutanix, Dheeraj Pandey. The keynote this morning, talking about how Nutanix is really going from a traditional enterprise infrastructure company to really becoming, its goal of being, an iconic software company. So, Dheeraj, bring us up to speed as to you know, how Nutanix positioned itself for this future. >> Yeah, I think it's it's been a rite of passage because you can't start from AWS in day one. You have to sell books, and sell eCommerce. You know, you being in the eCommerce space. It was a 20-year journey for them before they could get into computing and people took them seriously. I mean, look at Apple with iPod, and then iPhone, and the iPad, and then iTunes and App Store. And all that stuff was a journey of 15 years. You know, before they could really see that they've arrived. I think for us we had to build the form factor of an iPhone 4 so that people realize what this hyperconvergence thing was. Before we could go and ship an Android as an operating system. 'Cause if the Android operating system had come first... Just like Windows Mobile operating system was around for a while and nobody really understood how to really go and make money on it. I think we had to build a form factor first. And now that people grok it, now we can really go and make software out of this. And be swell software and make the Android version of the iOS itself. And that's the thing. I think, as a company we're challenged to balance these paradoxes. Oh, I thought you were an appliance company and you believe in this Apple like finesse. Polish and attention to detail. How do you apply that to an Android like the shboosh model where you leave it to others to go and build handsets and so on.
I think that's the challenge that we've taken upon ourselves. Now inside, with the cloud service, we have a lot of control. With appliances, we have some control because we at least know what hardware our software is running on. But software we open up. And opening it up, and yet not giving up on the attention to detail, is the challenge that this company has to, actually, really go and undertake. We are looking at a lot of our tools built for certifications, and you know, passing the test. The litmus test for hardware, and we're trying to figure out how to automate the heck out of it. Make them into cloud services. So that customers can now go and crowdsource certifications. So, there'll be some new paradigms that will emerge, and the reason why we are well placed for those kinds of things is because our heritage is appliance. So now when we think of doing software, a lot of the tooling, a lot of the automation, certifications, the attention to detail we had, we'll need to go and make them into cloud services. We have some of them, like Sizer is a cloud service. X-Ray is a cloud service. Foundation is a cloud service. So a lot of these services will then go and make the job of certifying an unknown piece of hardware easier, actually. I mean in fact, even day two and beyond we have what we call NCC, which is a service that runs from within Prism to do health checks. And every two hours you can do health checks. So if there's a new piece of hardware that we thought we just certified, we need to keep paranoid about it. Stay paranoid about it, and say, look, is the hardware really the hardware we wanted it to be. So there's lots of really innovative things we can do as a company that really had the heritage of appliance to go and do software, as well. >> Yeah, absolutely, people have always underestimated the interoperability required. Remember when server virtualization rolled out? Oh, the BIOS, you know, could make everything go horribly.
Even, you know, containers could give you portability and run everywhere. Oh wait, networking and storage. There's considerations there. Do you think it's getting to a point from a maturation of the market that the software... You know, can you in the future take Nutanix to be a fully software company where you kind of let somebody else take care of the hardware pieces and then you just become their software. And then there are software services. Does that seem like a likely future? >> Yeah, I think with the right tools, right level of automation, right level of machine learning, right level of talk-back. You know, when I say talk-back, I mean the fact that the heartbeats are coming to us, we understand what the customers are doing. And with the right level of paranoia day two and beyond. Which is NCC, for example; we call it Nutanix Cluster Check. And it does like 350-odd health checks on a periodic basis. And it raises alerts, and some things like that. With the right level of paranoia I think we can really go and make this work. And by the way, that's where design comes in. Like, how do you think of X-Ray as a service, and Foundation, and Sizer and NCC and so on. I think that's where the real design of a software company that is also not being callous about hardware comes in, actually. So I'm really looking forward to it. I think... it's not just about tech and products. It's also about go-to-market, because go-to-market has to change too. I mean, the kind of packaging, and the kind of pricing, the kind of ELAs, sales compensation, channel programs, a lot of those things have to be revisited as well. As much as you talk about upstream engineering, there's a lot of downstream go-to-market engineering as well that needs to be done. >> Now, when it comes to go-to-market, partnerships are key of course. There's the channel. You want to grow your sales channel, and grow a piece. But also from a technology standpoint, there's a comment I heard you make earlier this week.
You know, Google has the opportunity to be kind of that next partner. Just like Dell was a partner to give you pre-IPO credibility; Dell trusted you. You have Dell, you have Lenovo, you have IBM up on stage there. As a software company, who are the partners that help Nutanix kind of through this next phase? >> I think you mentioned some of them already. You know, the cloud vendors will, obviously, open up. And there will be new ones that'll open up over time as well. Where we're thinking about ways to blur the lines between public and private. Because I think the world, including the public cloud vendors, has come to realize that, you know, you can't have silos. You can't have a public cloud that's separate from the private and so on. So being able to blur the lines, there'll be a lot of cloud partners for us as well. I think on the hardware side, we already talked about all of them, actually. Now, HP and Cisco are right now partners, in double quotes, because we go and make our software work on it, you know. But on some levels they'll probably also have to open up. And there are networking partners that we've been working with; you know, Arista is a good case in point. Plexxi's another one. And security partners, like Palo Alto, could be a large one over time because we think about what firewalls need to look like in the next five years, and so on, you know. I think in every way, I look at even the Apache Foundation. Which is not really a company, but the fact that we can really co-opt a lot of open source and build Calm marketplace apps. Where the apps could be spun up in an on-prem environment, in a single-tenant on-prem environment. And you can drag and drop them into a multi-tenant environment. I think being able to go and do more with Apache. To me it's the... I would say, the biggest game changer for the company would be what else can we do with Apache? You know, 'cause we did a lot in the first eight years.
I mean, obviously, Linux is a big piece of our overall story, you know. Not just the hypervisor but the controller, and things like that, are all Linux based. Which drives the pace of innovation of this company, actually. But beyond Linux, we've used Cassandra and ZooKeeper, RocksDB and things like that. What else can we do with Apache Spark, and Kafka, and MariaDB, and things like that. I think we need to go and elevate the definition of infrastructure. To include databases and NoSQL systems, and batch processing with Hadoop, and things like that. All those things become a part of the overall marketplace story for us, you know. And that's where the really interesting stuff really comes in. >> How do you look at open source from a strategic standpoint from Nutanix? >> I think it's been phenomenal because we have then operated as a company that's bigger than we are. 'Cause otherwise, I mean, look at VMware. They don't have that goodness. Nor does Microsoft actually. I mean, Amazon is the only one that really goes and makes the best out of open source. >> Explain that. We see Microsoft has had a huge push into open source. Especially, you know, kind of publicly the last two or three years. But they've been working on it, they've, you know, heavily embraced containers. You know, they've gone Kubernetes. You know, heavily. >> I'm going to give you examples. I think there's a lot of marchitecture in what Microsoft is doing in open source. But, of course, you know, Linux has to work on Hyper-V. So, that's a given. They cannot make a relevant stack without really making Linux work on Hyper-V. But they tried Hadoop on Windows. And Hortonworks actually ported Hadoop to Windows, but there are not too many takers, as you see, you know. Containers will probably continue to make a lot of progress on Linux because of the LXD and LXC engines, and things like that.
And there's a lot more momentum on the Linux side of containers than there'll be on the Windows side of containers. >> And even Azure is running more Linux than they are Windows these days. >> Absolutely. Now that being said, Azure Stack is still Azure Stack. It's still Hyper-V. It's still system centered, not user-centered, and things like that. I think Microsoft software will really, really have to find itself. And change a lot of its thinking to really go and say we truly embrace open source like the way Amazon does. And like the way Facebook does. Like the way Nutanix does, I think. You know, it's a very different way we look at open source. We are much more like Facebook and Amazon than anyone else. I mean, VMware is way farther away from open source, in that sense. I mean vSphere, overall, you know, I mean I would say that it probably is Linux based. ESX is Linux based from 17, 18 years ago. I am sure that code path has been forked forever. And it's very hard for them to go and uptake from open source, from overall upstream stuff, actually. What we build, you know, I mean, our stuff runs on a palm-sized server. A palm-sized server, imagine it. And that's what we put in a drone, and that's the foundation of an edge cloud for us, in some sense. Our stuff runs on IBM Power systems because IBM was doing a lot of work with open source KVM that made it easy for us to port AHV to it, and so on. And so, I think AHV has a lot more momentum because it shares that overall core base of open source, as well. And I think, over time we'll do many more things with open source. Including in the platform space. >> Okay, how's Nutanix doing globally? You know, what more do you want to be doing? How would you rate yourself on kind of Nutanix as a global company? >> I think it's a great question and it's one of those that's a double-edged sword, actually. And I'll tell you what I mean by that.
So when you stop growing, non-US business becomes 50%, 'cause that's pretty much the reflection of IT spend. Half the spend is outside the US, half the spend is within the US. Right now we're at 65/35. Which is a very healthy place to be in, actually. I don't want this thing to change to, like, 50/50, because that's a proxy for "have we stopped growing," actually. At the same time, I'd love to be shipping everywhere, because again, I've said that the definition of an enterprise cloud is even more relevant, you know, in parts of the world that are not the US, actually. In that sense, just being able to go and maintain that customer base outside the US. I mean, being able to do it. I mean, you know, we recently sold a system in Myanmar, actually. And I was telling my friends that look, now I can die in peace because we have a system in Myanmar, you know. But the very fact that there are partners, and there's the channel community, and there are technology champions and experts. There are certified people in these remote parts of the world. And the fact that we can support these customers successfully says a lot about the overall reach of the technology. The fact that it's reliable, the fact that it's easy to use and spin up, and the fact that it's easy to get certified on. I think that is the core of Nutanix, so I feel good about those things, actually. >> You've reached a certain maturity of product-market adoption and we've seen Nutanix starting to spread out into certain things we sometimes call adjacencies. You've talked about some of the different software pieces. How do you manage the growth, the spread, and make sure that, you know, simplicity. We were talking to Sunil this morning about how absolutely you want simplicity but you also want to, you know. Where does Nutanix play and where don't they play? You know, where... >> That's a great question. So, there's a really good book that I was introduced to about two years ago. And it's also...
There are some videos on YouTube about this book. It's called The Founder's Mentality; the YouTube video is called The Founder's Mentality, as well. And it talks about this very phenomenon that as companies grow they become complex. So they introduce a problem. It's called the Paradox of Growth. The thing that you wanted to do, really do, was grow. And that thing that you coveted kills you. 'Cause growth creates complexity and complexity is a silent killer of growth. So the thing that you coveted is the thing that kills you. And that is the Paradox of Growth, actually. You know, in very simple terms. And then it goes on to talk about what are the things you need to do, because you started as an insurgent company and over time you start acting like you've arrived and you're an incumbent now, all of a sudden. And the moment you start thinking like an incumbent you're done, in some sense. What are the headwinds, and what are the tailwinds that you can actually produce to actually stay an insurgent. I think there are some great lessons there about an insurgent mindset, and an owner's mentality, and then finally, this obsession with the front line. How do you think about customers as the first, last thing. So, I think that's one of the guiding principles of the company. How can we continue to imbibe the founder's mentality in there as well. Where every employee can be a founder, actually, without really having the founder's tag, and so on. And then internally, there's a lot of things we could do differently, in the way that we do engineering, in the way we do collaboration. I mean, these are all good things to revisit in design. Not just the product design piece, but organizational design, like what does it mean to have a two-pizza team, and microservices, and product managers, and Prism developers and Calm developers assigned to a two-pizza team, and so on. QA developers and so on. So there's a lot of structure that we can put in at scale.
That continues to make us look small, continues to have accountability at a product manager level so that they act like GMs, as opposed to PMs. Where each of these two-pizza teams is like a quasi P&L. You know, you can look at them very objectively and you can fund them. If they start to become too big you need to split them. If they are not doing too well, you need to go and kill them, actually. >> Alright, Dheeraj, last question I have for you. Enterprise cloud, I think, you know, when it first came out as a term, we said it was a little bit aspirational. What should we be looking for in a year to really benchmark and show as proof points that it's becoming reality? You know, from Nutanix. >> That's a great point. You know, obviously, when Gartner starts to use a very close term to what I say. They used the term enterprise cloud operating system. And in one of the recent discourses I saw, enterprise cloud operating model. That's very similar, system vs. model, but the operating model of the enterprise cloud is based on the tenets of, you know, web-scale engineering, you know, the fact that things run on commodity servers. Everything is pure software and you have zero differentiation in hardware. And all the differentiation comes in pure software. Infrastructure is code. All those things are not going away. Now how it becomes easy to use, so that you don't need PhDs to manage it, is where consumer-grade design comes in. And where you have this notion of Prism and Calm that actually come to really help make it easy to use. I think this is the core of enterprise cloud itself, you know. I think, obviously, every layer in this overall cake needs more features, more capability, and so on. But foundationally, it's about web-scale engineering and consumer-grade design. And if you're doing these two things, getting more workloads, getting more geographies, getting more platforms, getting more features...
All those things are basically a rite of passage. You know, you need to continue to do them all the time, actually. >> Alright, Dheeraj, I had a customer on who said the reason he bought Nutanix was for that fullness of vision. So, always appreciate catching up with you. And we'll be back with lots more coverage here from Nutanix .NEXT, here in Nice, France. I'm Stu Miniman, and you're watching theCUBE.

Published Date : Nov 8 2017



Vikram Murali, IBM | IBM Data Science For All

>> Narrator: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're at Data Science For All, IBM's two-day event, and we'll be here all day long, wrapping up again with that panel discussion from four to five here Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM, and Vikram, thanks for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities, the Data Science Experience. So first off, if you would, share with our viewers a little bit about that. You know, the primary mission. You've had two fairly significant announcements. Updates, if you will, here over the past month or so, so share some information about that too if you would. >> Sure, so my team, we build the Data Science Experience, and our goal is for us to enable data scientists, in their path, to gain insights into data using data science techniques, machine learning, the latest and greatest open source especially, and be able to do collaboration with fellow data scientists, with data engineers, business analysts, and it's all about freedom. Giving freedom to data scientists to pick the tool of their choice, and program and code in the language of their choice. So that was the mission of Data Science Experience when we started this. The two releases that you mentioned, that we had in the last 45 days. There was one in September and then there was one on October 30th. Both of these releases are very significant in the machine learning space especially. We now support Scikit-Learn, XGBoost, and TensorFlow libraries in Data Science Experience. We have deep integration with Hortonworks Data Platform, which is a hallmark of our partnership with Hortonworks.
Something that we announced back in the summer, and this last release of Data Science Experience, two days back, specifically can do authentication with Knox with Hadoop. So now our Hadoop customers, our Hortonworks Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop-based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling them together? You know? So, exhuming older products and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off with a design-first methodology. And what I mean by that is we are using IBM Design to lead the charge here along with the product and development. And we are actually talking to customers, to data scientists, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them, but Data Science Experience, for example, it started off as a brand new product: a completely new slate with completely new code. Now, IBM has done data science and machine learning for a very long time. We have a lot of assets like SPSS Modeler and Stats, and Decision Optimization. And we are re-investing in those products, and we are investing in such a way, and doing product research in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration. How can data scientists leverage our existing products with open source, and how can we do collaboration. So it's not just re-skinning, but it's building from the ground up.
>> So this is really important because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money, you know, smart people, you can make anything work. So the reason why this is important is you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. Unless you architected your system to be able to accommodate that. Future-proof is the term everybody uses, so have you done that? How have you done that? I'm sure APIs are involved, but maybe you could add some color. >> Sure. So our Data Science Experience and machine learning... It is a microservices-based architecture, so we are completely dockerized, and we use Kubernetes under the covers for container orchestration. And all these are tools that are used in the Valley, across different companies, and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are dockerizing them, and the microservice architecture actually helps us address issues that we have today as well as be open to development and taking newer methodologies and frameworks into consideration that may not exist today. So with the microservices architecture, for example, TensorFlow is something that you brought up. So we can just spin up a Docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works.
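The idea described here, one container per framework attached to an existing platform, can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: `FrameworkRegistry`, `attach`, `image_for`, and the `example.local/...` image names are hypothetical and are not part of Data Science Experience, Docker, or Kubernetes. It shows nothing more than the property being claimed, that adding a framework is one new entry and leaves existing entries untouched.

```python
from dataclasses import dataclass, field


@dataclass
class FrameworkRegistry:
    """Toy stand-in for a platform that serves each ML framework from its own container."""

    # Maps a framework name to the (hypothetical) container image that serves it.
    images: dict[str, str] = field(default_factory=dict)

    def attach(self, framework: str, image: str) -> None:
        """Register a new framework without modifying any existing entry."""
        if framework in self.images:
            raise ValueError(f"{framework} is already attached")
        self.images[framework] = image

    def image_for(self, framework: str) -> str:
        """Look up which container image would handle a given framework."""
        return self.images[framework]


registry = FrameworkRegistry()
registry.attach("scikit-learn", "example.local/runtime-sklearn:1.0")
registry.attach("xgboost", "example.local/runtime-xgboost:1.0")

# Later, a new framework appears: one new entry, nothing else is rebuilt.
before = dict(registry.images)
registry.attach("tensorflow", "example.local/runtime-tensorflow:1.0")
unchanged = all(registry.images[name] == image for name, image in before.items())
print(sorted(registry.images), unchanged)
```

In a real deployment the registry entry would correspond to a container definition managed by the orchestrator; the sketch leaves that part out on purpose.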
Same thing with other frameworks like XGBoost, and Keras, and Scikit-Learn; all these are frameworks and libraries that are coming up in open source within the last, I would say, a year, two years, three years timeframe. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something came, but now with the microservice architecture it is very easy for us to continue with those. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at a high level. One of the things that I've... I mean, I've been following Hortonworks since day one when Yahoo kind of spun them out. And know those guys pretty well. And they always make a big deal out of when they do partnerships, it's deep engineering integration. And so they're very proud of that, so I want to test that a little bit. Can you share with our audience the kind of integrations you've done? What you've brought to the table? What Hortonworks brought to the table? >> Yes, so Data Science Experience today can work side by side with Hortonworks Data Platform, HDP. And we could have actually made that work about two, three months back, but, as part of our partnership that was announced back in June, we set up joint engineering teams. We have multiple touch points every day. We call it co-development, and they have put resources in. We have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate using secure Knox. That I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three is going to be deeper integration, so we are planning on making Data Science Experience an Ambari management pack. And so for a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately. It's going to be a management pack. You just spin it up.
And the third phase is going to be... We're going to be using YARN for resource management. YARN is very good at resource management. And for infrastructure as a service for data scientists, we can actually delegate that work to YARN. So, Hortonworks, they are putting resources into YARN, doubling down actually. And they are making changes to YARN where it will act as the resource manager not only for the Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged in with Hortonworks. >> YARN stands for Yet Another Resource Negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks are big on committers. And obviously a big committer to YARN. Probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys primarily are focused on the integration as well as some other IBM IP? >> That is true, as well as the Knox piece that I mentioned. We have a Knox committer. We have multiple Knox committers on our side, and that helps us as well. So Apache Knox is part of the HDP package. We need knowledge on our side to work with Hortonworks developers to make sure that we are contributing and making inroads into Data Science Experience. That way the integration becomes a lot easier. And from an IBM IP perspective... So Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM Research has worked on a lot of these libraries. I'll give you a few examples: Brunel and PixieDust are something that our developers love. These are visualization libraries that were actually cooked up by IBM Research and then open sourced. And these are prepackaged into Data Science Experience, so there is IBM IP involved and there are a lot of algorithms, machine learning algorithms, that we put in there. So that comes right out of the package.
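The resource-management point made earlier in this exchange (one negotiator handing out capacity to Hadoop, Spark, and data science workloads from a shared pool) can be sketched as a toy model in Python. This is not YARN's API or its scheduling policy, which uses queues and pluggable schedulers; `Request`, `negotiate`, the workload labels, and the first-come-first-served rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    workload: str  # illustrative labels such as "hadoop", "spark", "notebook"
    cores: int     # number of CPU cores the workload asks for


def negotiate(total_cores: int, requests: list[Request]) -> dict[str, int]:
    """Grant requests in arrival order from one shared pool of cores.

    A request that does not fit in what is left of the pool is granted zero cores.
    """
    remaining = total_cores
    grants: dict[str, int] = {}
    for req in requests:
        granted = req.cores if req.cores <= remaining else 0
        grants[req.workload] = granted
        remaining -= granted
    return grants


grants = negotiate(
    total_cores=16,
    requests=[Request("hadoop", 8), Request("spark", 6), Request("notebook", 4)],
)
print(grants)
```

The point of a single negotiator is visible even in this toy: the notebook workload competes for the same pool as the Hadoop and Spark jobs instead of needing separately provisioned hardware.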
>> And you guys, the development teams, are really both in the Valley? Is that right? Or are you really distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America, between the Valley and Toronto. The Hortonworks team, they are situated about eight miles from where we are in the Valley, so there's a lot of synergy. We work very closely with them, and that's what we see in the product. >> I mean, what impact does that have? Is it... You know, you hear today, "Oh, yeah. We're a virtual organization. We have people all over the world: Eastern Europe, Brazil." How much of an impact is that? To have people so physically proximate? >> I think it has major impact. I mean IBM is a global organization, so we do have teams around the world, and we work very well. With the advent of IP telephony, and screen-shares, and so on, yes, we work. But it really helps being in the same timezone, especially working with a partner just eight miles or ten miles away. We have a lot of interaction with them and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems. You talked about issues. You know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean, what's the barrier right now? >> The barrier, I think for data scientists... The number one barrier continues to be data. There's a lot of data out there. A lot of data being generated, and the data is dirty. It's not clean. So the number one problem that data scientists have is how do I get to clean data, and how do I access data. There are so many data repositories, data lakes, and data swamps out there. Data scientists, they don't want to be in the business of finding out how do I access data.
They want to have instant access to data, and-- >> Well if you would let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data, so data scientists-- >> John: So unstructured versus structured? >> Unstructured versus structured. And if you look at all the social media feeds that are being generated, the amount of data that is being generated, it's all unstructured data. So we need to clean up the data, and the algorithms need structured data or data in a particular format. And data scientists don't want to spend too much time cleaning up that data. And access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available. It's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big. You want to leave it where it is. Instead, push analytics down to where it is. And you can do that. We can connect to remote Spark. We can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing that I hear from data scientists is all the open source libraries. Every day there's a new one. It's a boon and a bane as well: the open source community is very vibrant, and there are a lot of data science competitions, machine learning competitions, that are helping move this community forward. And it's a good thing. The bad thing is data scientists like to work in silos on their laptops. How do you, from an enterprise perspective... How do you take that, and how do you move it? Scale it to an enterprise level? And that's where Data Science Experience comes in, because now we provide all the tools. The tools of your choice: open source or proprietary. You have it in here, and you can easily collaborate.
You can do all the work that you need with open source packages and libraries, bring your own, as well as collaborate with other data scientists in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and no schema on write? We kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes. So, if you look at the machine learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. And then you go into creating a model, training, picking a classifier, and so on. So we have tools built into Data Science Experience, and we're working on tools that will be coming up down our roadmap, which will help data scientists do that themselves. I mean, they don't have to be really in-depth coders or developers to do that. Python is very powerful. You can do a lot of data wrangling in Python itself, so we are enabling data scientists to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams. We were talking about Hortonworks and you guys collaborating. What are they like? I mean people picture IBM, you know, like this 100-plus-year-old company. What's the persona of the developers in your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... So we've had six releases this year in Data Science Experience. Just for the on-premises side of the product, and on the cloud side of the product it's continuous delivery. We have releases coming out faster than we can code. And it's not just re-architecting it every time, but it's about adding features, giving features that our customers are asking for, and not making them wait for three months, six months, one year.
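The cleansing step of the pipeline described above (ingest, then clean, then model) can be illustrated with a short, standard-library-only Python sketch. The record layout (`user` and `text` fields), the `clean_record` helper, and the sample feed are all hypothetical; they stand in for the kind of unstructured social-media-style input mentioned in the interview and are not taken from Data Science Experience.

```python
import re
from typing import Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Normalize one raw record; return None when it cannot be used."""
    text = (raw.get("text") or "").strip()
    if not text:
        return None  # drop records with no usable text
    text = re.sub(r"\s+", " ", text)  # collapse runs of whitespace and newlines
    user = (raw.get("user") or "unknown").strip().lower()
    return {"user": user, "text": text}


# Hypothetical "dirty" feed: stray whitespace, missing fields, empty text.
raw_feed = [
    {"user": " Alice ", "text": "Loving   the new\nrelease!"},
    {"user": None, "text": "  works for me "},
    {"user": "bob", "text": "   "},  # empty after stripping: dropped
    {"text": None},                  # missing text: dropped
]

cleaned = [rec for rec in map(clean_record, raw_feed) if rec is not None]
print(cleaned)
```

Only after a step like this does the data have the fixed shape that a training routine expects, which is the sense in which the algorithms "need structured data or data in a particular format."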
So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve. We are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had immediate release in April, and since then we've had about five revisions of the release where we add a lot more features to our existing releases. A lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now don't you? I mean, you know? (laughing) >> I know, we are setting expectations. >> You still have two months left in 2017. >> We do. >> These are not mainframe release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when you upgrade, a customer upgrades, right? They don't have to bring that entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it, and just upgrade that particular microservice. It's become very simple, so... >> Well some of those microservices aren't so micro. >> Vikram: Yeah. Not. Yeah, so it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep. Making sure that you componentize it in such a way that when you're doing an upgrade, it affects just one small piece of it, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you. To say the least, and I'm sure 2017-2018 is not going to slow down. So continued success. >> Vikram: Thank you. >> Wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with Data Science For All. Here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.
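
Editor's illustration: in the conversation above, Vikram describes a pipeline that starts by ingesting unstructured data, such as social media feeds, and cleaning it into a structured format before any model is trained, and he notes that much of this wrangling can be done in Python itself. The sketch below shows what that cleaning step might look like using only the Python standard library. It is not Data Science Experience code; the sample posts, the `clean_post` and `clean_all` helpers, and the output field names are all assumptions made for illustration.

```python
import re

# Hypothetical raw, unstructured posts (illustrative data, not from the interview).
RAW_POSTS = [
    "  Loving the new release!!! #DataScience @ibm  ",
    "",
    "loving the new release!!! #datascience @IBM",
    "Spark job failed again :( #spark",
]


def clean_post(text):
    """Turn one free-text post into a structured record, or None if it is empty."""
    text = text.strip()
    if not text:
        return None
    hashtags = sorted({tag.lower() for tag in re.findall(r"#(\w+)", text)})
    mentions = sorted({name.lower() for name in re.findall(r"@(\w+)", text)})
    # Drop hashtags and mentions, strip punctuation, then normalize case and spacing.
    body = re.sub(r"[#@]\w+", " ", text)
    body = re.sub(r"[^\w\s]", " ", body)
    body = " ".join(body.lower().split())
    return {"body": body, "hashtags": hashtags, "mentions": mentions}


def clean_all(posts):
    """Clean every post, skipping empties and duplicates (compared after normalization)."""
    seen = set()
    records = []
    for post in posts:
        record = clean_post(post)
        if record is None:
            continue
        key = (record["body"], tuple(record["hashtags"]), tuple(record["mentions"]))
        if key in seen:
            continue
        seen.add(key)
        records.append(record)
    return records


records = clean_all(RAW_POSTS)
for record in records:
    print(record)
```

Each surviving record has the same fields, which is the kind of "data in a particular format" a downstream training step could consume. A real pipeline would likely need language-specific tokenization and more careful duplicate detection than this sketch attempts.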

Published Date : Nov 1 2017


ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Vikram	PERSON	0.99+
John	PERSON	0.99+
three months	QUANTITY	0.99+
six months	QUANTITY	0.99+
John Walls	PERSON	0.99+
October 30th	DATE	0.99+
2017	DATE	0.99+
April	DATE	0.99+
June	DATE	0.99+
one year	QUANTITY	0.99+
Daniel Hernandez	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
September	DATE	0.99+
one	QUANTITY	0.99+
ten miles	QUANTITY	0.99+
YARN	ORGANIZATION	0.99+
eight miles	QUANTITY	0.99+
Vikram Murali	PERSON	0.99+
New York City	LOCATION	0.99+
North America	LOCATION	0.99+
two day	QUANTITY	0.99+
Python	TITLE	0.99+
two releases	QUANTITY	0.99+
New York	LOCATION	0.99+
two years	QUANTITY	0.99+
three years	QUANTITY	0.99+
six releases	QUANTITY	0.99+
Toronto	LOCATION	0.99+
today	DATE	0.99+
Both	QUANTITY	0.99+
two months	QUANTITY	0.99+
a year	QUANTITY	0.99+
Yahoo	ORGANIZATION	0.99+
third phase	QUANTITY	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
first methodology	QUANTITY	0.98+
First	QUANTITY	0.97+
second thing	QUANTITY	0.97+
one small piece	QUANTITY	0.96+
One	QUANTITY	0.96+
XGBoost	TITLE	0.96+
Cameraman	PERSON	0.96+
about eight miles	QUANTITY	0.95+
Horton Data Platform	ORGANIZATION	0.95+
2017-2018	DATE	0.94+
first	QUANTITY	0.94+
The Valley	LOCATION	0.94+
TensorFlow	TITLE	0.94+

Donna Prlich, Hitachi Vantara | PentahoWorld 2017

>> Announcer: Live, from Orlando, Florida, it's The Cube. Covering PentahoWorld 2017. Brought to you by Hitachi Vantara. >> Welcome back to Orlando, everybody. This is PentahoWorld, #pworld17 and this is The Cube, the leader in live tech coverage. My name is Dave Vellante and I'm here with my co-host, Jim Kobielus. Donna Prlich is here, she's the Chief Product Officer of Pentaho and a many-time Cube guest. Great to see you again. >> Thanks for coming on. >> No problem, happy to be here. >> So, I'm thrilled that you guys decided to re-initiate this event. You took a year off, but we were here in 2015 and learned a lot about Pentaho and especially about your customers and how they're applying this, sort of, end-to-end data pipeline platform that you guys have developed over a decade plus, but it was right after the acquisition by Hitachi. Let's start there, how has that gone? So they brought you in, kind of left you alone for a while, but what's going on, bring us up to date. >> Yeah, so it's funny because it was 2015, it was PentahoWorld, second one, and we were like, wow, we're part of this new company, which is great, so for the first year we were really just driving against our core. Big-data integration, analytics business, and capturing a lot of that early big-data market. Then, probably in the last six months, with the initiation of Hitachi Vantara which really is less about Pentaho being merged into a company, and I think Brian covered it in a keynote, we're going to become a brand new entity, which Hitachi Vantara is now a new company, focused around software. So, obviously, they acquired us for all that big-data orchestration and analytics capability and so now, as part of that bigger organization, we're really at the center of that in terms of moving from edge to outcome, as Brian talked about, and how we focus on data, digital transformation and then achieving the outcome. So that's where we're at right now, which is exciting. 
So now we're part of this bigger portfolio of products that we have access to in some ways. >> Jim: And I should point out that Dave called you the CPO of Pentaho, but in fact you're the CPO of Hitachi Vantara, is that correct? >> No, so I am not. I am the CPO for the Pentaho product line, so it's a good point, though, because Pentaho brand, the product brand, stays the same. Because obviously we have 1,800 customers and a whole bunch of them are all around here. So I cover that product line for Hitachi Vantara. >> David: And there's a diverse set of products in the portfolio. >> Yes. >> So I'm actually not sure if it makes sense to have a Chief Product Officer for Hitachi Vantara, right? Maybe for different divisions it makes sense, right? But I've got to ask you, before the acquisition, how much were you guys thinking about IOT and Industrial IOT? It must have been on your mind, at about 2015 it certainly was a discussion point and GE was pushing all this stuff out there with the ads and things like that, but, how much was Pentaho thinking about it and how has that accelerated since the acquisition? >> At that time in my role, I had product marketing, I think I had just taken Product Management, and what we were seeing was all of these customers that were starting to leverage machine-generated data and we were thinking, well, this is IOT. And I remember going to a couple of our friendly analyst folks and they were like, yeah, that's IOT, so it was interesting, it was right before we were acquired. So, we'd always focus on these blueprints of we've got to find the repeatable patterns, whether it's Customer 360 in big data and we said, well there is some kind of emerging pattern here of people leveraging sensor data to get a 360 of something. Whether it's a customer or a ship at sea. 
So, we started looking at that and going, we should start going after this opportunity and, in fact, some of the customers we've had for a long time, like IMS, who spoke today all around the connected cars. They were one of the early ones and then in the last year we've probably seen more than 100% growth in customers, purely from a Pentaho perspective, leveraging machine-generated data with some other type of data for context to see the outcome. So, we were seeing it then, and then when we were acquired it was kind of like, oh this is cool now we're part of this bigger company that's going after IOT. So, absolutely, we were looking at it and starting to see those early use cases. >> Jim: A decade or more ago, Pentaho, at that time, became very much a pioneer in open-source analytics, you incorporated Weka, the open-source code base for machine learning, data mining of sorts, into the core of your platform. Today, here, at the conference you've announced Pentaho 8.0, which from what I can see is an interesting release because it brings stronger integration with the way the open-source analytic stack has evolved, there's some Spark Streaming integration, there's some Kafka, some Hadoop and so forth. Can you give us a sense of what are the main points of 8.0, the differentiators for that release, and how it relates to where Pentaho has been and where you're going as a product group within Hitachi Vantara. >> So, starting with where we've been and where we're going, as you said, Anthony DeShazor, Head of Customer Success, said today, 13 years, on Friday, that Pentaho started with a bunch of guys who were like, hey, we can figure out this BI thing and solve all the data problems and deliver the analytics in an open-source environment. So that's absolutely where we came from. Obviously over the years with big data emerging, we focused heavily on the big data integration and delivering the analytics. 
So, with 8.0, it's a perfect spot for us to be in because we look at IOT and the amount of data that's being generated and the need to address streaming data, data that's moving faster. This is a great way for us to pull in a lot of the capabilities needed to go after those types of opportunities and solve those types of challenges. The first one is really all about how can we connect better to streaming data. And as you mentioned, it's Spark Streaming, it's connecting to Kafka streams, it's connecting to the Knox gateway, all things that are about streaming data and then in the scale-up, scale-out kind of, how do we better maximize the processing resources, we announced in 7.1, I think we talked to you guys about it, the Adaptive Execution Layer, the idea that you could choose the execution engine you want based on the processing you need. So you can choose the PDI engine, you can choose Spark. Hopefully over time we're going to see other engines emerge. So we made that easier, we added Hortonworks support to that and then this concept of, so that's to scale up, but then when you think about the scale-out, sometimes you want to be able to distribute the processing across your nodes and maybe you run out of capacity in a Pentaho server, you can add nodes now and then you can kind-of get rid of that capacity. So this concept of worker-nodes, and to your point earlier about the Hitachi portfolio, we use some of the services in the foundry layer that Hitachi's been building as a platform. >> David: As a load balancer, right? >> As part of that, yes. So we could leverage what they had done which if you think about Hitachi, they're really good at storage, and a lot of things Pentaho doesn't have experience in, and infrastructure. So we said, well why are we trying to do this, why don't we see what these guys are doing and we leverage that as part of the Pentaho platform. 
So that's the first time we brought some of their technology into the mix with the Pentaho platform and I think we're going to see more of that and then, lastly, around the visual data prep, so how can we keep building on that experience to make data prep faster and easier. >> So can I ask you a really Columbo question on that sort-of load-balancing capabilities that you just described. >> That's a nice looking trench coat you're wearing. >> (laughter) gimme a little cigar. So, is that the equivalent of a resource negotiator? Do I think of that as sort of your own yarn? >> Donna: I knew you were going to ask me about that (laughter) >> Is that unfair to position it that way? >> It's a little bit different, conceptually, right, it's going to help you to better manage resources, but, if you think about Mesos and some of the capabilities that are out there that folks are using to do that, that's what we're leveraging, so it's really more about sometimes I just need more capacity for the Pentaho server, but I don't need it all the time. Not every customer is going to get to the scale that they need that so it's a really easy way to just keep bringing in as much capacity as you need and have it available. >> David: I see, so really efficient, sort of low-level kind of stuff. >> Yes. >> So, when you talk about distributed load execution, you're pushing more and more of the processing to the edge and, of course, Brian gave a great talk about edge to outcome. You and I were on a panel with Mark Hall and Ella Hilal about the, so called, "power of three" and you did a really good blog post on that the power of the IOT, and big data, and the third is either predictive analytics or machine learning, can you give us a quick sense for our viewers about what you mean by the power of three and how it relates to pushing more workloads to the edge and where Hitachi Vantara is going in terms of your roadmap in that direction for customers. 
>> Well, it's interesting because one of the things we, maybe we have a recording of it, but kind of shrink down that conversation because it was a great conversation but we covered a lot of ground. Essentially that power of three is this: we started with big data, so as we could capture more data we could store it, that gave us the ability to train and tune models much easier than we could before because it was always a challenge of, how do I have that much data to get my model more accurate. Then, over time everybody's become a data scientist with the emergence of R and it's kind of becoming a little bit easier for people to take advantage of those kinds of tools, so we saw more of that, and then you think about IOT, IOT is now generating even more data, so, as you said, you're not going to be able to process all of that, bring all that in and store it, it's not really efficient. So that's kind of creating this, we might need the machine learning there, at the edge. We definitely need it in that data store to keep it training and tuning those models, and so what it does is, though, is if you think about IMS, is they've captured all that data, they can use the predictive algorithms to do some of the associations between customer information and the sensor data about driving habits, bring that together and so it's sort of this perfect storm of the amount of data that's coming in from IOT, the availability of the machine learning, and the data is really what's driving all of that, and I think that Mark Hall, on our panel, who's a really well-known data-mining expert was like, yeah, it all started because we had enough data to be able to do it. >> So I want to ask you, again, a product and maybe philosophy question. We've talked on the Cube a lot about the cornucopia of tooling that's out there and people who try to roll their own and. The big internet companies and the big banks, they get the resources to do it but they need companies like you. 
When we talk to your customers, they love the fact that there's an integrated data pipeline and you've made their lives simple. I think in 8.0 I saw Spark, you're probably replacing MapReduce and making life simpler so you've curated a lot of these tools, but at the same time, you don't own your own cloud, your own database, et cetera. So, what's the philosophy of how you future-proof your platform when you know that there are new projects in Apache and new tooling coming out there. What's the secret sauce behind that? >> Well the first one is the open-source core because that just gave us the ability to have APIs, to extend, to build plugins, all of that in a community that does quite a bit of that, in fact, Kafka started with a customer that built a step, initially, we've now brought that into a product and created it as part of the platform but those are the things that in early market, a customer can do at first. We can see what emerges around that and then go. We will offer it to our customers as a step but we can also say, okay, now we're ready to productize this. So that's the first thing, and then I think the second one is really around when you see something like Spark emerge and we were all so focused on MapReduce and how are we going to make it easier and let's create tools to do that and we did that but then it was like MapReduce is going to go away, well there's still a lot of MapReduce out there, we know that. So we can see then, that MapReduce is going to be here and, I think the numbers are around 50/50, you probably know better than I do where Spark is versus MapReduce. I might be off but. >> Jim: If we had George Gilbert, he'd know. >> (laughs) Maybe ask George, yeah it's about 50/50. So you can't just abandon that, 'cause there's MapReduce out there, so it was, what are we going to do? 
Well, what we did in the Hadoop distro days is we created an adaptive big data layer that said, let's abstract a layer so that when we have to support a new distribution of Hadoop, we don't have to go back to the drawing board. So, it was the same thing with the execution engines. Okay, let's build this adaptive execution layer so that we're prepared to deal with other types of engines. I can build the transformation once, execute it anywhere, so that kind of philosophy of stepping back if you have that open platform, you can do those kinds of things. You can create those layers to remove all of that complexity because if you try to one-off and take on each one of those technologies, whether it's Spark or Flink or whatever's coming, as a product, and a product management organization, and a company, that's really difficult. So the community helps a ton on that, too. 
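
Editor's illustration: Donna describes an adaptive execution layer in which a transformation is built once and then executed on whichever engine fits the workload. The sketch below is a toy version of that idea in plain Python, not Pentaho's implementation or API. The `Transformation`, `LocalEngine`, `ThreadedEngine`, and `execute` names are hypothetical, and a thread pool merely stands in for a distributed engine such as Spark.

```python
from concurrent.futures import ThreadPoolExecutor


class Transformation:
    """An engine-independent pipeline: an ordered list of per-row functions."""

    def __init__(self, steps):
        self.steps = list(steps)

    def apply(self, row):
        for step in self.steps:
            row = step(row)
        return row


class LocalEngine:
    """Runs rows one at a time in the current thread."""

    def run(self, transformation, rows):
        return [transformation.apply(row) for row in rows]


class ThreadedEngine:
    """Runs rows on a thread pool; a toy stand-in for a distributed engine."""

    def __init__(self, workers=4):
        self.workers = workers

    def run(self, transformation, rows):
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            # Executor.map yields results in the same order as the inputs.
            return list(pool.map(transformation.apply, rows))


# Registry of available engines; adding one does not touch any transformation.
ENGINES = {"local": LocalEngine, "threaded": ThreadedEngine}


def execute(transformation, rows, engine="local"):
    """Choose the engine by name at run time; the transformation is unchanged."""
    return ENGINES[engine]().run(transformation, rows)


to_celsius = Transformation([
    lambda r: {**r, "temp_c": (r["temp_f"] - 32) * 5 / 9},
    lambda r: {**r, "temp_c": round(r["temp_c"], 1)},
])
readings = [{"sensor": "a", "temp_f": 212.0}, {"sensor": "b", "temp_f": 32.0}]

local_result = execute(to_celsius, readings, engine="local")
threaded_result = execute(to_celsius, readings, engine="threaded")
print(local_result == threaded_result)
```

The design point is that supporting a new engine means adding one entry to the registry, while every existing transformation stays as written. That is the "don't go back to the drawing board" property described in the interview, shown here under much simpler assumptions than a real data integration product faces.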
Then the second piece that we're seeing a lot more of as part of Hitachi, we've moved up into the enterprise, we also need to think a lot more about monitoring, administration, security, all of the things that go at the base of a pipeline. So, that's an area where they want us to focus. The great thing is, as part of Hitachi Vantara now, those aren't areas that we always had a lot of expertise in but Hitachi does 'cause those are kind of infrastructure-type technologies, so I think the push to do that is really strong and now we'll actually be able to do more of it because we've got that access to the portfolio. >> I don't know if this is a fair question for you, but I'm going to ask it anyway, because you just talked about some of the things Hitachi brings and that you can leverage and it's obvious that a lot of the things that Pentaho brings to Hitachi, the family but one of the things that's not talked about a lot is go-to-market, Hitachi Data Systems traditionally doesn't have a lot of expertise at going to market with developers as the first step, where in your world you start. Has Pentaho been able to bring that cultural aspect to the new entity? >> For us, even though we have the open-source world, that's less of the developer and more of an architect or a CIO or somebody who's looking at that. >> David: Early adopter or. >> More and more it's the Chief Data Officer and that type of a persona. I think that, now that we are an entity, a brand new entity, that's a software-oriented company, we're absolutely going to play a way bigger role in that, because we brought software to market for 13 years. I think we've had early wins, we've had places where we're able to help. In an account, for instance, if you're in the data center, if that's where Hitachi is, if you start to get that partnership and we can start to draw the lines from, okay, who are the people that are now looking at, what's the big data strategy, what's the IOT strategy, where's the CDO. 
That's where we've had a much better opportunity to get to bigger sales in the enterprise in those global accounts, so I think we'll see more of that. Also there's the whole transformation of Hitachi as well, so I think there'll be a need to have much more of that software experience and also, Hitachi's hired two new executives, one on the sales side from SAP, and one who's now my boss, Brad Surak from GE Digital, so I think there's a lot of good, strong leadership around the software side and, obviously, all of the expertise that the folks at Pentaho have. >> That's interesting, that Chief Data Officer role is emerging as a target for you, we were at an event on Tuesday in Boston, there were about 200 Chief Data Officers there and I think about 25% had a Robotic Process Automation initiative going on, they didn't ask about IOT just this little piece of IOT and then, Jim, data scientists and that whole world is now your world, okay great. Donna Prlich, thanks very much for coming to the Cube. Always a pleasure to see you. >> Donna: Yeah, thank you. >> Okay, Dave Vellante for Jim Kobielus. Keep it right there everybody, this is the Cube. We're live from PentahoWorld 2017, hashtag pworld17. Brought to you by Hitachi Vantara, we'll be right back. (upbeat techno)

Published Date : Oct 26 2017


ENTITIES

Entity	Category	Confidence
Hitachi	ORGANIZATION	0.99+
Anthony DeShazor	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Jim Kobielus	PERSON	0.99+
Jim	PERSON	0.99+
George	PERSON	0.99+
Brian	PERSON	0.99+
David	PERSON	0.99+
2015	DATE	0.99+
Dave	PERSON	0.99+
Donna	PERSON	0.99+
Mark Hall	PERSON	0.99+
Dave Velonte	PERSON	0.99+
Ella Hilal	PERSON	0.99+
Donna Prlich	PERSON	0.99+
Pentaho	ORGANIZATION	0.99+
Brad Surak	PERSON	0.99+
Hitachi Vantara	ORGANIZATION	0.99+
13 years	QUANTITY	0.99+
Friday	DATE	0.99+
Mark Hall	PERSON	0.99+
George Gilbert	PERSON	0.99+
Tuesday	DATE	0.99+
Boston	LOCATION	0.99+
GE Digital	ORGANIZATION	0.99+
1,800 customers	QUANTITY	0.99+
second piece	QUANTITY	0.99+
last year	DATE	0.99+
GE	ORGANIZATION	0.99+
Orlando	LOCATION	0.99+
Orlando, Florida	LOCATION	0.99+
third	QUANTITY	0.99+
first step	QUANTITY	0.99+
Hitachi Ventara	ORGANIZATION	0.99+
two new executives	QUANTITY	0.99+
more than 100%	QUANTITY	0.99+
second one	QUANTITY	0.98+
PentahoWorld	EVENT	0.98+
today	DATE	0.98+
PentahoWorld	ORGANIZATION	0.98+
#pworld17	EVENT	0.98+
first one	QUANTITY	0.98+
first year	QUANTITY	0.97+
first time	QUANTITY	0.97+
three	QUANTITY	0.97+
one	QUANTITY	0.97+
Hiatachi Vantara	ORGANIZATION	0.96+
both	QUANTITY	0.96+
IMS	ORGANIZATION	0.96+
Kafka	TITLE	0.95+
about 200 Chief Data Officers	QUANTITY	0.95+

Seth Dobrin, IBM Analytics - IBM Fast Track Your Data 2017

>> Announcer: Live from Munich, Germany; it's The Cube. Covering IBM; fast-track your data. Brought to you by IBM. (upbeat techno music) >> For you here at the show, generally; and specifically, what are you doing here today? >> There's really three things going on at the show, three high level things. One is we're talking about our new... How we're repositioning our hybrid data management portfolio, specifically some announcements around DB2 in a hybrid environment, and some highly transactional offerings around DB2. We're talking about our unified governance portfolio; so actually delivering a platform for unified governance that allows our clients to interact with governance and data management kind of products in a more streamlined way, and help them actually solve a problem instead of just offering products. The third is really around data science and machine learning. Specifically we're talking about our machine learning hub that we're launching here in Germany. Prior to this we had a machine learning hub in San Francisco, Toronto, one in Asia, and now we're launching one here in Europe. >> Seth, can you describe what this hub is all about? This is a data center where you're hosting machine learning services, or is it something else? >> Yeah, so this is where clients can come and learn how to do data science. They can bring their problems, bring their data to our facilities, learn how to solve a data science problem in a more team oriented way; interacting with data scientists, machine learning engineers, basically, data engineers, developers, to solve a problem for their business around data science. These previous hubs have been completely booked, so we wanted to launch them in other areas to try and expand the capacity of them. >> You're hosting a round table today, right, on the main tent? >> Yep. >> And you got a customer on, you guys going to be talking about sort of applying practices and financial and other areas. Maybe describe that a little bit. 
>> We have a customer on from ING, Heinrich, who's the chief architect for ING. ING, IBM, and Hortonworks have a consortium, if you would, or a framework that we're doing around Apache Atlas and Ranger, as the kind of open-source operating system for our unified governance platform. So much as IBM has positioned Spark as a unified, kind of open-source operating system for analytics, for a unified governance platform... For a governance platform to be truly unified, you need to be able to integrate metadata. The biggest challenge about connecting your data environments, if you're an enterprise that was not internet born, or cloud born, is that you have proprietary metadata platforms that all want to be the master. When everyone wants to be the master, you can't really get anything done. So what we're doing around Apache Atlas is we are setting up Apache Atlas as kind of a virtual translator, if you would, or a dictionary between all the different proprietary metadata platforms so that you can get a single unified view of your data environment across hybrid clouds, on premise, in the cloud, and across different proprietary vendor platforms. Because it's open-sourced, there are these connectors that can go in and out of the proprietary platforms. >> So Seth, you seem like you're pretty tuned in to the portfolio within the analytics group. How are you spending your time as the Chief Data Officer? How do you balance it between customer visits, maybe talking about some of the products, and then your sort of day job? >> I actually have three day jobs. My job's actually split into kind of three pieces. The first, my primary mission, is really around transforming IBM's internal business unit, internal business workings, to use data and analytics to run our business. So kind of internal business unit transformation. Part of that business unit transformation is also making sure that we're compliant with regulations like GDPR and other regulations. 
Another third is really around kind of rethinking our offerings from a CDO perspective. As a CDO, and as you, Dave, I've only been with IBM for seven months. As a former client recently, and as a CDO, what is it that I want to see from IBM's offerings? We kind of hit on it a little bit with the unified governance platform, where I think IBM makes fantastic products. But as a client, if a salesperson shows up to me, I don't want them selling me a product, 'cause if I want an MDM solution, I'll call you up and say, "Hey, I need an MDM solution. "Give me a quote." What I want them showing up is saying, "I have a solution that's going to solve "your governance problem across your portfolio." Or, "I'm going to solve your data science problem." Or, "I'm going to help you master your data, "and manage your data across "all these different environments." So really working with the offering management and the Dev teams to define what are these three or four, kind of business platforms that we want to settle on? We know three of them at least, right? We know that we have a hybrid data management. We have unified governance. We have data science and machine learning, and you could think of the Z franchise as a fourth platform. >> Seth, can you net out how governance relates to data science? 'Cause there is governance of the statistical models, machine learning, and so forth, version control. I mean, in an end to end machine learning pipeline, there's various versions of various artifacts they have to be managed in a structured way. Is your unified governance bundle, or portfolio, does it address those requirements? Or just the data governance? >> Yeah, so the unified governance platform really kind of focuses today on data governance and how good data governance can be an enabler of rapid data science. 
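
Editor's illustration: earlier in this interview, Seth describes using Apache Atlas as a "virtual translator" or dictionary between proprietary metadata platforms, so that an enterprise gets a single view of its data environment. The sketch below illustrates only the translation idea in plain Python. It does not use the Apache Atlas API, and the two catalogs, their field names, and the common vocabulary (`asset`, `attribute`, `personal_data`) are invented for the example.

```python
# Entries as two hypothetical proprietary catalogs might store them.
CATALOG_A = [
    {"tbl": "CUST", "col": "CUST_ID", "pii": "N"},
    {"tbl": "CUST", "col": "EMAIL_ADDR", "pii": "Y"},
]
CATALOG_B = [
    {"dataset": "web_events", "field": "visitor_email", "sensitivity": "personal"},
    {"dataset": "web_events", "field": "page_url", "sensitivity": "none"},
]


def from_catalog_a(entry):
    """Translate one catalog A entry into the shared vocabulary."""
    return {
        "source": "catalog_a",
        "asset": entry["tbl"].lower(),
        "attribute": entry["col"].lower(),
        "personal_data": entry["pii"] == "Y",
    }


def from_catalog_b(entry):
    """Translate one catalog B entry into the shared vocabulary."""
    return {
        "source": "catalog_b",
        "asset": entry["dataset"],
        "attribute": entry["field"],
        "personal_data": entry["sensitivity"] == "personal",
    }


def unified_view():
    """Combine both catalogs into one list that uses a single vocabulary."""
    view = [from_catalog_a(entry) for entry in CATALOG_A]
    view += [from_catalog_b(entry) for entry in CATALOG_B]
    return view


# One question asked across both platforms: where does personal data live?
personal = [
    (row["source"], row["asset"], row["attribute"])
    for row in unified_view()
    if row["personal_data"]
]
print(personal)
```

Neither catalog has to become "the master" here. Each keeps its own format, and only the translators know how to map it into the shared model, which is the role the interview assigns to the open-source connectors.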
So if you have your data all pre-governed, it makes it much quicker to get access to data and understand what you can and can't do with data; especially being here in Europe, in the context of the EU GDPR. You need to make sure that your data scientists are doing things that are approved by the user, because basically your data, you have to give explicit consent to allow things to be done with it. But long term vision is that... essentially the output of models is data, right? And how you use and deploy those models also need to be governed. So the long term vision is that we will have a governance platform for all those things, as well. I think it makes more sense for those things to be governed in the data science platform, if you would. And we... >> We often hear separate from GDPR and all that, is something called algorithmic accountability; that more is being discussed in policy circles, in government circles around the world, as strongly related to everything you're describing. Being able to trace the lineage of any algorithmic decision back to the data, the metadata, and so forth, and the machine learning models that might have driven it. Is that where IBM's going with this portfolio? >> I think that's the natural extension of it. We're thinking really in the context of them as two different pieces, but if you solve them both and you connect them together, then you have that problem. But I think you're absolutely right. As we're leveraging machine learning and artificial intelligence, in general, we need to be able to understand how we got to a decision, and that includes the model, the data, how the data was gathered, how the data was used and processed. So it is that entire pipeline, 'cause it is a pipeline. You're not doing machine learning or AI in a vacuum. You're doing it in the context of the data, and you're doing it in the context about the individuals or the organizations that you're trying to influence with the output of those models. 
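
Editor's illustration: the exchange above is about tracing an algorithmic decision back through the model that produced it to the data and sources behind it. One minimal way to picture that is a lineage graph walked from the decision upstream. The sketch below assumes a hand-written graph. The artifact names and the `trace` helper are hypothetical and do not represent any IBM product's data model.

```python
# Each artifact maps to the artifacts it was directly derived from (hypothetical names).
LINEAGE = {
    "decision:loan-123": ["model:credit-risk@v7"],
    "model:credit-risk@v7": ["dataset:training-2017Q2"],
    "dataset:training-2017Q2": ["source:crm-export", "source:payments-feed"],
    "source:crm-export": [],
    "source:payments-feed": [],
}


def trace(node, graph):
    """Return every upstream artifact of `node`, depth-first, without repeats."""
    seen = []
    pending = list(graph.get(node, []))
    while pending:
        current = pending.pop(0)
        if current in seen:
            continue
        seen.append(current)
        # Visit this artifact's own inputs before any remaining siblings.
        pending = list(graph.get(current, [])) + pending
    return seen


upstream = trace("decision:loan-123", LINEAGE)
for artifact in upstream:
    print(artifact)
```

In practice each node would also carry the governance facts discussed in the interview, such as which consent covers a source or which version of a model was deployed, so that an auditor could check every step rather than just list it.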
>> I call it DevOps for data science. >> Seth, in the early Hadoop days, the real headwind was complexity. It still is, by the way. We know that. Companies like IBM are trying to reduce that complexity. Spark helps a little bit. So the technology will evolve, we get that. It seems like one of the other big headwinds right now is that most companies don't have a great understanding of how they can take data and monetize it, turn it into value. Most companies, many anyway, make the mistake of, "Well, I don't really want to sell my data," or, "I'm not really a data supplier." And they're kind of thinking about it, maybe not in the right way. But we seem to be entering a next wave here, where people are beginning to understand I can cut costs, I can do predictive maintenance, I can maybe not sell the data, but I can enhance what I'm doing and increase my revenue, maybe my customer retention. They seem to be tuning in, more so; largely, I think, 'cause of the chief data officer roles, helping them think that through. I wonder if you would give us your point of view on that narrative. >> I think what you're describing is kind of the digital transformation journey. I think the end game, as enterprises go through a digital transformation, the end game is how do I sell services, outcomes, those types of things. How do I sell an outcome to my end user? That's really the end game of a digital transformation in my mind. But before you can get to that, before you transform your business's objectives, there's a couple of intermediary steps that are required for that. The first is what you're describing, is those kind of data transformations. Enterprises need to really get a handle on their data and become data driven, and start then transforming their current business model; so how do I accelerate my current business leveraging data and analytics? I kind of frame that, that's like the data science kind of transformation aspect of the digital journey. 
Then the next aspect of it is how do I transform my business and change my business objectives? Part of that first step is in fact, how do I optimize my supply chain? How do I optimize my workforce? How do I optimize my goals? How do I get to my current, you know, the things that Wall Street cares about for business; how do I accelerate those, make those faster, make those better, and really put my company out in front? 'Cause really in the grand scheme of things, there's two types of companies today; there's the company that's going to be the disruptor, and there's companies that's going to get disrupted. Most companies want to be the disruptors, and it's a process to do that. >> So the accounting industry doesn't have standards around valuing data as an asset, and many of us feel as though waiting for that is a mistake. You can't wait for that. You've got to figure out on your own. But again, it seems to be somewhat of a headwind because it puts data and data value in this fuzzy category. But there are clearly the data haves and the data have-nots. What are you seeing in that regard? >> I think the first... When I was in my former role, my former company went through an exercise of valuing our data and our decisions. I'm actually doing that same exercise at IBM right now. We're going through IBM, at least in the analytics business unit, the part I'm responsible for, and going to all the leaders and saying, "What decisions are you making?" "Help me understand the decisions that you're making." "Help me understand the data you need to make those decisions." And that does two things. Number one, it does get to the point of, how can we value the decisions? 'Cause each one of those decisions has a specific value to the company. You can assign a dollar amount to it. But it also helps you change how people in the enterprise think. 
Because the first time you go through and ask these questions, they talk about the dashboards they want to help them make their preconceived decisions, validated by data. They have a preconceived notion of the decision they want to make. They want the data to back it up. So they want a dashboard to help them do that. So when you come in and start having this conversation, you kind of stop them and say, "Okay, what you're describing is a dashboard. That's not a decision. Let's talk about the decision that you want to make, and let's understand the real value of that decision." So you're doing two things, you're building a portfolio of decisions that then becomes to your point, Jim, about DevOps for data science. It's your backlog for your data scientists, in the long run. You then connect those decisions to data that's required to make those, and you can extrapolate the data for each decision to the component that each piece of data makes up to it. So you can group your data logically within an enterprise; customer, product, talent, location, things like that, and you can assign a value to those based on decisions they support. >> Jim: So... >> Dave: Go ahead, please. >> As a CDO, following on that, are you also, as part of that exercise, trying to assess the value of not just the data, but of data science as a capability? Or particular data science assets, like machine learning models? In the overall scheme of things, that kind of valuation can then drive IBM's decision to ramp up their internal data science initiatives, or redeploy it, or, give me a... >> That's exactly what happened. As you build this portfolio of decisions, each decision has a value. So I am now assigning a value to the data science models that my team will build. As CDOs, CDOs are a relatively new role in many organizations. When money gets tight, they say, "What's this guy doing?" (Dave laughing) Having a portfolio of decisions that's saying, "Here's real value I'm adding..." 
So, number one, "Here's the value I can add in the future," and as you check off those boxes, you can kind of go and say, "Here's value I've added. Here's where I've changed how the company's operating. Here's where I've generated X billions of dollars of new revenue, or cost savings, or cost avoidance, for the enterprise." >> When you went through these exercises at your previous company, and now at IBM, are you using standardized valuation methodologies? Did you kind of develop your own, or come up with a scoring system? How'd you do that? >> I think there's some things around, like net promoter score, where there's pretty good standards on how to assign value to increases in net promoter score, or decreases in net promoter score for certain aspects of your business. In other ways, you need to kind of decide as an enterprise, how do we value our assets? Do we use a three year, five year, ten year NPV? Do we use some other metric? You need to kind of frame it in the reference that your CFO is used to talking about so that it's in the context that the company is used to talking about. Most companies, it's net present value. >> Okay, and you're measuring that on an ongoing basis. >> Seth: Yep. >> And fine tuning as you go along. Seth, we're out of time. Thanks so much for coming back on The Cube. It was great to see you. >> Seth: Yeah, thanks for having me. >> You're welcome, good luck this afternoon. >> Seth: Alright. >> Keep it right there, buddy. We'll be back. Actually, let me run down the day here for you, just take a second to do that. We're going to end our Cube interviews for the morning, and then we're going to cut over to the main tent. So in about an hour, Rob Thomas is going to kick off the main tent here with a keynote, talking about where data goes next. Hilary Mason's going to be on. There's a session with Dez Blanchfield on data science as a team sport. Then the big session on changing regulations, GDPR. 
Seth, you've got some customers that you're going to bring on and talk about these issues. And then, sort of balancing act, the balancing act of hybrid data. Then we're going to come back to The Cube and finish up our Cube interviews for the afternoon. There's also going to be two breakout sessions; one with Hilary Mason, and one on GDPR. You got to go to IBMgo.com and log in and register. It's all free to see those breakout sessions. Everything else is open. You don't even have to register or log in to see that. So keep it right here, everybody. Check out the main tent. Check out siliconangle.com, and of course IBMgo.com for all the action here. Fast track your data. We're live from Munich, Germany; and we'll see you a little later. (upbeat techno music)
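
Editor's note: the valuation exercise Seth describes above (give each decision a dollar value, discount it over the three-, five-, or ten-year horizon the CFO is used to, then roll the value up to the data that supports it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decision names, cash flows, and 8% discount rate are hypothetical, and the even split of a decision's value across its data domains is a simplification chosen here, not a method described in the interview.

```python
# Hypothetical sketch: valuing a "portfolio of decisions" by net present value (NPV).
# All names and numbers are invented for illustration.

def npv(annual_value, years, discount_rate):
    """NPV of the same cash flow received at the end of each year."""
    return sum(annual_value / (1 + discount_rate) ** t for t in range(1, years + 1))

# Each decision has an estimated yearly value and the data domains it relies on.
decisions = {
    "optimize_supply_chain": {"annual_value": 4_000_000, "data": ["product", "location"]},
    "reduce_customer_churn": {"annual_value": 2_500_000, "data": ["customer"]},
}

def value_by_data_domain(decisions, years=5, discount_rate=0.08):
    """Assumption: split each decision's NPV evenly across its data domains."""
    totals = {}
    for decision in decisions.values():
        share = npv(decision["annual_value"], years, discount_rate) / len(decision["data"])
        for domain in decision["data"]:
            totals[domain] = totals.get(domain, 0.0) + share
    return totals

if __name__ == "__main__":
    for domain, value in sorted(value_by_data_domain(decisions).items()):
        print(f"{domain}: {value:,.0f}")
```

Changing `years` to 3 or 10 shows how sensitive the resulting figures are to the horizon, which is why the choice needs to match what the CFO already uses.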

Published Date : Jun 24 2017

ENTITIES

Entity	Category	Confidence
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
ING	ORGANIZATION	0.99+
Seth	PERSON	0.99+
Europe	LOCATION	0.99+
Seth Dobrin	PERSON	0.99+
Germany	LOCATION	0.99+
Jim	PERSON	0.99+
Hilary Mason	PERSON	0.99+
Rob Thomas	PERSON	0.99+
ten year	QUANTITY	0.99+
five year	QUANTITY	0.99+
seven months	QUANTITY	0.99+
Asia	LOCATION	0.99+
three year	QUANTITY	0.99+
three	QUANTITY	0.99+
four	QUANTITY	0.99+
Heinrich	PERSON	0.99+
Horton Works	ORGANIZATION	0.99+
Dez Blanchfield	PERSON	0.99+
two types	QUANTITY	0.99+
siliconangle.com	OTHER	0.99+
three days	QUANTITY	0.99+
two things	QUANTITY	0.99+
each piece	QUANTITY	0.99+
today	DATE	0.99+
Dav	PERSON	0.99+
each	QUANTITY	0.99+
first	QUANTITY	0.99+
Munich, Germany	LOCATION	0.99+
third	QUANTITY	0.99+
both	QUANTITY	0.99+
billions of dollars	QUANTITY	0.99+
one	QUANTITY	0.99+
One	QUANTITY	0.98+
two different pieces	QUANTITY	0.98+
three things	QUANTITY	0.98+
DB2	TITLE	0.98+
first step	QUANTITY	0.98+
GDPR	TITLE	0.97+
Apache Atlas	ORGANIZATION	0.97+
fourth platform	QUANTITY	0.97+
2017	DATE	0.97+
three pieces	QUANTITY	0.97+
IBM Analytics	ORGANIZATION	0.96+
first time	QUANTITY	0.96+
single	QUANTITY	0.96+
Spark	TITLE	0.95+
Ranger	ORGANIZATION	0.91+
two breakout sessions	QUANTITY	0.88+
about an hour	QUANTITY	0.86+
each decision	QUANTITY	0.85+
Cube	COMMERCIAL_ITEM	0.84+
each one	QUANTITY	0.83+
this afternoon	DATE	0.82+
Cube	ORGANIZATION	0.8+
San Francisco, Toronto	LOCATION	0.79+
GDPRs	TITLE	0.76+
GDBR	TITLE	0.75+

Joe Goldberg, BMC Software - DataWorks Summit 2017

>> Announcer: Live from San Jose in the heart of Silicon Valley, it's The Cube covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hi. Welcome back to The Cube. We are live at day one of the DataWorks Summit in San Jose, in the heart of Silicon Valley, hosted by Hortonworks. We've had a great day so far. Lots of innovation. Lots of great announcements. We're very excited to be joined by one of this week's keynotes and Cube alumni, Joe Goldberg, Innovation Evangelist at BMC Software. Welcome back to The Cube. >> Thank you very much. Always a pleasure to be here. >> Exactly and we're happy to have you back. So, talk to us, what's happening with BMC? What are you guys doing there? What are people going to learn in your keynote on Thursday? >> So BMC has been really working with all of our customers to modernize, not only our tool chain, but the way automation is used and deployed throughout the organization. We actually did a survey recently, The State of Automation. We got pretty much the kind of results we would've expected, but this let us really sort of make tangible what we have sort of always felt was, you know the state of this kind of approach to how critical automation is in the enterprise. We had a response from leaders and CXOs that 93% thought that automation was key to helping them make that digital transformation that everyone is involved in today. So, that's been one of the key elements that has really kind of driven everything that we've been doing with BMC today. >> Now, BMC's known especially for handling workflows that operate more in a batch world. >> Joe Goldberg: Yes. >> So high certainty, very much predictability in terms of when things are going to happen, how long it's going to take, what action's going to take place. Very, very complex types of processing take place. 
I'm always fascinated and I've talked to other customers that are wondering about this when you come back to the State of Automation that we want to move, everybody wants to move to interactive. >> Joe Goldberg: Yes. >> But often the jump to interactive takes place well in advance of predictability of how the data's actually being constructed and put together and aggregated in the back end. Talk a little bit about the priorities. How does one...? Cause it's really not a chicken and egg kind of a problem. How does one anticipate excellence in the other? >> So what we've been hearing and actually I think at the previous Hortonworks or DataWorks Summit, we had one of our customers talk about their approach to what was a fundamental data architecture for them, which was the separation between the speed and batch layer. And I think you hear an awful lot of that kind of conversation. And they run in parallel and from our perspective managing the batch layer really underpins the kind of real actionable insights that you can extract from the speed layer, which is focusing on capturing that very small percentage of what is really the signal in the data, but then being able to take that and enrich it with what you've been collecting and managing using the batch layer. I think that that's the kind of approach that we've seen from a lot of customers, where certainly all of the cool stuff and the focus is on the interactive and the realtime and streaming. But in order to really be able to be predictive, because you know there's no magic, we still don't know how to tell the future. The only way to be able to do that is by making sure that you are basing yourself on history that is well, sort of collected, curated, make sure that you have actually captured it, that you've enriched it from a variety of different sources. And that's where we come in. What we have been focusing on is providing a set of facilities for managing batch that is... 
I talk about hyper heterogeneity, I know that's a mouthful, but that's really what the new enterprise environment is like. So you add or you know, a layer on top of your conventional applications and your conventional data, all of these new data formats and data that is now arriving in real time in high volume. I think that taking that kind of an approach is really the only way that you can ensure that you are capturing all of your... Ingesting all of the data that's coming in from all of your endpoints, including you know, IoT applications and really being able to combine it with all of the corporate sort of knowledge that you've accumulated through your traditional sources. >> So, batch has historically meant, again a lot of precise code, it had to be written to handle complex jobs and it scared off a lot of folks into thinking about interactive. In the last 10 years, there's been some pretty significant advances in how we think about putting together batch workflows, become much more programmable. How does Control-M and some of the other tool set that BMC provides, How does it fit into? How does it look more like the types of application development, tasks and methods that are becoming increasingly popular, as you think about delivering the outcomes of big data processing to other applications or to other segments? >> So, you know that's very, that's a great question. It's almost like, thanks for the setup. So, you can see. >> Well let's not ask it then. (laughs) >> You can see the shirt that I'm wearing and of course this is very intentional, but our history has been that we've come from the data center, operations focus. And the transition in the marketplace today has been that really the focus has shifted, whether you talk about shift left or everything as code, where the new methods of building and delivering applications really look at everything manual that is done, coding to create an application that's done upfront. 
And then the rigor for enterprise operations is built in through this automated delivery pipeline. And so, obviously you have to invert this kind of approach that we've had in terms of layering management tools on at the very end and instead you have to be able to inject them into your application early. So, we feel that certainly it's true for all applications and it's I think doubly true in data applications, that the automation and the operational instrumentation is an equal partner to your business logic and the code that you write and so it needs to be created right upfront and then moved together with all of the rest of your application components through that delivery pipeline in a CI/CD fashion. And so that is what we have done. And again, that's what the concept is of Jobs-as-Code. >> So, as you think about what the next step is, is batch going to, presumably batch will be sustained as mode of operation. How is it going to become even more comfortable to a lot of the development methodologies as we move forward? How do you think it's going to be evolved as a tool for increasing the amount of predictability in that back end? >> So, I think that the key to continuing to evolve this Jobs-as-Code approach is to enable developers to be able to build and work with that operational plumbing in the same way they work with their business logic. >> Or any other resource? >> Exactly. So, you know, you think about what are the tools that developers have today when they build, whether you're writing in Java or C or R or Scala, there are development environments, there are these tools that let you test that let you step through your logic to be able to identify and find any flaws, you know sort of bugs in your code. And in order for Jobs-as-Code to really meet the test of being code, we are working on providing the same kind of capabilities to work with our objects that developers expect to have for programming languages. 
>> So Joe, I'm now going to shift us back, last question here. Kind of looking at more of a business industry level, to do big data right, to bring Hadoop to an enterprise successfully, what are some of the mission critical elements that the C-suite really needs to embrace in order to be successful across big industries, like healthcare, financial services, Telco? >> So, I think they have to be able to apply the same requirements and the test for how a big data application moves into their enterprise in terms of, not only how it's operated, but how is it made accessible to all of the constituents that need to use it. One of the key elements we hear frequently is that, and I think it's a danger that when technicians solely create what is the end deliverable tool, it frequently is very technical and it has to be consumable by the people that actually need to use it. And so you have to strike this balance between providing sufficient technical sophistication and business usability and I think that that's kind of a goal for being successful in implementing any kind of technology and certainly big data. >> Excellent. Well, Joe Goldberg, thank you so much for coming back to the Cube and joining my cohost, Peter Burris and I for this great chat. And people can watch your keynote on Thursday. >> Yes. >> This week, on the 15th of June. So again for my cohost Peter Burris. I am Lisa Martin. Thanks so much for watching the Cube live, again at day one of the DataWorks Summit. Stick around. We'll be right back. (upbeat music)
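
Editor's note: the speed-layer and batch-layer split Joe describes earlier in this conversation, where a well-curated history is used to enrich the small amount of signal captured in real time, can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions: the record fields (`customer_id`, `amount`) and function names are hypothetical and do not represent any BMC or Hortonworks API.

```python
# Hypothetical sketch: a batch view built from history enriches real-time events.
from collections import defaultdict

def build_batch_view(history):
    """Batch layer: aggregate the full curated history per customer."""
    view = defaultdict(lambda: {"orders": 0, "total_spend": 0.0})
    for record in history:
        stats = view[record["customer_id"]]
        stats["orders"] += 1
        stats["total_spend"] += record["amount"]
    return dict(view)

def enrich(event, batch_view):
    """Speed layer: attach historical context to one incoming event."""
    past = batch_view.get(event["customer_id"], {"orders": 0, "total_spend": 0.0})
    return {**event, "prior_orders": past["orders"], "prior_spend": past["total_spend"]}

if __name__ == "__main__":
    history = [
        {"customer_id": "c1", "amount": 20.0},
        {"customer_id": "c1", "amount": 30.0},
        {"customer_id": "c2", "amount": 5.0},
    ]
    view = build_batch_view(history)
    print(enrich({"customer_id": "c1", "amount": 12.0}, view))
```

The point of the sketch is the dependency: the real-time path is only as good as the batch view behind it, which is why the batch jobs that rebuild that view need to run predictably.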

Published Date : Jun 13 2017

ENTITIES

Entity	Category	Confidence
Joe Goldberg	PERSON	0.99+
Lisa Martin	PERSON	0.99+
BMC	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
San Jose	LOCATION	0.99+
Thursday	DATE	0.99+
Joe	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
93%	QUANTITY	0.99+
BMC Software	ORGANIZATION	0.99+
Telco	ORGANIZATION	0.99+
15th of June	DATE	0.99+
Scala	TITLE	0.99+
Java	TITLE	0.99+
One	QUANTITY	0.99+
C	TITLE	0.98+
one	QUANTITY	0.98+
DataWorks Summit	EVENT	0.98+
The Cube	ORGANIZATION	0.97+
today	DATE	0.96+
DataWorks Summit 2017	EVENT	0.95+
Horton	PERSON	0.94+
Jawless	TITLE	0.94+
Hortonworks	ORGANIZATION	0.94+
this week	DATE	0.94+
Cube	ORGANIZATION	0.91+
R	TITLE	0.9+
last 10 years	DATE	0.87+
day one	QUANTITY	0.86+
BNC	ORGANIZATION	0.82+
Hortonworks	EVENT	0.8+
The State of	TITLE	0.67+
elements	QUANTITY	0.58+
Software	EVENT	0.54+
Cube	PERSON	0.43+

Scott Gnau, Hortonworks - DataWorks Summit 2017

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of the keynotes today, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture and one of the things that I thought was really interesting is that now where Hortonworks is, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the next, the important thing, kind of as a natural evolution for us as a company and as a community is, and I've seen this time and again in the tech industry, we've kind of moved from really cool breakthrough tech, more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it to value really quickly and in a repeatable fashion. So, the notion that I launched today is really making these three personas really successful. If you can focus, combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right? 
They have SQL and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, for instance, Skin and Math and Heavy Compute, with the data scientists and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on the successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models train and more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable. 
So bringing together the data, but the tool set so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and making sure how that data is used, alongside our corporate mission is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data to the business analyst and to the data scientist and be confident that the company's mission, the regulation that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger and it really makes for an enterprise grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, is this helping to spread data science across these personas to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, and I think the numbers are astronomical, right? 
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90%, that's like, an order of magnitude, more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models into the business analysts desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run, but inferred with traditional business process, which means, turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, if the, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just next-gen UI related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion, and I said it this morning, kind of opens where open community, open source, and open ecosystem, I think it's now open to the third power, right, and it's talking about open models and algorithms. 
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so there's no, because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone's also pointing out, besides the cloud experience, is also the ability to operationalize the models and get them into production either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease of which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep. 
But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push from model perspective, actually out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer in the right time, and is that offer relevant. It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IoT network that you're working in. What does that mean from your original question? That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Hortonworks DataFlow and the combination of Apache NiFi and Apache Kafka and Storm to really combine that, "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery, could actually, from our human being perception, arrive before I thought about it. And isn't that really cool in a way. 
I'm thinking about, I need to go do X,Y,Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer, but before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data and more and more and more accurate and sophisticated models to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching The Cube live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.

Published Date : Jun 13 2017

SUMMARY :

in the heart of Silicon Valley, it's The Cube, the CTO of Hortonworks, Scott Gnau. One of the things that you talked about So enabling the data scientist to be successful, And a couple things to follow on to that, and the tools that we're rolling out, for the data scientist or the data engineer as a framework for the data scientists to work as a team, is also the ability to operationalize the models not just making the data scientists be able to be You talked about real time offers, for example. And again, to the extent that it's relevant, So that, again, points to more and more and more data of the DataWorks Summit in the heart of Silicon Valley.

ENTITIES

Entity	Category	Confidence
Lisa Martin	PERSON	0.99+
George Gilbert	PERSON	0.99+
Scott	PERSON	0.99+
IBM	ORGANIZATION	0.99+
80%	QUANTITY	0.99+
San Jose	LOCATION	0.99+
10%	QUANTITY	0.99+
90%	QUANTITY	0.99+
Scott Gnau	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
IBMs	ORGANIZATION	0.99+
Python	TITLE	0.99+
two aspects	QUANTITY	0.99+
five seconds	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
One	QUANTITY	0.99+
DataWorks Summit 2017	EVENT	0.98+
Horton Works	ORGANIZATION	0.98+
Hadoop	TITLE	0.98+
one	QUANTITY	0.98+
DataWorks Summit	EVENT	0.98+
today	DATE	0.98+
each	QUANTITY	0.98+
five years	QUANTITY	0.97+
third	QUANTITY	0.96+
second thing	QUANTITY	0.96+
Apache Kafka	ORGANIZATION	0.95+
three personas	QUANTITY	0.95+
this morning	DATE	0.95+
Apache Nifi	ORGANIZATION	0.95+
this morning	DATE	0.94+
three categories	QUANTITY	0.94+
CTO	PERSON	0.93+
The Cube	TITLE	0.9+
Sequel	PERSON	0.89+
Apache Ranger	ORGANIZATION	0.88+
two things	QUANTITY	0.86+
hundred times	QUANTITY	0.85+
Portworks	ORGANIZATION	0.82+
earlier today	DATE	0.8+
Data Science Experience	TITLE	0.79+
The Cube	ORGANIZATION	0.78+
Apache Atlas	ORGANIZATION	0.75+
Storm	ORGANIZATION	0.74+
day one	QUANTITY	0.74+
wave	EVENT	0.69+
one of the keynotes	QUANTITY	0.66+
lots	QUANTITY	0.63+
years	QUANTITY	0.53+
Hortonworks	EVENT	0.5+
lots of data	QUANTITY	0.49+
Sequel	ORGANIZATION	0.46+
Flow	ORGANIZATION	0.39+

Mike Merritt-Holmes, Think Big - DataWorks Summit Europe 2017 - #DW17 - #theCUBE

>> Narrator: Covering DataWorks Summit Europe 2017 brought to you by Hortonworks. (uptempo, energetic music) >> Okay, welcome back everyone. We're here live in Germany at Munich for DataWorks Summit 2017, formerly Hadoop Summit. I'm John Furrier, my co-host Dave Vellante. Our next guest is Mike Merritt-Holmes, senior Vice President of Global Services Strategy at Think Big, a Teradata company, formerly the co-founder of the Big Data Partnership merged in with Think Big and Teradata. Mike, welcome to The Cube. >> Mike: Thanks for having me. >> Great having an entrepreneur on, you're the co-founder, which means you've got that entrepreneurial blood, and I got to ask you, you know, you're in the big data space, you got to be pretty pumped by all the hype right now around AI because that certainly gives a lot of that extra, extra steroid of recognition. People love AI, it gives a face to it, and certainly IOT is booming as well, Internet of Things, but big data's cruising along. >> I mean it's a great place to be. The train is certainly going very, very quickly right now. But the thing for us is, we've been doing data science and AI and trying to build business outcomes, and value for businesses for a long time. It's just great now to see this really, the data science and AI both were really starting to take effect and so companies are starting to understand it and really starting to really want to embrace it which is amazing. >> It's inspirational too, I mean I have a bunch of kids in my family, some are in college and some are in high school, even the younger generation are getting jazzed up on just software, right, but the big data stuff's been cruising along now. It's been a good decade now of really solid DevOps culture, cloud now accelerating, but now the customers are forcing the vendors to be very deliberate in delivering great product, because the demand (chuckling) for real time, the demand for more stuff, is at an all time high.
Can you elaborate your thoughts on, your reaction to what customers are doing, because they're the ones driving everyone, not to create friction, to create simplicity. >> Yeah, and you know, our customers are global organizations, trying to leverage this kind of technology, and they are, you know, doing an awesome amount of stuff right now to try to move them from, effectively, a step change in their business, whether it's, kind of, shipping companies doing preventive asset maintenance, or whether it's retailers looking to target customers in a more personalized way, or really understand who their customers are, where they come from, they're leveraging all those technologies, and really what they're doing is pushing the boundaries of all of them, and putting more demands on all of the vendors in the space to say, we want to do this quicker, faster, but more easily as well. >> And then the things that you're talking about, I want to get your thoughts on, because this is the conversation that you're having with customers, I want to extract is, have those kind of data-driven mindset questions, have come out the hype of the Hadoop. So, I mean we've been on a hype cycle for awhile, but now it's back to reality. Where are we with the customer conversations, and, from your stand point, what are they working on? I mean, is it mostly IT conversation? Is it a front-office conversation? Is it a blend of both? Because, you know, data science kind of threads both sides of the fence there.
>> Yeah, I mean certainly you can't do big data without IT being involved, but since the start, I mean, we've always been engaged with the business, it's always been about business outcome, because you bring data into a platform, you provide all this data science capability, but unless you actually find ROI from that, then there's no point, because you want to be moving the business forward, so it's always been about business engagement, but part of that has always been also about helping them to change their mindset. I don't want a report, I want to understand why you look at that report and what's the thing you're looking for, so we can start to identify that for you quicker. >> What's the coolest conversation you've been in, over the past year? >> Uh, I mean, I can't go into too much detail, but I've had some amazing conversations with companies like Lego, for instance, they're an awesome company to work with. But when you start to see some of the things we're doing, we're doing some amazing object recognition with deep-learning in Japan. We're doing some fraud analytics in the Nordics with deep-learning, we're doing some amazing stuff that's really pushing the boundaries, and when you start to put those deep-learning aspects into real world applications, and you start to see, customers clambering over to want to be part of that, it's a really exciting place to be. >> Let me just double-click on that for a second, because a lot of, the question I get a lot on The Cube, and certainly off-camera is, I want to do deep-learning, I want to do AI, I love machine learning, I hear, oh, it's finally coming to reality so people see it forming. How do they get started, what are some of the best practices of getting involved in deep-learning? Is it using open-source, obviously, is one avenue, but what advice would you give customers?
>> From a deep-learning perspective, so I think first of all, I mean, a lot of the greatest deep-learning technologies, run open-source, as you rightly said, but I think actually there's a lot of tutorials and stuff on there, but really what you need is someone who has done it before, who knows where the pitfalls are, but also know when to use the right technology at the right time, and also to know around some of the aspects about whether using a deep-learning methodology is going to be the right approach for your business problem. Because a lot of companies are, like, we want to use this deep-learning thing, it's amazing, but actually it's not appropriate, necessarily, for the use case you're trying to draw from. >> It's the classic holy grail, where is it, if you don't know what you're looking for, it's hard to know when to apply it. >> And also, you've got to have enough data to utilize those methods as well, so. >> You hear a lot about the technical complexity associated with Hadoop specifically, but just ol' big data generally. I wonder if you could address that, in terms of what you're seeing, how people are dealing with that technical complexity but what other headwinds are there, in terms of adopting these new capabilities. >> Yeah, absolutely, so one of the challenges that we still see is that customers are struggling to leverage value from their platform, and normally that's because of the technical complexities. So we really, we introduced to the open-source world last month Kylo, something you can download free of charge. It's completely open-source on the Apache license, and that really was about making it easier for customers to start to leverage the data on the platform, to self-serve ingestion onto that, and for data scientists to wrangle the data better.
So, I think there's a real push right now about that next level up, if you like, in the technology stack to start to enable non-technical users to start to do interesting things on the platform directly, rather than asking someone to do it for them. And that, you know, we've had technologies in the BI space like Tableau, and, obviously, the (mumbling) did a data-warehouse solutions on Teradata that have been giving customers something, before and previously, but actually now they're asking for more, not just that, but more as well. And that's where we are starting to see the increases. >> So that's sort of operationalizing analytics as an example, what are some of the business complexities and challenges of actually doing that? >> That's a very good question, because, I think, when you find out great insight, and you go, wow you've built this algorithm, I've seen things I've never seen before, then the business wants to have that always on; they want to know that insight all the time: is it changing, is it going up, is it going down, do I need to change my business decisions? And doing that and making that operational means, not only just deploying it but also monitoring those models, being able to keep them up to date regularly, understanding whether those things are still accurate or not, because you don't want to be making business decisions, on algorithms that are now a bit stale. So, actually operationalizing it, is about building out an entire capability that's keeping these things accurate, online, and, therefore, there's still a bit of work to do, I think, actually in the marketplace still, around building out an operational capability. >> So you kind of got bottom-up, top-down. Bottom-up is, you know, the Hadoop experiments, and then top-down is CXO saying we need to do big data. Have those two constituencies come together now, who's driving the bus? Are they aligned or is it still, sort of, a mess organizationally?
>> Yeah, I mean, generally, in the organization, there's someone playing the Chief Data Officer, whether they have that as a title or a role, ultimately someone is in charge of generating value from the data they have in the organization. But they can't do that without IT, and I think where we've seen companies struggle is where they've driven it from the bottom-up, and where they succeed is where they drive it from the top-down, because by driving it from the top-down, you really align what you're doing with the business and strategy that you have. So, the company strategy, and what you're trying to achieve, but ultimately, they both need to meet in the middle, and you can't do one without the other. >> And one of our practitioner friends, who's describing this situation in our office in Palo Alto, a couple of weeks ago. He said, you know, the challenge we have as an organization is, you've got top people saying alright, we're moving. And they start moving, the train goes, and then you've got kind of middle management, sort of behind them, and then you got the doers that are far behind, and aligning those is a huge challenge for this particular organization. How do you recommend organizations to address that alignment challenge, does Think Big have capabilities to help them through that, or is that, sort of, you got to call Accenture? >> In essence, our reason for being is to help with those kind of things, and, you know, whether it's right from the start, so, oh, my God, my Chief Data Officer or my CEO is saying we need to be doing this thing right now, come on, let's get on with it, and we help them to understand what does that mean, what are the use cases, how, where's the value going to come from, what's that architecting to look like, or whether it's helping them to build out capability, in terms of data science or building out the cluster itself, and then managing that and providing training for staff.
Our whole reason for being is supporting that transformation as a business, from, oh, my God, what do I do about this thing, to, I'm fully embracing it, I know what's going on, I'm enabling my business, and I'm completely comfortable with that world. >> There was a lot talk three, or four or five years ago, about the ROI of so-called big data initiatives, not being really, you know, there were edge cases which were huge ROI, but there was a lot of talk about not a lot of return. My question is, has that, first question, has that changed, are you starting to see much bigger phone numbers coming back where the executives are saying yeah, lets double down on this. >> Definitely, I'm definitely seeing that. I mean, I think it's fair to say that companies are a bit nervous about reporting their ROI around this stuff, in some cases, so there's more ROI out there than you necessarily see out in the public place, but-- >> Why is that? Because they don't want to expose to the competition, or they don't want to front run their earnings, or whatever it is? >> They're trying to get a competitive edge. The minute you start saying, we're doing this, their competitors have an opportunity to catch up. >> John: Very secretive. >> Yeah and I think, it's not necessarily about what they're doing, it's about keeping the edge over their customers, really, over their competitors. So, but what we're seeing is that many customers are getting a lot of ROI more recently because they're able to execute better, rather than being struggling with the IT problems, and even just recently, for instance, we had a customer of ours, the CEO phones us up and says, you know what, we've got this problem with our sales. We don't really know why this is going down, you know, in this country, in this part of the world, it's going up, in this country, it's going down, we don't know why, and that's making us very nervous. 
Could you come in and just get the data together, work out why it's happening, so that we can understand what it is. And we came in, and within weeks, we were able to give them a very good insight into exactly why that is, and they changed their strategy, moving forward, for the next year, to focus on addressing that problem, and that's really amazing ROI for a company to be able to get that insight. Now, we're working with them to operationalize that, so that particular insight is always available to them, and that's an example of how companies are now starting to see that ROI come through, and a lot of it is about being able to articulate the right business question, rather than trying to worry about reports. What is the business question I'm trying to solve or answer, and that's when you can start to see the ROI come through. >> Can you talk about the customer orientation when they get to that insight, because you mentioned earlier that they got used to the reports, and you mentioned visualization, Tableau, they become table stakes, once you get addicted to the visualization, you want to extract more insights so the pressure seems to be getting more insight. So, two questions, process gap around what they need to do process-wise, and then just organizational behavior. Are they there mentally, what are some of the criteria in your mind, in your experiments, with customers around the processes that they go through, and then organizational mindset. >> Yeah, so what I would say is, first of all, from an organizational mindset perspective, it's very important to start educating, not just the analysis team, but the entire business on what this whole machine-learning, big data thing is all about, and how to ask the right questions. So, really starting to think about the opportunities you have to move your business forward, rather than what you already know, and think forward rather than retrospective.
So, the other thing we often have to teach people, as well, is that this isn't about what you can get from the data warehouse, or replacing your data warehouse or anything like that. It's about answering the right questions, with the right tools, and here is a whole set of tools that allow you to answer different questions that you couldn't before, so leverage them. So, that's very important, and so that mindset requires time actually, to transform business into that mindset, and a lot of commitment from the business to make that happen. >> So, mindset first, and then you look at the process, then you get to the product. >> Yep, so, and basically, once you have that mindset, you need to set up an engine that's going to run, and start to drive the ROI out, and the engine includes, you know, your technical folk, but also your business users, and that engine will then start to build up momentum. The momentum builds more interest, and, overtime, you start to get your entire business into using these tools. >> It kind of makes sense, just kind of riffing in real time here, so the product-gap conversation should probably come after you lay that out first, right? >> Totally, yeah, I mean, you don't choose a product before you know what you need to do with it. So, but actually often companies don't know what they need to do with it, because they've got the wrong mindset in the first place. And so part of the road map stuff that we do, that we have a road map offering, is about changing that mindset, and helping them to get through that first stage, where we start to put, articulate the right use cases, and that really is driving a lot of value for our customers. Because they start from the right place-- >> Sometimes we hear stories, like the product kind of gives them a blind spot, because they tend to go into, with a product mindset first, and that kind of gives them some baggage, if you will. 
>> Well, yeah, because you end up with a situation, where you go, you get a product in, and then you say what can we do with it. Or, in fact, what happens is the vendor will say, these are the things you could do, and they give you use cases. >> It constrains things, forecloses tons of opportunities, because you're stuck within a product mindset. >> Yeah, exactly that, and you're not, you don't want to be constrained. And that's why open-source, and the kind of ecosystem that we have within the big data space is so powerful, because there's so many different tools for different things but don't choose your tool until you know what you're trying to achieve. >> I have a market question, maybe you just give us opinion, caveat, if you like, it's sort of a global, macro view. When we started first looking at the big data market, we noticed right away the dominant portion of revenue was coming from services. Hardware was commodity, so, you know, maybe sort of less than you would, obviously, in a mainframe world, and open-source software has a smaller contribution, so services dominated, and, frankly, has continued to dominate, since the early days. Do you see that changing, or do you think those percentages, if you will, will stay relatively constant? >> Well, I think it will change over time, but not in the near future, for sure, there's too much advancement in the technology landscape for that to stop, so if you had a set of tools that weren't really evolving, becoming very mature, and that's what tools you had, ultimately, the skill sets around them start to grow, and it becomes much easier to develop stuff, and then companies start to build out industry- or solutions-specific stuff on top, and it makes it very easy to build products. 
When you have an ecosystem that's evolving, growing with the speed it is, you're constantly trying to keep up with that technology, and, therefore, services have to play an awful big part in making sure that you are using the right technology, at the right time, and so, for the near future, for certain, that won't change. >> Complexity is your friend. >> Yeah, absolutely. Well, you know, we live in a complex world, but we live and breathe this stuff, so what's complex to some is not to us, and that's why we add value, I guess. >> Mike Merritt-Holmes here inside The Cube with Teradata Think Big. Thanks for spending the time sharing your insights. >> Thank you for having me. >> Understand the organizational mindset, identify the process, then figure out the products. That's the insight here on The Cube, more coverage of Data Works Summit 2017, here in Germany after this short break. (upbeat electronic music)

Published Date : Apr 5 2017

SUMMARY :

brought to you by Horton Works. formerly the co-founder of and I got to ask you, you know, I mean it's a great place to be. but the big data stuffs and they are, you know, of the fence there. that for you quicker. and when you start to put but what advice would you give customers? a lot of the greatest if you don't know what you're looking for, got to have enough data I wonder if you could address that, and for data scientists to and you go, wow you've Bottom-up is the you know and you can't do one without the other. and then you got the is to help with those kind of things, not being really, you know, in the public place, but-- The minute you start and that's when you can start so the pressure seems to and a lot of commitment from the business then you get to the product. and the engine includes, you and helping them to get because they tend to go into, and then you say what can we do with it. because you're stuck and the kind of ecosystem that we have of less than you would, and so, for the near future, Well, you know, we live Thanks for spending the identify the process, then

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
John	PERSON	0.99+
Japan	LOCATION	0.99+
Mike	PERSON	0.99+
John Furrier	PERSON	0.99+
Lego	ORGANIZATION	0.99+
Mike Merritt-Holmes	PERSON	0.99+
Teradata	ORGANIZATION	0.99+
Germany	LOCATION	0.99+
Palo Alto	LOCATION	0.99+
Think Big	ORGANIZATION	0.99+
two questions	QUANTITY	0.99+
first question	QUANTITY	0.99+
Munich	LOCATION	0.99+
Accenture	ORGANIZATION	0.99+
last month	DATE	0.99+
one	QUANTITY	0.99+
Horton Works	ORGANIZATION	0.99+
Big Data Partnership	ORGANIZATION	0.99+
both	QUANTITY	0.99+
both sides	QUANTITY	0.98+
two constituencies	QUANTITY	0.98+
next year	DATE	0.98+
first	QUANTITY	0.98+
Nordics	LOCATION	0.98+
first stage	QUANTITY	0.98+
#DW17	EVENT	0.97+
Data Works Summit 2017	EVENT	0.97+
DataWorks Summit 2017	EVENT	0.96+
Tableau	TITLE	0.95+
Hadoop	TITLE	0.95+
four	DATE	0.93+
Hadoop Summit	EVENT	0.93+
five years ago	DATE	0.9+
Apache	TITLE	0.89+
The Cube	ORGANIZATION	0.87+
Vice President	PERSON	0.87+
Data Works Summit Europe 2017	EVENT	0.83+
a couple of weeks ago	DATE	0.82+
one avenue	QUANTITY	0.82+
DataWorks Summit Europe 2017	EVENT	0.8+
Kylo	PERSON	0.8+
past year	DATE	0.79+
Global Services Strategy	ORGANIZATION	0.79+
Teradata Think Big	ORGANIZATION	0.77+
three	QUANTITY	0.76+
double	QUANTITY	0.75+
Think Big -	EVENT	0.71+
Covering	EVENT	0.69+
Hadoob	ORGANIZATION	0.62+
decade	QUANTITY	0.58+
second	QUANTITY	0.58+
Cube	COMMERCIAL_ITEM	0.56+
CXO	PERSON	0.48+
Cube	ORGANIZATION	0.46+
#theCUBE	ORGANIZATION	0.45+

Adam Wilson & Joe Hellerstein, Trifacta - Big Data SV 17 - #BigDataSV - #theCUBE

>> Commentator: Live from San Jose, California. It's theCUBE covering Big Data Silicon Valley 2017. >> Okay, welcome back everyone. We are here live in Silicon Valley for Big Data SV (mumbles) event in conjunction with Strata + Hadoop. Our companion event, the Big Data NYC and we're here breaking down the Big Data world as it evolves and goes to the next level up on the step function, AI machine learning, IOT really forcing people to really focus on a clear line of sight of the data. I'm John Furrier with our analyst from Wikibon, George Gilbert and our next guests are two executives from Trifacta. The founder and Chief Strategy Officer, Joe Hellerstein and Adam Wilson, the CEO. Guys, welcome to theCUBE. Welcome back. >> Great to be here. >> Good to be here. >> Founder, co-founder? >> Co-founder. >> Co-founder. He's one of multiple co-founders. I remember it 'cause you guys were one of the first sites that have the (mumbles) in the about section on all the management team. Just to show you how technical you guys are. Welcome back. >> And if you're Trifacta, you have to have three founders, right? So that's part of the tri, right? >> The triple threat, so to speak. Okay, so a big year for you guys. Give us the update. I mean, also we had Alation announce this partnering going on and some product movement. >> Yup. >> But there's a turbulent time right now. You have a lot of things happening in multiple theaters to technical theater to business theater. And also within the customer base. It's a land grab, it seems to be on the metadata and who's going to control what. What's happening? What's going on in the market place and what's the update from you guys? >> Yeah, yeah. Last year was an absolutely spectacular year for Trifacta. It was four times growth in bookings, three times growth in customers.
You know, it's been really exciting for us to see the technology get in the hands of some of the largest companies on the planet and to see what they're able to do with it. From the very beginning, we really believed in this idea of self service and democratization. We recognize that the wrangling of the data is often where a lot of the time and the effort goes. In fact, up to 80% of the time and effort goes in a lot of these analytic projects and to the extent that we can help take the data from (mumbles) in a more productive way and to allow more people in an organization to do that. That's going to create information agility that we feel really good about and that our customers are telling us is having an impact on their use of Big Data and Hadoop. And I think you're seeing that transition where, you know, in the very beginning there was a lot of offloading, a lot of like, hey we're going to grab some cost savings but then in some point, people scratch their heads and said, well, wait a minute. What about the strategic asset that we were building? That was going to change the way people work with the data. Where is that piece of it? And I think as people started figuring out in order to get our (mumbles), we got to have users and use cases on these clusters and the data lake itself is not a use case. Tools like Trifacta have been absolutely instrumental in really fueling that maturity in the market and we feel great about what's happening there. >> I want to get some more drilled out before we get to some of these questions for Joe too because I think you mentioned, you got some quotes. I just want to double-click on that. It always comes up in the business model question for people. What's your business model? >> Sure. >> And doing democratization is really hard. Sometimes democratization doesn't appear until years later so it's one of those elusive things. You see it and you believe it but then making it happen are two different things.
>> Yeah, sure. >> So. And appreciate that the vision they-- (mumbles) But ultimately, at the end of the day, that business model comes down to how you organized. Proof points. >> Yup. >> Customers, partnerships. >> Yeah. >> We had Alation on Stephanie (mumbles). Can you share just and connect the dots on the business model? >> Sure. >> With respect to the product, customers, partners. How was that specifically evolving? >> Adam: Sure. >> Give some examples. >> Sure, yeah. And I would say kind of-- we felt from the beginning that, you know, we wanted to turn what was traditionally a very complex messy problem dealing with data, you know, into a user experience problem that was powered by machine learning and so, a lot of it was down to, you know, how we were going to build and architect the technology needed (mumbles) for really getting the power in the hands of the people who know the data best. But it's important, and I think this is often lost in Silicon Valley where the focus on innovation is all around technology to recognize that the business model also has to support democratization so one of the first things we did coming in was to release a free version of the product. So Trifacta Wrangler that is now being used by over 4500 companies, tens of thousands of users and the power of that in terms of getting people something of value that they could start using right away on spreadsheets and files and small data and allowing them to get value but then also for us, the exchange is that we're actually getting a chance to curate at scale usage data across all of these-- >> Is this a (mumbles) product? >> It's a hybrid product. >> Okay. >> So the data stays local. It never leaves their local laptop. The metadata is hashed and put into the cloud and now we're-- >> (mumbles) to that. >> Absolutely. And so now we can use that as training data that actually, as more people wrangle, the product itself gets smarter based on that. >> That's good.
So that's creating real tangible value for customers and for us is a source of very strategic advantage and so we think that combination of the technology innovation but also making sure that we can get this in the hands of users and they can get going and as their problem grows up to be bigger and more complicated, not just spreadsheets and files on the desktop but something more complicated, then we're right there along with them for products that would have been modified. >> How about partnerships with Alation? How they (mumbles)? What are all the deals you got going on there? >> So Alation has been a great partner for us for a while and we've really deepened the integration with the announcements today. We think that cataloging and data wrangling are very complementary and they're a natural fit. We've got customers like Munich Re, like eBay as well as MarketShare that are using both solutions in concert with one another and so, we really felt that it was natural to tighten that coupling and to help people go from inventorying what's going on in their data lakes and their clusters to then cleansing, standardizing. Essentially making it fit for purpose and then ensuring that metadata can roundtrip back into the catalog. And so that's really been an extension of what we're doing also at the technical level with technologies like Cloudera Navigator with Atlas and with the project that Joe's involved with at Berkeley called Ground. So I don't know if you want to talk-- >> Yeah, tell him about Ground. >> Sure. So part of our outlook on this and this speaks to the kind of way that the landscape in the industry's shaping out is that we're not going to see customers buying into sort of lock-in on the key components of the area for (mumbles). So for example, storage, HD (mumbles). This is open and that's key, I think, for all the players in this space at HDFS. It's not a product from a storage vendor.
It's an open platform and you can change vendors along the way and you could roll your own and so on. So metadata, to my mind, is going to move in the same direction. That the storage of metadata, the basic componentry that keeps the metadata, that's got to be open to give people the confidence that they're going to pour the basic descriptions of what's in their business and what their people are doing into a place that they know they can count on and it will be vendor neutral. So the catalog vendors are, in my mind, providing a functionality above that basic storage that relates to how do you search the catalog, what does the catalog do for you to suggest things, to suggest data sets that you should be looking at. So that's a value add on top, but below that what we're seeing is, we're seeing Horton and Cloudera coming out with either products or open source in sort of the metadata space, and what would be a shame is if the two vendors ended up kind of pointing guns inward and kind of killing the metadata storage. So one of the things that I got interested in, in my dual role as a professor at Berkeley and also as a founder of a company in this space, was we want to ensure that there's a free, open, vendor-neutral metadata solution. So we began building out a project called Ground which is both a platform for metadata storage that can be sitting underneath catalog vendors and other metadata value adds. And it's also a platform for research, much as we did with Spark previously at Berkeley. So Ground is a project in our new lab at Berkeley. The RISELab, which is the successor to the AMPLab that gave us Spark. And Ground has now got, you know, collaborators from Cloudera, from LinkedIn. Capital One has significantly invested in Ground and is putting engineers behind it, and contributors are coming also from some startups to build out an open-source platform for metadata. >> How long has Ground been around? >> Joe: Ground's been around for about 12 months.
It's very-- >> So it's brand new. How do people get involved? >> Brand new. >> Just standard, similar to the way the AMPLab was? Just jump in and-- >> Yeah, you know-- >> Go away and-- >> It comes up on GitHub. There's (mumbles) to go download and play with. It's in alpha. And you know, we hope we (mumbles) and the usual open source still. >> This is interesting. I like this idea because one thing you've been riffing on theCUBE all the time is how do you make data addressable? Because ultimately, you know, real time you need to have access to data really really low (mumbles) to see the inside to make it work. Hence the data swamp problem, right? So, how do you guys see that? 'Cause now I can just pop in. I can hear the objections. Oh, security! You know. How do you guys see the protections? I'd love to help get my data in there and get something back in return in a community model. Security? Is it the hashing? What's the-- How do you get any security (mumbles)? Or what are the issues? >> Yeah, so I mean the straightforward issues are the traditional issues of authorization and encryption and those are issues that are reasonably well-plumbed out in the industry and you can go out and you can take the solutions from people like Cloudera or from Horton and those solutions plug in quite nicely actually to a variety of platforms. And I feel like that level of enterprise security is understood. It's work for vendors to work with that technology so when we went out, we made sure we were Kerberized in all the right ways at Trifacta to work with these vendors and that we integrated well with Navigator, we integrated with Atlas. That was, you know, there was some labor there but it's understood. There's also-- >> It's solvable basically. >> It's solvable basically and pluggable. There are research questions there which, you know, on another day we could talk about, but for instance if you don't trust your cloud hosting service what do you do?
And that's like an open area that we're working on at Berkeley. Intel SGX is a really interesting technology and that's probably a topic for another day. >> But you know, I think it's important-- >> The sooner we get you out to the studio in Palo Alto, we'd love to drill on that. >> I think it's important though that, you know, when we talk about self service, the first question that comes up is I'm only going to let you self service as far as I can govern what's going on, right? And so I think those things-- >> Restrictions, guard rails-- >> Really go hand in hand. >> About handcuffs. >> Yeah so, right. Because that's always a first thing that kind of comes out where people say, okay, wait a minute now, is this-- if I've now got, you know-- you've got an increasing number of knowledge workers who think that is their-- and believe that it is their unalienable right to have access to data. >> Well that's the (mumbles) democratization. That's the top down, you know, governance control point. >> So how do you balance that? And I think you can't solve for one side of that equation without the other, right? And that's really really critical. >> Democratization is anarchization, right? >> Right, exactly. >> Yes, exactly. But it's hard though. I mean, and you look at all the big trends where there was, you know, web one data, web (mumbles), all had those democratization trends but they took six years to play out and I think there might be a more auxiliary with cloud when you point about this new stop. Okay George, go ahead. You might get in there. >> I wanted to ask you about, you know, what we were talking about earlier and what customers are faced with which is, you know, a lot of choice and specialization because building something end to end and having it fully functional is really difficult. So...
What are the functional points where you start driving the guard rails that IT cares about and then what are the user experience points where you have critical mass so that the end users then draw other compliant tools in. You with me? On sort of the IT side and the user side and then which tools start pulling those standards? >> Well, I would say at the highest level, to me what's been very interesting, especially with what's happened in open source, is that people have now gotten accustomed to the idea that like I don't have to go buy a big monolithic stack where the innovation moves only as fast as the slowest product in the stack or the portfolio. I can grab onto things and I can download them today and be using them tomorrow. And that has, I think, changed the entire approach that companies like Trifacta are taking to how we build and release product to market, how we interoperate with partners like Alation and Waterline and how we integrate with the platform vendors like Cloudera, MapR, and Horton because we recognize that we are going to have to be maniacally focused on one piece of this puzzle and to go very very deep but then play incredibly well both, you know, with all the rest of the ecosystem and so I think that has really colored our entire product strategy and how we go to market and I think customers, you know, they want the flexibility to change their minds and the subscription model is all about that, right? You got to earn it every single year. >> So what's the future of (mumbles)? 'Cause that brings up a good point, we were kind of critical of Google and you mentioned you guys had-- I saw in some news that you guys were involved with Google. >> Yup. >> Being enterprise ready is not just, hey we have the great tech and you buy from us, damn it we're Google. >> Right. >> I mean, you have to have sales people. You have to have automation mechanism to create great product.
Will the future of wrangling and data prep go into-- where does it end up? Because enterprises want, they want certain things. They're finicky about things. >> Right, right. >> As you guys know. So how does the future of data prep deal with the, I won't say the slowness of the enterprise, but they're more conservative, more SLA driven than they are price performance. >> But they're also more fragmented than ever before and you know, while that may not be a great thing for the customers, for a company that's all about harmonizing data that's actually a phenomenal opportunity, right? Because we want to be the decision that customers make that guarantees that all their other decisions are changeable, right? And I go and-- >> Well they have legacy systems of record. This is the challenge, right? So I got the old Oracle monolithic-- >> That's fine. And that's good-- >> So how do you-- >> The more the merrier, right? >> Does that impact you guys at all? How did you guys handle that situation? >> To me, to us that is more fragmentation which creates more need for wrangling because that introduces more complexity, right? >> You guys do well in that environment. >> Absolutely. And that, you know, is only getting bigger, worse, and more complicated. And especially as people go from (mumbles) to cloud, as people start thinking about moving from just looking at transactions to interactions to now looking at behavior data and the IOT-- >> You welcome that environment. >> So we welcome that. In fact, that's where-- we went to solve this problem for Hadoop and Big Data first because we wanted to solve the problems at scale that were the most complicated and over time we can always move downstream to sort of more structured and smaller data and that's kind of what's happened with our business.
>> I guess I want to circle back to this issue of which part of this value chain of refining data is-- if I'm understanding you right, the data wrangling is the anchor and once a company has made that choice then all the other tool choices have to revolve around it? Is that a-- >> Well think about it this way, I mean, the bulk of the time when you talk to the analysts, and also the bulk of the labor cost in these things, is in getting the data from its raw form into usage. That whole process of wrangling, which is not really just data prep. It's all the things you do all day long to kind of massage these data sets and get 'em from here to there and make 'em work. That space is where the labor cost is. That also means that space is where the value add is because that's where your people power or your business context is really getting poured in to understand what do I have, what am I doing with it and what do I want to get out of it. As we move from bottom line IT to top line value generation with data, it becomes all the more so, right? Because now it's not just the matter of getting the reports out every month. It's also what did that brilliant person in sales do to that dataset to get that much lift? I need to learn from her and do a similar thing. Alright? So, that whole space is where the value is. What that means is that, you know, you don't want that space to be tied to a particular BI tool or a particular execution engine. So when we say that we want to make a decision in the middle that enables all the other decisions, what you really want to make sure is that that work process in there is not tightly bound to the rest of the stack. Okay? And so you want to particularly pick technologies in that space that will play nicely with different storage, that play nicely with different execution environments. Today it's Hadoop, tomorrow it's Amazon, the next day it's Google and they have different engines back there potentially.
And you certainly want it to play nice with all the analytics and visualizations-- >> So decouple from all that? >> You want to decouple that and you want to not lock yourself in 'cause that's where the creativity's happening on the consumption side and that's where the mess that you talked about is just growing on the production side so data production is just getting more complicated. Data consumption's getting more interesting. >> That's actually a really really cool good point. >> Elaborating on that, does that mean that you have to open up interfaces with either the UI layer or at the sort of data definition layer? Or does that just mean other companies have to do the work to tie in to the styles? The styles and structures that you have already written? >> In fact it's sort of the opposite. We do the work to tie in to a lot of these other decisions in this infrastructure, you know. We don't pretend for a minute that people are going to sort of pick a solution like Trifacta and then build their organization around it. To your point, there's tons of legacy technology out there. There is all kinds of things moving. Absolutely. So a big part of being the decoder ring for data, for Trifacta, is saying, listen, we are going to interoperate with your existing investments and we're going to make sure that you can always get at your data, you can always take it from whatever state it's in to whatever state you need it to be in, you can change your mind along the way. And that puts a lot of onus on us and that's the reason why we have to be so focused on this space and not jump into visualization and analytics and not jump into storage and processing and not try to do the other things to the right or left. Right? >> So final question. I'd like you guys both to take a stab at it. You know, just going to pivot off of what Joe was saying.
Some of the most interesting things are happening in the data exploration kind of discovery area from creativity to insights to game changing stuff. >> Yup. >> Ventures potentially. >> Joe: Yup. >> The problem of the complexity, that's conflict. >> Yeah. >> So how do we resolve this? I mean, besides the Trifacta solution which you guys are taming, creating a platform for that, how do people in industry work together to solve that problem? What's the approach? >> So I think actually there's a couple sort of heartening trends on this front that make me pretty optimistic. One of these is that the incentive structures in the enterprises we work with are becoming quite aligned between IT and the line of business. It's no longer the case that the line of business are these annoying people who are distracting IT from their bottom line function. IT's bottom line function is being translated into a what's your value for the business question? And the answer for a savvy IT management person is, I will try to empower the people around me to be rabid fans and I will also try to make sure that they do their own work so I don't have to learn how to do it for them. Right? And so, that I think is happening-- >> Guys to this (mumbles) business guys, a bunch of annoying guys who don't get what I need, right? So it works both ways, right? >> It does, it does. And I see that that's improving sort of in the industry as the corporate missions around data change, right? So it's no longer that the IT guys really only need to take care of executives and everyone else doesn't matter. Their function really is to serve the business and I see that alignment. The other thing that I think is a huge opportunity, and part of why we're excited to be so tightly coupled with Google and also have our stuff running in Amazon and at Microsoft: as people replatform to the cloud, a lot of legacy gets shed or at least becomes deprecated.
And so there is a real-- >> Or containerized or some sort of microservice. >> Yeah. >> Right, right. >> And so, people are peeling off business function and as part of that cost savings to migrate it to the cloud, they're also simplifying. And you know, things will get complicated again. >> What's (mumbles) solution architects out there that kind of re-boot their careers because the old way was, hey I got networks, I got apps and stacks and so that gives the guys who could be the new heroes coming in. >> Right. >> And thinking differently about enabling that creativity. >> In the midst of all that, everything you said is true. IT is a messy place and it always will be. And tools that can come in and help are absolutely going to be (mumbles). >> This is obvious now. The tension's obviously eased a bit in the sense that there's clear line of sight that top line and bottom line are working together now. You mentioned that earlier. Okay. Adam, take a stab at it. (mumbling) >> I was just going to-- hey, I know it's great. I was just going to give an example, I think, that illustrates that point so you know, one of our customers is Pepsi. And Pepsi came to us and they said, listen, we work with retailers all over the world and their reality is that, when they place orders with us, they often get it wrong. And sometimes they order too much and then they return it, it spoils and that's bad for us. Or they order too little and they stock out and we miss revenue opportunities. So they said, we actually have to be better at demand planning and forecasting than the orders that are literally coming in the door. So how do we do that? Well, we're getting all of the customers to give us their point of sale data. We're combining that with geospatial data, with weather data.
We're like looking at historical data and industry averages but as you can see, they were like-- we're stitching together data across a whole variety of sources and they said the best people to do this are actually the category managers and the people responsible for the brands 'cause they literally live inside those businesses and they understand it. And so what happened was they-- the IT organization was saying, look, listen, we don't want to be the people doing the janitorial work on the data. We're going to give that work over to people who understand it and they're going to be more productive and get to better outcomes with that information and that frees us up to go find new and interesting sources, and I think that collaborative model that you're starting to see emerge, where they can now be the data heroes in a different way by not being the ones being the bottleneck on provisioning but rather can go out and figure out how do we share the best stuff across the organization? How do we find new sources of information to bring in that people can leverage to make better decisions? That's an incredibly powerful place to be and you know, I think that that model is really what's going to be driving a lot of the thinking at Trifacta and in the industry over the next couple of years. >> Great. Adam Wilson, CEO of Trifacta. Joe Hellerstein, CTO-- Chief Strategy Officer of Trifacta and also a professor at Berkeley. Great story. Getting the (mumbles) right is hard but under the hood stuff's complicated and again, congratulations on sharing the Ground project. Ground open source. Open source lab kind of thing at-- in Berkeley. Exciting new stuff. Thanks so much for coming on theCUBE. Appreciate the great conversation. I'm John Furrier, George Gilbert. You're watching theCUBE here at Big Data SV in conjunction with Strata and Hadoop. Thanks for watching. >> Great. >> Thanks guys.

Published Date : Mar 16 2017

ENTITIES

Entity	Category	Confidence
Joe Hellerstein	PERSON	0.99+
George	PERSON	0.99+
Joe	PERSON	0.99+
George Gilbert	PERSON	0.99+
Joe Hellestein	PERSON	0.99+
John Furrier	PERSON	0.99+
Trifacta	ORGANIZATION	0.99+
Pepsi	ORGANIZATION	0.99+
Adam Wilson	PERSON	0.99+
Adam	PERSON	0.99+
Microsoft	ORGANIZATION	0.99+
Waterline	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
Berkeley	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
San Jose, California	LOCATION	0.99+
Alation	ORGANIZATION	0.99+
Amazon	ORGANIZATION	0.99+
Stephanie	PERSON	0.99+
Horton	ORGANIZATION	0.99+
LinkedIn	ORGANIZATION	0.99+
six years	QUANTITY	0.99+
one	QUANTITY	0.99+
MapR	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
Capital One	ORGANIZATION	0.99+
first question	QUANTITY	0.99+
Today	DATE	0.99+
One	QUANTITY	0.99+
Last year	DATE	0.99+
two executives	QUANTITY	0.99+
Trifacta	PERSON	0.99+
Cloudera	ORGANIZATION	0.99+
one piece	QUANTITY	0.98+
both solutions	QUANTITY	0.98+
today	DATE	0.98+
over 4500 companies	QUANTITY	0.98+
Intel	ORGANIZATION	0.98+
both ways	QUANTITY	0.98+
both	QUANTITY	0.98+
three founders	QUANTITY	0.97+
two vendors	QUANTITY	0.97+
first sites	QUANTITY	0.97+
Ground	ORGANIZATION	0.97+
Munich Re	ORGANIZATION	0.97+
about 12 months	QUANTITY	0.97+
NYC	LOCATION	0.96+
first thing	QUANTITY	0.96+
four times	QUANTITY	0.96+
eBay	ORGANIZATION	0.95+
Wikibon	ORGANIZATION	0.95+
Paolo Alto	PERSON	0.95+
next day	DATE	0.95+
three times	QUANTITY	0.94+
ten of thousands of users	QUANTITY	0.93+
one side	QUANTITY	0.93+
years later	DATE	0.92+

Arun Murthy, Hortonworks - Spark Summit East 2017 - #SparkSummit - #theCUBE

>> [Announcer] Live, from Boston, Massachusetts, it's theCUBE, covering Spark Summit East 2017, brought to you by Databricks. Now, your hosts, Dave Vellante and George Gilbert. >> Welcome back to snowy Boston everybody, this is theCUBE, the leader in live tech coverage. Arun Murthy is here, he's the founder and vice president of engineering at Hortonworks, father of YARN, can I call you that, godfather of YARN, is that fair, or? (laughs) Anyway. He's so, so modest. Welcome back to theCUBE, it's great to see you. >> Pleasure to have you. >> Coming off the big keynote, (laughs) you ended the session this morning, so that was great. Glad you made it in to Boston, and uh, lot of talk about security and governance, you know we've been talking about that for years, it feels like it's truly starting to come into the mainstream, Arun, so. >> Well I think it's just a reflection of what customers are doing with the tech now. Now, three, four years ago, a lot of it was pilots, a lot of it was, you know, people playing with the tech. But increasingly, it's about, you know, people actually applying stuff in production, having data, system of record, running workloads both on prem and on the cloud, cloud is sort of becoming more and more real at mainstream enterprises. So a lot of it means, as you take any of the examples today, any interesting app will have some sort of real time data feed, it's probably coming out from a cell phone or sensor which means that data is actually not, in most cases not coming on prem, it's actually getting collected in a local cloud somewhere, it's just more cost effective, why would we put up 25 data centers if you don't have to, right? So then you got to connect that data, production data you have or customer data you have or data you might have purchased and then join them up, run some interesting analytics, do geobased real time threat detection, cyber security.
A lot of it means that you need a common way to secure data, govern it, and that's where we see the action, I think it's a really good sign for the market and for the community that people are pushing on these dimensions of the broader, because, getting pushed in this dimension because it means that people are actually using it for real production workloads. >> Well in the early days of Hadoop you really didn't talk that much about cloud. >> Yeah. >> You know, and now, >> Absolutely. >> It's like, you know, duh, cloud. >> Yeah. >> It's everywhere, and of course the whole hybrid cloud thing comes into play, what are you seeing there, what are things you can do in a hybrid, you know, or on prem that you can't do in a public cloud and what's the dynamic look like? >> Well, it's definitely not an either or, right? So what we're seeing is increasingly interesting apps need data which are born in the cloud and they'll stay in the cloud, but they also need transactional data which stays on prem, you might have an EDW for example, right? >> Right. >> There's not a lot of, you know, people want to solve business problems and not just move data from one place to another, right? Or back from one place to another, so it's not interesting to move an EDW to the cloud, and similarly it's not interesting to bring your IOT data or sensor data back into on-prem, right? Just makes sense. So naturally what happens is, you know, at Hortonworks we talk of kinds of modern app or a modern data app, which means a modern data app has to span, has to sort of, you know, encompass both on-prem data and cloud data. >> Yeah, you talked about that in your keynote years ago. Furrier said that the data is the new development kit. And now you're seeing the apps are just so dang rich, >> Exactly, exactly. >> And they have to span >> Absolutely. >> physical locations, >> Yeah.
>> But then this whole thing of IOT comes up, we've been having a conversation on theCUBE, last several Cubes of, okay, how much stays out, how much stays in, there's a lot of debates about that, there's reasons not to bring it in, but you talked today about some of the important stuff will come back. >> Yeah. >> So the way this is, this all is going to be, you know, there's a lot of data that should be born in the cloud and stay there, the IOT data, but then what will happen increasingly is, key summaries of the data will move back and forth, so key summaries of your EDW will move to the cloud, sometimes key summaries of your IOT data, you know, you want to do some sort of historical training in analytics, that will come back on-prem, so I think there's a bi-directional data movement, but it just won't be all the data, right? It'll be key interesting summaries of the data but not all of it. >> And a lot of times, people say well it doesn't matter where it lives, cloud should be an operating model, not a place where you put data or applications, and while that's true and we would agree with that, from a customer standpoint it matters in terms of performance and latency issues and cost and regulation, >> And security and governance. >> Yeah. >> Absolutely. >> You need to think those things through. >> Exactly, so I mean, so that's what we're focused on, to make sure that you have a common security and governance model regardless of where data is, so you can think of it as, infrastructure you own and infrastructure you lease. >> Right. >> Right? Now, the details matter of course, when you go to the cloud you use S3 for example or ADLS from Microsoft, but you got to make sure that there's a common sort of security governance front and top of it, in front of it, as an example one of the things that, you know, in the open source community, Ranger's a really sort of key project right now from a security authorization and authentication standpoint.
We've done a lot of work with our friends at Microsoft to make sure, you can actually now manage data in WASB which is their object store, data stream, natively with Ranger, so you can set a policy that says only Dave can access these files, you know, George can access these columns, that sort of stuff is natively done on the Microsoft platform thanks to the relationship we have with them. >> Right. >> So that's actually really interesting for the open source communities. So you've talked about sort of commodity storage at the bottom layer and even if they're different sort of interfaces and implementations, it's still commodity storage, and now what's really helpful to customers is that they have a common security model, >> Exactly. >> Authorization, authentication, >> Authentication, lineage, provenance, >> Oh okay. >> You want to make sure all of these are common services across. >> But you've mentioned off of the different data patterns, like the stuff that might be streaming in on the cloud, what, assuming you're not putting it into just a file system or an object store, and you want to sort of merge it with >> Yeah. >> Historical data, so what are some of the data stores other than the file system, in other words, newfangled databases to manage this sort of interaction? >> So I think what you're saying is, we certainly have the raw data, the raw data is going to land up in whatever cloud native storage, >> Yeah. >> It's going to be Amazon, WASB, ADLS, Google Storage. But then increasingly you want, so now the patterns change so you have raw data, you have some sort of an ETL process, what's interesting in the cloud is that even the processed data or, if you take the unstructured raw data and structure it, that structured data also needs to live on the cloud platform, right? The reason that's important is because A, it's cheaper to use the native platform rather than set up your own database on top of it.
The other one is you also want to take advantage of all the native services that the cloud storage provides, so for example, replication. So automatically data in WASB, you know, if you can set up a policy and easily say this structured data table that I have, which is a summary of all the IOT activity in the last 24 hours, you can, using the cloud provider's technologies you can actually make it show up easily in Europe, like you don't have to do any work, right? So increasingly what we at Hortonworks focused a lot on is to make sure that we, all of the compute engines, whether it's Spark or Hive or, you know, or MapReduce, it doesn't really matter, they're all natively working on the cloud provider's storage platform. >> [George] Okay. >> Right, so, >> Okay. >> That's a really key consideration for us. >> And the follow up to that, you know, there's a bit of a misconception that Spark replaces Hadoop, but it actually can be a processing, a compute engine for, >> Yeah. >> That can complement or replace some of the compute engines in Hadoop, help us frame, how you talk about it with your customers. >> For us it's really simple, like in the past, the only option you had on Hadoop to do any computation was MapReduce, that was, I started working in MapReduce 11 years ago, so as you can imagine, it's a pretty good run for any technology, right? Spark is definitely the interesting sort of engine for sort of the, anything from machine learning to ETL for data on top of Hadoop. But again, what we focus a lot on is to make sure that every time we bring in, so right now, when we started on HDP, the first HDP had about nine open source projects, literally just nine. Today, the last one we shipped was 2.5, HDP 2.5 had about 27 I think, like it's a huge sort of explosion, right? But the problem with that is not just that we have 27 projects, the problem is that you've got to make sure each of the 27 work with all the 26 others. >> It's a QA nightmare.
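[Editor's sketch] The "QA nightmare" arithmetic above can be made concrete with a quick count of pairwise integrations. This is only a back-of-the-envelope illustration: it assumes every pair of projects needs to be tested together once, and the function name is hypothetical.

```python
from math import comb


def pairwise_integrations(project_count):
    """Number of distinct project pairs that must be checked against each other."""
    return comb(project_count, 2)


# About nine projects in the first release versus about 27 in HDP 2.5,
# per the figures quoted in the conversation.
early_pairs = pairwise_integrations(9)    # 9 * 8 / 2 = 36
later_pairs = pairwise_integrations(27)   # 27 * 26 / 2 = 351
```

Tripling the number of projects grows the pairwise test surface roughly tenfold, from 36 pairs to 351, which is the scaling problem being described.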
>> Exactly. So that integration is really key, so same thing with Spark, we want to make sure you have security and YARN (mumbles), like you saw in the demo today, you can now run Spark SQL but also make sure you get low level (mumbles) masking, all of the enterprise capabilities that you need, and I was at a financial services company three or four weeks ago in Chicago. Today, to do the equivalent of what I showed today in the demo, they need literally, they have a classic EDW, and they have to maintain anywhere between 1500 to 2500 views of the same database, that's a nightmare as you can imagine. Now the fact that you can do this on the raw data using whether it's Hive or Spark or Pig or MapReduce, it doesn't really matter, it's really key, and that's the thing we push to make sure things like YARN security work across all the stacks, all the open source techs. >> So that makes life better, a simplification use case if you will, >> Yeah. >> What are some of the other use cases that you're seeing things like Spark enable? >> Machine learning is a really big one. Increasingly, every product is going to have some, people call it machine learning and AI and deep learning, there's a lot of techniques out there, but the key part is you want to build a predictive model; in the past (mumbles) everybody wants to build a model and score what's happening in the real world against the model, but equally important make sure the model gets updated as more data comes in on and actually as the model scores does get smaller over time. So that's something we see all over, so for example, even within our own product, it's not just us enabling this for the customer, for example at Hortonworks we have a product called SmartSense which allows you to optimize how people use Hadoop. Where the, what are the opportunities for you to explore deficiencies within your own Hadoop system, whether it's Spark or Hive, right? So we now put machine learning into SmartSense.
And show you that customers who are running queries like you are running, Mr. Customer X, other customers like you are tuning Hadoop this way, they're running this sort of config, they're using these sort of features in Hadoop. That allows us to actually make the product itself better all the way down the pipe. >> So you're improving the scoring algorithm or you're sort of replacing it with something better? >> What we're doing there is just helping them optimize their Hadoop deploys. >> Yep. >> Right? You know, configuration and tuning and kernel settings and network settings, we do that automatically with SmartSense. >> But the customer, you talked about scoring and trying to, >> Yeah. >> They're tuning that, improving that and increasing the probability of its accuracy, or is it? >> It's both. >> Okay. >> So the thing is what they do is, you initially come with a hypothesis, you have some amount of data, right? I'm a big believer that over time, more data, you're better off spending more, getting more data into the system than to tune that algorithm financially, right? >> Interesting, okay. >> Right, so you know, for example, you know, talk to any of the big guys on Facebook because they'll do the same, what they'll say is it's much better to spend your time getting 10x data to the system and improving the model rather than spending 10x the time and improving the model itself on day one. >> Yeah, but that's a key choice, because you got to >> Exactly. >> Spend money on doing either, >> One of them. >> And you're saying go for the data. >> Go for the data. >> At least now. >> Yeah, go for data, what happens is the good part of that is it's not just the model, it's the, what you got to really get through is the entire end to end flow. >> Yeah.
>> All the way from data aggregation to ingestion to collection to scoring, all that aspect, you're better off sort of walking through the paces like building the entire end to end product rather than spending time in a silo trying to make a lot of change. >> We've talked to a lot of machine learning tool vendors, application vendors, and it seems like we got to the point with Big Data where we put it in a repository, then we started doing better at curating it and understanding it, then starting to do a little bit of exploration with business intelligence, but with machine learning, we don't have something that does this end to end, you know, from acquiring the data, building the model to operationalizing it; where are we on that, who should we look to for that? >> It's definitely very early, I mean if you look at even the EDW space, for example, what is EDW? EDW is ingestion, ETL, and then sort of fast query layer, OLAP, BI, on and on and on, right? So that's the full EDW flow; I mean, it's really early in this space, I don't think, as an overall industry, we have that end to end sort of industrialized design concept, it's going to take time, but a lot of people are ahead, you know, the Googles of the world are ahead, over time a lot of people will catch up. >> We got to go, I wish we had more time, I had so many other questions for you but I know time is tight in our schedule, so thanks so much Arun, >> Appreciate it. >> For coming on, appreciate it. Alright, keep right there everybody, we'll be back with our next guest, it's The Cube, we're live from Spark Summit East in Boston, right back. (upbeat music)

Published Date : Feb 9 2017


ENTITIES

Entity	Category	Confidence
Dave	PERSON	0.99+
George Gilbert	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Arun Murthy	PERSON	0.99+
Europe	LOCATION	0.99+
Microsoft	ORGANIZATION	0.99+
10x	QUANTITY	0.99+
Boston	LOCATION	0.99+
Chicago	LOCATION	0.99+
Amazon	ORGANIZATION	0.99+
George	PERSON	0.99+
Arun	PERSON	0.99+
Wasabi	ORGANIZATION	0.99+
25 data centers	QUANTITY	0.99+
Today	DATE	0.99+
Hadoop	TITLE	0.99+
Wasabi	LOCATION	0.99+
YARN	ORGANIZATION	0.99+
Facebook	ORGANIZATION	0.99+
ADLS	ORGANIZATION	0.99+
Hortonworks	ORGANIZATION	0.99+
Horton Works	ORGANIZATION	0.99+
today	DATE	0.99+
Databricks	ORGANIZATION	0.99+
1500	QUANTITY	0.98+
SmartSense	TITLE	0.98+
S3	TITLE	0.98+
Boston, Massachusetts	LOCATION	0.98+
One	QUANTITY	0.98+
27 projects	QUANTITY	0.98+
three	DATE	0.98+
Google	ORGANIZATION	0.98+
Furio	PERSON	0.98+
Spark	TITLE	0.98+
2500 views	QUANTITY	0.98+
first	QUANTITY	0.97+
Spark Summit East	LOCATION	0.97+
both	QUANTITY	0.97+
Spark SQL	TITLE	0.97+
Google Storage	ORGANIZATION	0.97+
26	QUANTITY	0.96+
Ranger	ORGANIZATION	0.96+
four weeks ago	DATE	0.95+
one	QUANTITY	0.94+
each	QUANTITY	0.94+
four years ago	DATE	0.94+
11 years ago	DATE	0.93+
27 work	QUANTITY	0.9+
MapReduce	TITLE	0.89+
Hive	TITLE	0.89+
this morning	DATE	0.88+
EDW	TITLE	0.88+
about nine open source	QUANTITY	0.88+
day one	QUANTITY	0.87+
nine	QUANTITY	0.86+
years	DATE	0.84+
Olap	TITLE	0.83+
Cube	ORGANIZATION	0.81+
a lot of data	QUANTITY	0.8+

Shaun Connolly, Hortonworks - BigDataNYC - #BigDataNYC - #theCUBE

(upbeat electronic music) >> Male Voiceover: Live from New York, it's the Cube, covering Big Data New York City 2016. Brought to you by headline sponsors Cisco, IBM, Nvidia, and our ecosystem sponsors. Now, here are your hosts, Dave Vellante and Peter Burris. >> We're back in the Big Apple. This is the Cube, the worldwide leader in live tech coverage, we're here at Big Data NYC, Big Data week is part of Strata + Hadoop World. Shaun Connolly is here, the vice president of strategy at Hortonworks, long time friend and Cube alum, great to see you again. >> Thanks for having me, we're back at the same venue as last year, always a pleasure. >> Yeah, it's good, we're growing, I guess the event's growing, we haven't been over there yet, but some of our guys have, but what's it like over there? >> You know, it feels the same, some of the different use cases, I think last year was streaming, we're hearing more machine learning and things like that as far as use cases, so similar vibe. >> Yeah, so things are evolving, right? How's Hortonworks evolving? >> We're continuing to report our quarterly earnings as the only publicly traded company in this space, things from a business perspective are doing well. Our connected data platforms strategy, which we unveiled at the beginning of this year, which is really data in motion and data at rest and enabling these new gen transformational applications, continues to play out. The data in motion piece is sort of decoupled and unrelated to a Hadoop platform, it's really about acquiring and handling the FedEx for data delivery type notions, data logistics, secure transmission. That's based on the Apache NiFi tech that was originally built sort of at the NSA over the past eight years, so. Really a nice robust piece of technology that we've pushed out to the edge in our latest release so you can really skin these down into a secure site to site transmission.
A lot of sophisticated capabilities there, so we're seeing a lot of uptake in that sort of architectural vision, the products are maturing, both on prem and in the cloud, things are pretty exciting. >> Well this cloud thing seems pretty real. (Shaun laughing) You can get a lot of traction, right? Everybody kind of knew it was coming, but what are you seeing? >> Yeah so it was, I guess I started the journey back in 2009, when I was at SpringSource and Paul Maritz was CEO of VMware, and that was pre sort of cloud at that time. We were talking about this notion of platform as a service, and things like that. And that resonated really well with folks back then, but their main ask was how do you solve the data problem, how do you actually get the data to the apps that need it. Fast forward to 2016, I think it's been a lot of open source innovation, you know, a lot of commercial innovation, the rise of cloud for providing a fast path to value, booting up these use cases, it's a fascinating transition to watch. Many of our customers are, people use the word hybrid. What that means to me is they'll have data center workloads, or multi data center workloads, but they also have cloud workloads, sometimes even multi cloud workloads, and that inherent nature of the beast is why I use sort of the term of connected data architecture, is you need an architecture that inherently is built to span that fact. And that's just increasing, that's just the world we live in today. >> But the fact is because there are speed of light issues, there's data fidelity issues. >> Shaun: Yup. >> There's other types of things, how are you starting to see those practical and very physical realities start to impact the whole concept of design as it pertains to data, as it pertains to analytics, as it pertains to the infrastructure associated with the two? >> Yup, so at Hadoop Summit that we had last June, there were really some really good sessions that were there.
Folks like Comcast, Ford, Schlumberger talked about this connected data architecture reality, right. If you look at like, I like to use the connected car ecosystem as a good example, cause there were insurance providers and others that were sort of speaking on behalf of that, where you have the cars and other data that's inherently born up there, and there's a slug of use cases that are around edge analytics, streaming analytics, time series analytics, and we're seeing that, and I think the cloud lends itself really well for those types of use cases. But we also see manufacturing line data for the cars, where you want to get a 360 degree view of operational issues, and dovetail that with manufacturing line elements, and that's inherently what we've seen is what your classic sort of on prem data lake, in quotes, has been used for, so you can get that 360 degree operational intelligence type of analytics to come out of that, right? So that type of use case, whether you apply it to oil and gas and having the sensors on the oil rigs, in the Schlumberger example, that pattern is repeating itself across different industries. British Gas, in Europe, talks about how they're fundamentally changing the nature of the relationship with their customer because of the smart meters, and their connectivity in the homes, and they can deliver a better value there. So that's inherently the connected data realm, there's cloud use cases, and in the data center use cases. So I see these use cases, you know, they'll be use case specific in applications that are sprinkled across that fabric, if you will. And that's really what we're seeing. >> At our panel last year here in this venue, we would talk about a lot of things, one was the market, the sort of ebbs and flows you just mentioned, you guys are the only public player, Talend's joining that crew. >> Shaun: Yeah. Excellent. >> You've seen some. >> Shaun: We need more.
>> We need more, we've seen some M&A, Platfora taken out, I don't know if that was, I don't know the specifics of that deal. Might have been an acqui-hire, might not, I don't know. And Datameer did a raise, so you're seeing these rip currents, in all directions. What are you seeing in the marketplace, lot of funding early on, lot of players, lot of innovation, and now it's like, okay, the music at some point's going to stop, but. >> Yeah. >> What's your take? >> So in our last call, and I think we repeated it on our prior earnings call, you know, our focus and then we put out there in our earnings, in our Q3 earnings we'll sort of reiterate where we stand is, we basically said Q4 is when we look to go adjusted EBITDA break even. >> Right. >> And then 2017 we'll go from there. We reiterated that guidance, we had a little over 62 million in billings for the quarter, so the business is pretty robust and growing, it's a. We're only five years into this, I mean we're just five years old, so it's a very fast pace of billings growth, right? That's almost a 250 million run rate, right? For exiting that quarter. You know, annual run rate. So we see a lot of the use cases really continuing to move on. I think what I and what our customers ask us is, they're on a digital transformation journey, and they want the industry to start talking about those types of business value drivers, right? So I think we should expect to see a transition from the piece parts, animals in the zoo, and what's the right open source piece of technology, and more why should you care, right? As a business, how is this transforming what you do? How does this open up new lines of business? We started seeing that at Hadoop Summit when I think about two dozen customers were sharing very rich stories, right? So that's where things are. But I think running a company is, you have to run it with a certain sense of rigor and that was one of the reasons why we chose to go public, right?
>> So, we by the way, we totally agree that customers want to stop talking about digital business in platitudes and start actually identifying specifically what is it about it that's new and different, and find ways of doing it. >> Shaun: Sure. >> Coming back to the issue, however, of how you go about making some of those transformations relevant. There is clearly a knowledge gap about what digital business is, what it isn't, certainly. But there's also a fair amount of skills that have yet to be developed, that are required for a lot of the use cases that companies are pursuing. Not just in terms of implementing the technology appropriately, but actually constructing and conceptualizing the use cases. >> Shaun: Sure. >> So that suggests that there's two paths forward. There's a path forward where we can do a better job of diffusing knowledge through people, and there's a path where we can do a better job of building software that's easier to use. >> Shaun: Mm hmm. >> And there's both. How do you see this playing out over the course of the next few years? >> Yep, and I think in any new area as technology's emerging, like one of the things I use is Apache Software Foundation. Literally every other week there's a new data related Apache project that lands, so it's. It can be really confusing, but it's exhilarating from the fact of I participate in that, and I try and figure out what ones we can harness in a consumable platform, whether it's on prem or a cloud or what have you. What use cases can it light up? So I think you have both of those vectors, and it really depends on, I like to use the classic software adoption curve, you have a lot of the left side of the chasm folks, where a lot of this new stuff is going to be sharper edges, and they're always going to be trailblazers, right? But we are also seeing a lot of some of these advanced analytics.
Some of these new solutions are automating the pipeline, so you can actually let the infrastructure and these engines do more of the thinking for you, so you get your model's output. Even to the point where you run multi model simulation in parallel, and out pops the best fit. That's where things will head, right? I think it's just a matter of the technology maturing, making sure we address things like security, metadata management, governance, and those -ilities that the enterprise expects, and then really forcing ourselves to simplify and automate as much as possible, right. And that was one of the reasons on that last one why in October 2011 we basically chose Teradata and Microsoft as key partners. Teradata because in 2011, clearly, right? >> Peter: Teradata. >> They're Teradata, right? Microsoft because it simplifies technologies and brings them to billions of users, right? And so we need to do both, you need to harden it, right? For the most rigorous large enterprises, but you need to simplify it for the meat of the market adopters, right? The early majority and late majority. You have to do both. >> Shaun, you're sitting across from a CEO, and you have to say these are the three things you need to do to enact this digital transformation. >> Shaun: Yup. >> What are the three things you're telling him? >> So, I think they need as a business to identify how do they want to leverage data as capital, and what pockets of value do they want to go chase, number one. Number two, how is their business being impacted by the fact that you have the rise of IOT and inherently increasing connected society and infrastructure. How is that impacting them? And number three is, how do they evolve what they're used to doing, right? You have to align it, exactly. >> Because that's really in many respects, I like to say there's a difference between invention and innovation.
Invention is the engineering act, innovation's a social act, it's adopting those new practices >> Shaun: Exactly. >> That actually allow you to enact the invention and generate revenue. >> Exactly. Now in our space, I think we have a very compelling renovate value prop which is a cost savings where you can drive cost out, but the innovate use cases are the ones. Like if all you're going to do is renovate, then you will fail, you will stall, right? Because it's not a balance of cost savings. It's about how do you actually transform your business. And in the case of like the British Gas example, I used that as how they engage that end consumer is fundamentally changing. So that's the question I put back in those conversations is how do you want to evolve your business and how do you leverage data as capital? Because the beauty of data as capital is you can actually generate multiple lines of interest off of a single data set, cause you can derive different insights off of that, so it's not like a dollar, right? And single compound, it's multiple compound annual interest rate on that. But they have to chase the right use cases. >> Although, we've also learned from great design that if you do the right thing better, you get rid of a lot of waste, and so coming back to your point, doing the right thing better often leads to cost savings. >> Yes. Exactly. One inherently can drive the other, but if you're just driving it then >> Peter: Just doing cost. >> You're not going to transform your business. >> Peter: You're just going to continue to do the same or wrong things worse. >> Shaun: Exactly. >> Or wrong things cheaper. >> And that's difficult for enterprises. Because there's a certain way to do data management inherently inside in a highly structured manner, but I do think the rise of like IOT, I don't see as a market, I see it as infinite slices of prosciutto, right? (laughter) It's a very thinly sliced set of market opportunities, right?
But it's forcing people to think about different use cases and how that might impact their business. >> We see those set of capabilities. >> Yup. >> Which leads to the prosciutto. >> Exactly. >> So you, and come up with a really nice sandwich. (laughter) >> It's my Italian. >> Let's keep going. >> I'm loving it. >> I'm getting a little hungry. >> You have always made a big deal out of your partnerships not being barney deals but being deep integration relationships. So you mentioned two here, Teradata and Microsoft. As the cloud becomes more prevalent, as things evolve and machine learning becomes the hot buzzword, et cetera. How have you evolved those relationships specifically in terms of the integration work that you've done? Have you kept up that engineering ethos, or? >> And that was the thing. With Microsoft, we clearly spent a lot of sweat equity on the Azure HDInsight service, but if you look at that ecosystem, they have Azure machine learning, right? They have a whole raft of services, right, that you can apply to the data when it's in the cloud, right? So how that piece integrates with the broader ecosystem of services is a lot of engineering work as well. I've always said, there's work to be done in our green box, but the other half of the work is how it plumbs into the rest. And so if you look at the AWS ecosystem, how do you optimize for S3 as a storage tier, and ephemeral workloads where HDFS is maybe a caching mechanism but it's not your primary storage, right? It brings up really interesting integration modes and how you actually bring your value out into really interesting use cases, right? So I think it's opened up a lot of areas where we can drive a lot more integration, drive the open source tech in a way that's relevant for those use cases. >> Alright, we got to go but, summit in Tokyo, is it next month? >> Yes, end of October. >> End of October. >> It's our first time, so primarily summits have been US and Europe. 
We had Melbourne end of August, and we have Tokyo end of October. I'll be, they're bringing the right hander out of retirement, so I'll be onstage in Tokyo. (laughing) I've usually been behind the scenes. >> Throwing the slurve? (laughter) >> Yeah, exactly. So I'm looking forward to it, it'll be exciting. >> Alright, good, and then '17, you're going to start again in the spring. >> Shaun: Yup. >> You're in Munich. >> Shaun: Yup. Munich. >> You were in Dublin last year, you're moving to Munich this year. >> Shaun: Exactly. >> Hopefully the Cube will be back, in Munich, alright? >> We love you guys, you guys do a good job. >> Let's make it happen, do good stuff in Europe, so thanks again for coming out. >> Shaun: Thanks for having me. >> Always a pleasure. Alright, keep it right there, we'll be back right after this short break. This is the Cube, we're live from New York City. (upbeat electronic music)

Published Date : Sep 29 2016


ENTITIES

Entity	Category	Confidence
Shaun	PERSON	0.99+
Comcast	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Microsoft	ORGANIZATION	0.99+
Dublin	LOCATION	0.99+
Nvidia	ORGANIZATION	0.99+
Munich	LOCATION	0.99+
Dave Vellante	PERSON	0.99+
Europe	LOCATION	0.99+
Cisco	ORGANIZATION	0.99+
Ford	ORGANIZATION	0.99+
2011	DATE	0.99+
British Gas	ORGANIZATION	0.99+
Peter Burris	PERSON	0.99+
Peter	PERSON	0.99+
Shaun Connolly	PERSON	0.99+
October 2011	DATE	0.99+
Tokyo	LOCATION	0.99+
New York City	LOCATION	0.99+
Apache Software Foundation	ORGANIZATION	0.99+
2009	DATE	0.99+
2016	DATE	0.99+
two	QUANTITY	0.99+
Teradata	ORGANIZATION	0.99+
360 degree	QUANTITY	0.99+
FedEx	ORGANIZATION	0.99+
one	QUANTITY	0.99+
last year	DATE	0.99+
VMware	ORGANIZATION	0.99+
2017	DATE	0.99+
five years	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
last year	DATE	0.99+
both	QUANTITY	0.99+
SpringSource	ORGANIZATION	0.99+
this year	DATE	0.99+
Melbourne	LOCATION	0.99+
last June	DATE	0.99+
Schlumberger	ORGANIZATION	0.99+
Big Apple	LOCATION	0.99+
NYC	LOCATION	0.99+
first time	QUANTITY	0.99+
End of October	DATE	0.99+
end of October	DATE	0.99+
next month	DATE	0.99+
end of August	DATE	0.98+
Apache	ORGANIZATION	0.98+
single	QUANTITY	0.98+
two paths	QUANTITY	0.98+
Hortonworks	ORGANIZATION	0.98+
BigDataNYC	ORGANIZATION	0.97+
over 62 million	QUANTITY	0.97+
US	LOCATION	0.96+
Azure	TITLE	0.96+
billions of users	QUANTITY	0.93+
today	DATE	0.92+

Tendu Yogurtcu, Syncsort - #BigDataSV 2016 - #theCUBE

from San Jose in the heart of Silicon Valley it's the kue covering big data sv 2016 now your host John furrier and George Gilbert okay welcome back on we are here live in Silicon Valley for the cubes looking angles flagship program we go out to the events and extract the signal from the noise i'm john furrier mykos george gilbert big data analyst at Wikibon calm our next guest is 10 do yoga coo to yogurt coo coo I you see your last name yo Joe okay I gots clothes GM with big David sinks or welcome back to the cube sink starts been a long time guess one of those companies we love to cover because your value publishes is right in the center of all the action around mainframes and you know Dave and I always love to talk about mainframe not mean frame guys we know that we remember those days and still powering a lot of the big enterprises so I got to ask you you know what's your take on the show here one of the themes that came up last night on crowd chatters why is enterprise data warehousing failing so you know got some conversation but you're seeing a transformation what do you guys see thank you for having me it's great to be here yes we are seeing the transformation of the next generation data warehouse and evolution of the data warehouse architecture and as part of that mainframes are a big part of this data warehouse architecture because still seventy percent of data is on the mainframes world's data seventy percent of world's data this is a large amount of data so when we talk about big data architecture and making big data and enterprise data useful for the business and having advanced analytics not just gaining operational efficiencies with the new architecture and also having new products new services available to the customers of those organizations this data is intact and making that part of this next-generation data warehouse architecture is a big part of the initiatives and we play a very strong core role in this bridging the gap between mainframes and 
the big data platforms because we have product offerings spanning across platforms and we are very focused on accessing and integrating data accessing and integrating in a secure way from mainframes to the big data plan one is one of the things that's the mainframe highlights kind of a dynamic in the marketplace and wrong hall customers whether they have many firms are not your customers who have mainframes they already have a ton of data their data full as we say in the cube they have a ton of data do it but they spend a lot of times you mentioned cleaning the data how do you guys specifically solve that because that's a big hurdle that they want to just put behind they want to clean fast and get on to other things yes we see a few different trends and challenges first of all from the Big Data initiatives everybody is really trying to either gain operational efficiency business agility and make use of some of the data they weren't able to make use of before and enrich this data with some of the new data sources they might be actually adding to the data pipeline or they are trying to provide new products and services to their customers so when we talk about the mainframe data it's a it's really a how you access this mainframe data in a secure way and how you make that data preparation very easy for the data scientists the data scientists are still spending close to eighty percent of their time in data preparation and if you can't think of it when we talk about the compute frameworks like spark MapReduce flink versus the technology stack technologies these should not be relevant to the data scientist they should be just worried about how do i create my data pipeline what are the new insights that I'm trying to get from this data the simplification we bring in that data cleansing and data preparation is one well we are bringing simple way to access and integrate all of the enterprise data not just the legacy mainframe and the relational data sources and also the 
emerging data sources with streaming data sources the messaging frameworks new data sources we also make this in a cross-platform secure way and some of the new features for example we announced where we were simply the best in terms of accessing all of the mainframe data and having this available on Hadoop and spark we now also makes park and Hadoop understand this data in its original format you do not have to change the original record format which is very important for highly regulated industries like financial services banking and insurance and health care because you want to be able to do the data sanitization and data cleansing and yet bring that mainframe data in its original format for audit and compliance reasons okay so this is this is the product i think where you were telling us earlier that you can move the processing you can move the data from the mainframe do processing at scale and at cost that's not possible or even ii is is easy on the mainframe do it on a distributed platform like a dupe it preserves its original sort of way of being encoded send it back but then there's also this new way of creating a data fabric that we were talking about earlier where it used to be sort of point-to-point from the transactional systems to the data warehouse and now we've basically got this richer fabric and your tools sitting on some technologies perhaps like spark and Kafka tell us what that world looks like and how it was different from we see a greater interest in terms of the concept of a database because some organizations call it data as a service some organizations call it a Hadoop is a service but ultimately an easy way of publishing data and making data available for both the internal clients of the organization's and external clients of the organization's so Kafka is in the center of this and we see a lot of other partners of us including hot dog vendors like Cloudera map r & Horton works as well as data bricks and confluent are really focused on 
creating that data bus and servicing it. So we play very strongly there, because the phase one project for these organizations is: how do I create this enterprise data lake or enterprise data hub? That is usually the phase one project, because it is needed for advanced analytics or predictive analytics. Or when you make a change in your mortgage application, you want to be able to see that change on your mobile phone in under five minutes. Likewise, when you make a change in your healthcare coverage or telecom services, you want to be able to see that in under five minutes on your phone. These things really require easy access to that enterprise data hub. We have a tool called Data Funnel. It basically simplifies this to one click and significantly reduces the time for creating the enterprise data hub. Our customers are using this to migrate, well, I would not say migrate, to access data from database tables, in DB2 for example, thousands of tables, populating and automatically mapping the metadata, whether that metadata is Hive tables or Parquet files or whatever the format is going to be on the distributed platform. So this really reduces the time to create the enterprise data hub. >> It sounds actually really interesting. When I'm hearing what you're saying, the first step was: create this data lake, put data in there, and start getting our feet wet and learning new analysis patterns. But if I'm hearing you correctly, you're saying that now, radiating out of that, is a new sort of data backbone that's much lower latency, that gets data out of the analytic systems, perhaps back into the operational systems or into new systems, at a speed that we didn't have before, so that we can now do an analysis and make decisions very quickly. >> Yes, that's true. Basically, operational intelligence and analytics are converging. >> Okay. >> And in that convergence, what we are basically seeing is: I'm analyzing security data, I'm analyzing telemetry data that's streamed, and I want to
be able to react as fast as possible. Some of the interest in the emerging compute platforms is really driven by this use case, right? Many of our customers are basically saying: today, operating in under five minutes is enough for me. However, I want to be prepared, I want to future-proof my applications, because in a year it might be that I have to respond in under a minute, even in sub-seconds. >> When they talk about being future-proofed, and you mentioned two time brackets on either end, are customers saying they're looking at a speed that current technologies don't support? In other words, are they evaluating some things that are essentially research projects right now, very experimental, or do they see a set of technologies that they can pick and choose from to serve those different latency needs? >> We published a Hadoop survey earlier this year, in January. According to the results from that survey, seventy percent of the respondents were actually evaluating Spark, and this is very consistent with our customer base as well. The promise of Spark is driven by multiple use cases and multiple workloads, including predictive analytics, streaming analytics, and batch analytics, all of these use cases being able to run on the same platform, and all of the Hadoop vendors are also supporting this. Our customer base is heavy enterprise customers, and they are already in production on Hadoop, so running Spark on top of their Hadoop cluster is one way they are looking at future-proofing their applications. This is where we also bring value, because we really abstract and insulate the user. While we are liberating all of the data from the enterprise, whether it's on the relational legacy data warehouse, or on the mainframe side, or coming from new web clients, we are also helping them insulate their applications, because they don't really need to worry about what's the next compute framework that's going to be the
fastest, most reliable, and low latency. They need to focus on the application layer. They need to focus on creating that data pipeline. >> I want to ask you about the state of Syncsort. You guys have had great success with the mainframe and this concept of data funneling, where you can bring stuff in very fast. New management, new ownership. What's the update on the market dynamics, because now ingestion [inaudible] data sources? How do you guys view it? What's the plan for Syncsort going forward? Share with the folks out there. >> Sure. Our new investor, Clearlake Capital, is very supportive of both organic and inorganic growth, so acquisitions are one of the areas for us. We plan to make one or two acquisitions this year, and companies with products in near-adjacent markets are a real value add for us. So that's one area, in addition to organic growth. In terms of organic growth, we have been very successful with a lot of organizations in insurance, financial services, banking, and healthcare, many of the verticals, helping our customers create the enterprise data hub, access and integrate all of the data, and now carrying them to the next-generation frameworks. Those are the areas where we have been partnering with them. The next step for us is really having streaming data sources as well as batch data sources go through a single data pipeline, and this includes bringing telemetry data and security data to the advanced analytics as well. >> Okay, so it sounds like you're providing a platform that can handle today's needs, which were mostly batch, but also the emerging ones, which are streaming, and so you've got that sort of future-proofing that customers are looking for. Once they've got those types of data coming together, including stuff from the mainframe that they might want to enrich from public sources, what new things do you see them doing? >> Predictive analytics and machine learning are a big part of this, because
ultimately there are different phases, right? The operational efficiency phase was the low-hanging fruit for many organizations: I want to understand what I can do faster, serve my clients faster, and create that operational efficiency in a cost-effective, scalable way. The second was: what are the new go-to-market opportunities with transformative applications? What can I do by recognizing how my telco customers are interacting with the SaaS services, and how, in under a couple of minutes, I can react to their responses or self-service? That is the second one. And then the next phase is: how do I use this historical data, in addition to the streaming data I'm rapidly collecting, to actually predict and prevent some of the things? And this is already happening. With banking, for example, it's really fraud detection, where a lot of predictive analysis happens. So advanced analytics using AI and advanced analytics using machine learning will be a very critical component of this moving forward. >> This is really interesting, because now you're honing in on a specific industry use case, and something that every vendor is trying to solve, the fraud detection, fraud prevention. How repeatable is it across your customers? Is this something they have to build from scratch because there are no templates that get them fifty percent of the way there, seventy percent of the way there? >> Actually, there's an opportunity here, because if you look at the health care or telco or financial services or insurance verticals, there are repeating patterns. One is fraud, and there are some of the new use cases in terms of customer churn analytics or [inaudible]. So these patterns, and the compliance requirements in these verticals, actually create an opportunity to come up with applications, for new companies, for new startups. >> Okay, Tendu, final question. Share with the folks out there who are viewing the show right now. This is ten years of Hadoop, seven years of this event, Big
Data NYC. We had a great event there. New York City, Silicon Valley. What's the vibe here in Silicon Valley? >> This is one of the best events. I really enjoy Strata San Jose, and I'm looking forward to two days of keynotes, hearing from colleagues, and networking with colleagues. This is really where the heartbeat happens, because with Hadoop World and Strata combined, we actually started seeing more business use cases and more discussions around how to enable the business users, which means the technology stack is maturing and the focus is really on the business and on creating more insights and value for the businesses. >> Tendu Yogurtcu, welcome to theCUBE. Thanks for coming by, really appreciate it. Go check out our Dublin event on the fourteenth of April. Hadoop Summit will be in Europe for that event. Of course, go to SiliconANGLE TV and check out our Women in Tech. Every week we feature women in tech, on Wednesday. Thanks for joining us. Thanks for sharing the insight with Syncsort, I really appreciate it. Thanks for coming by. This is theCUBE. We'll be right back with more coverage, live in Silicon Valley, after this short break.

Published Date : Mar 29 2016


ENTITIES

Entity	Category	Confidence
George Gilbert	PERSON	0.99+
fifty percent	QUANTITY	0.99+
john furrier	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
seventy percent	QUANTITY	0.99+
two days	QUANTITY	0.99+
San Jose	LOCATION	0.99+
one	QUANTITY	0.99+
Dave	PERSON	0.99+
John furrier	PERSON	0.99+
Silicon Valley	LOCATION	0.99+
ten years	QUANTITY	0.99+
telco	ORGANIZATION	0.99+
george gilbert	PERSON	0.99+
seven years	QUANTITY	0.99+
Wikibon	ORGANIZATION	0.99+
NYC	LOCATION	0.98+
Hadoop	TITLE	0.98+
thousands of tables	QUANTITY	0.98+
today	DATE	0.98+
under five minutes	QUANTITY	0.98+
europe	LOCATION	0.98+
Joe	PERSON	0.98+
january	DATE	0.97+
wednesday	DATE	0.97+
one click	QUANTITY	0.97+
under five minutes	QUANTITY	0.97+
under a minute	QUANTITY	0.97+
one area	QUANTITY	0.97+
phase one	QUANTITY	0.96+
earlier this year	DATE	0.96+
both	QUANTITY	0.96+
a year	QUANTITY	0.96+
second one	QUANTITY	0.96+
Dublin	LOCATION	0.95+
this year	DATE	0.95+
New York City Silicon Valley	LOCATION	0.92+
last night	DATE	0.91+
one way	QUANTITY	0.9+
a ton of data	QUANTITY	0.9+
Cloudera map r & Horton	ORGANIZATION	0.9+
a ton of data	QUANTITY	0.89+
two acquisitions	QUANTITY	0.89+
turkey	LOCATION	0.88+
one of the best events	QUANTITY	0.87+
first sort	QUANTITY	0.86+
eighty percent	QUANTITY	0.85+
mykos	PERSON	0.84+
SiliconANGLE TV	ORGANIZATION	0.83+
2016	DATE	0.82+
second	QUANTITY	0.81+
Tendu Yogurtcu	PERSON	0.81+
single data	QUANTITY	0.8+
David	PERSON	0.8+
park	TITLE	0.8+
a lot of times	QUANTITY	0.78+
10	QUANTITY	0.77+
db2	TITLE	0.76+
under a couple of minutes	QUANTITY	0.75+
areas	QUANTITY	0.71+
things	QUANTITY	0.71+
Syncsort	ORGANIZATION	0.71+
every week	QUANTITY	0.7+
one of	QUANTITY	0.7+
sv	EVENT	0.69+
of April	DATE	0.69+
first	QUANTITY	0.68+
#BigDataSV	EVENT	0.66+
themes	QUANTITY	0.66+
spark	ORGANIZATION	0.65+
summit	EVENT	0.62+
Kafka	TITLE	0.62+
every vendor	QUANTITY	0.61+
use	QUANTITY	0.6+
Big Data	EVENT	0.6+
MapReduce	TITLE	0.55+

Jim Campigli, WANdisco - #BigDataNYC 2015 - #theCUBE

>> Live from New York. It's The Cube, covering Big Data NYC 2015. Brought to you by Horton Works, IBM, EMC, and Pivotal. Now for your hosts, John Furrier and Dave Vellante. >> Hello, everyone. Welcome back to live in New York City for the Cube. A special big data [inaudible 00:00:27] our flagship program will go out to the events. They expect a [Inaudible 00:00:30] We are here live as part of Strata Hadoop Big Data NYC. I'm John Furrier. My co-host, Dave Vellante. Our next guest is Jim Campigli, the Chief Product Officer at WANdisco. Welcome back to The Cube. Great to see you. >> Thanks, great to be here. >> You've been COO of WANdisco, head of marketing, now Chief Product Officer for a few years. You guys have always had the patent. David was on earlier. I asked him specifically, why doesn't the other guys just do what you do? I wanted you to comment deeper on that because he had a great answer. He said, patents. But you guys do something that's really hard that people can't do. >> Right. >> So let's get into it because Fusion is a big announcement you guys made. Big deal with EMC, lot of traction with that, and it's one of these things that is kind of talked about, but not talked about. It's really a big deal, so what is the reason why you guys are so successful on the product side? >> Well I think, first of all, it starts with the technology that we have patented, and it's this true active active replication capability that we have. Other software products claim to have active active replication, but when you drill down on what they're really doing, typically, what's happening is they'll have a set of servers that they replicate across, and you can write a transaction at any server, but then that server is responsible for propagating it to all of the other servers in the implementation. 
There's no mechanism for pre-agreeing to that transaction before it's actually written, so there's no way to avoid conflicts up front, there's no way to effectively handle scenarios where some of the servers in the implementation go down while the replication is in process, and very frequently, those solutions end up requiring administrators to do periodic resynchronization, go back and manually find out what didn't take, and deal with all the deltas, whereas we offer guaranteed consistency. And effectively what happens is with us, you can write at any server as well, but the difference is we go through a peer-to-peer agreement process, and once a quorum of the servers in the implementation agree to the transaction, they all accept it, and we make sure everything is written in the same order on every server. And every server knows the last good transaction it processed, so if it goes down at some point in time, as soon as it comes back up, it can grab all the transactions it missed during that time slice while it was offline, resync itself automatically without an administrator having to do anything. And you can use that feature not only for network and server outages that cause downtime, but even for planned maintenance, which is one of the biggest causes of Hadoop availability issues, because obviously if you've got a global deployment, when it's midnight on Sunday in the U.S., it's the start of the business day on Monday in Europe, and then it's the middle of the afternoon in Asia. So if you take Hadoop clusters down, somebody somewhere in the world is going to be going without their applications and data. >> It's interesting; I want to get your comments on this because this has a great highlight into the next conversation we've been hearing all throughout The Cube this week is analytics, outcomes. These are the kind of things that people talk about because that means there's checks being written. Hadoop is moving into production. People have done the clusters.
It used to be the conversation, hey, x number of clusters, you do this, you do that, replication here and there, YARN, all these different buzz words. Really feeds and speeds. Now, Hadoop is relevant, but it's kind of invisible. It's under the hood. >> Right. >> Yet, it's part of other things in the network, so high availability and non-disruptive operations are table stakes now. So I want you to talk about that nuance because that's what we're seeing as the things that are powering, as the engine of Hadoop deployments. What is that? Take us through that nuance, because that's one of the things that you guys have been doing a lot of work in that's making it reliable and stable. To actually go out and play with Hadoop, deploy it, make sure it's always on. >> Well, we really come into play when companies are moving Hadoop out of the lab and into production. When they have defined application SLAs, when they can only have so much downtime, and it may be business requirements, it may be regulatory compliance issues, for example, financial services. They pretty much always have to have their data available. They have to have a solid back-up of the data. That's a hard requirement for them to put anything into production in their data centers. >> The other use case we've been hearing is okay, I've got Hadoop, I've been playing with it, now I need to scale it up big time. I need to double, triple my clusters. I have to put it with my applications. Then the conversation's, okay, wait, do I need to do more sysadmin work? How do you address that particular piece because I think that's where I think Fusion comes in from how I'm reading it, but is that a Fusion value proposition? Is it a WANdisco thing, and what does the customer, and is that happening? >> Yeah, so there's actually two angles to that, and the first is how do we maintain that up-time? How do we make sure there's performance availability to meet the SLAs, the production SLAs?
The active active replication that we have patents for, that I described earlier, and it's embodied in our DConE distributed coordination engine, is at the core of Fusion, and once a Fusion server's installed with each of your Hadoop clusters, that active active replication capability is extended to them, and we expose that HDFS API so the client applications, Sqoop, Flume, Impala, Hive, anything that would normally run against a Hadoop cluster, would talk through us. If it's been defined for replication, we do the active active replication of it. Pass straight through and process normally on the local cluster. So how does that address the issues you were talking about? What you're getting by default with our active active replication is effectively continuous hot back-up. That means if one cluster or an entire data center goes offline, that data exists elsewhere. Your users can fail over. They can continue accessing the data, running their applications. As soon as that cluster comes back online, it resyncs automatically. Now what's the other >> No user involvement? No admin? >> No user involvement in that. Now the only time, and this gets back into what I was talking about earlier, if I take servers offline for planned maintenance, upgrade the hardware, the operating system, whatever it may be, I can take advantage of that feature, as I was alluding to earlier. I can take the servers or the entire cluster offline, and Fusion knows the last good transactions that were processed on that cluster. As soon as the admin turns it back on, it'll resync itself automatically. So that's how you avoid downtime, even for planned maintenance, if you have to take an entire location off. Now, to your other question, how do you scale this stuff up? Think about what we do. We eliminate idle standby hardware, because everything is full read write. You don't have standby read-only back-up clusters and servers when we come into the picture.
So let's say we walk into an existing implementation, and they've got two clusters. One is the active cluster where everything's being written to, read from, actively being accessed by users. The other's just simply taking snapshots or periodic back-ups, or they're using DistCp or something else, but they really can't get full utilization out of that. We come in with our active active replication capability, and they don't have to change anything, but what suddenly happens is, as soon as they define what they want replicated, we'll replicate it for them initially to the other clusters. They don't have to pre-sync it, and the cluster that was formerly for disaster recovery, for back-up, is now live and fully usable. So guess what? I'm now able to scale up to twice my original implementation by just leveraging that formerly read-only back-up cluster that I was >> Is there a lot of configuration involved in that, or is it automatically? >> No, so basically what happens, again, you don't have to synchronize the clusters in advance. The way we replicate is based on this concept of folders, and you can think of a folder as basically a collection of files and subdirectories that roll up into root directories, effectively, that reflect typically particular applications that people are using with Hadoop or groups of users that have data sets that they access for their various sets of applications. And you define the replicated folders, basically a high level directory that consists of everything in it, and as soon as you do that, what we'll do automatically, in a new implementation. Let's keep it simple. Let's say you just have two clusters, two locations. We'll replicate that folder in its entirety to the target you specify, and then from that point on, we're just moving the deltas over the wire. So you don't have to do anything in advance. And then suddenly that back-up hardware is fully usable, and you've doubled the size of your implementations. You've scaled up to 2x.
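The agreement and resync behavior Jim describes (a write is accepted only once a quorum agrees, every server applies transactions in the same order, and a server that was offline replays what it missed from its last good transaction) can be illustrated with a toy sketch. This is only a sketch under stated assumptions: the `Node` and `Cluster` classes are hypothetical names invented for illustration, a single in-process object stands in for the peer-to-peer agreement step, and none of it reflects WANdisco's actual patented engine or Fusion's API.

```python
class Node:
    """One replica that applies agreed transactions in a single global order."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.log = []  # transactions applied so far, in agreed order


class Cluster:
    """Toy stand-in (assumption) for the agreement process between servers."""

    def __init__(self, names):
        self.nodes = {name: Node(name) for name in names}
        self.agreed = []  # the one agreed ordering of all accepted transactions

    def propose(self, origin, txn):
        """Try to write txn at `origin`; accept only if a majority is reachable."""
        if not self.nodes[origin].online:
            raise RuntimeError(f"{origin} is offline and cannot accept writes")
        voters = [node for node in self.nodes.values() if node.online]
        if len(voters) * 2 <= len(self.nodes):
            return False  # no quorum, so nothing is written anywhere
        self.agreed.append(txn)
        for node in voters:
            node.log.append(txn)  # every online node applies in the same order
        return True

    def take_offline(self, name):
        """Simulate an outage or planned maintenance on one server."""
        self.nodes[name].online = False

    def bring_online(self, name):
        """Resync: replay everything agreed after the node's last applied txn."""
        node = self.nodes[name]
        node.log.extend(self.agreed[len(node.log):])
        node.online = True
```

In this sketch, a node taken offline for maintenance simply stops receiving transactions, and `bring_online` replays the missing tail of the agreed log with no manual reconciliation, which mirrors the "knows the last good transaction" idea from the conversation. A real system would reach agreement over the network between peers rather than through one shared object.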
>> So, I mean what you're describing before, really strikes me that the way you tell the complexity of a product and the value of a product in this space is what happens when something goes wrong. >> Yep. >> That's the question you always ask. How do you recover, because recovery's a very hard thing, and your patents, you've got a lot of math inside there. >> Right. >> But you also said something that's interesting, which is you're an asset utilization play. >> Right. >> You're being able to go in relatively simply and say, okay, you've got this asset that's underutilized. I'm now going to give you back some capacity that's on the floor and take advantage of that. >> Right, and you're able to scale up without spending any more on hardware and infrastructure. >> So I'm interested in, so another company. You're now with an EMC partnership this week. And they sort of got into this way back in the mainframe days with SRDF. I always thought when I first heard about WANdisco, it's like SRDF for Hadoop, but it's active active. Then they bought that Yada Yada. >> And there's no distance limitations for their active active. >> So what's the nature of the relationship with EMC? >> Okay, so basically EMC, like the other storage vendors that want to play in the Hadoop space, expose some form of an HDFS API, and in fact, if you look at Hortonworks or Cloudera, if you go and look at Cloudera Manager, one of the things it asks you when you're installing it is are you going to run this on regular HDFS storage, effectively a bunch of commodity boxes typically, or are you going to use EMC Isilon or the various other options? 
And what we're able to do is replicate across Hadoop clusters running on Isilon, running on EMC ECS, running on standard HDFS, and what that allows these companies to do is without modifying those storage systems, without migrating that data off of them, incorporate it into an enterprise-wide data lake, if that's what they want to do, and selectively replicate across all of those different storage systems. It could be a mix of different Hadoop distributions. You could have replication between CDH, HDP, Pivotal, MapR, all of those things, including EMC Storage that I just mentioned, it was mentioned in the press release, Isilon, and ECS effectively has a Hadoop-compatible API support. And we can create in effect a single virtual cluster out of all of those different platforms. >> So is it a go-to-market relationship? Is it an OEM deal? >> Yeah, it was really born out of the fact that we have some mutual customers that want to do exactly what I just described. They have standard Hortonworks or Cloudera deployments in house. They've got data running on Isilon, and they want to deploy a data lake that includes what they've got stored on Isilon with what they've got in HDFS and Hadoop and replicate across that. >> Like onerous EMC certification process? >> Yeah, we went through that process. We actually set up environments in our labs where we had EMC, Isilon, and ECS running and did demonstration integrations, replication across Isilon to HDP to Hortonworks, Isilon to Cloudera, ECS to Isilon to HDP and Cloudera and so forth. So we did prove it out. They saw that. In fact, they lent us boxes to actually do this in our labs, so they were very motivated, and they're seeing us in some of their bigger accounts.
>> Talk about the aspect of two things: non-disruptive operations, meaning I want to deploy stuff because now that Hadoop has a hardened top with some abstraction layer, with analytics to focus, there's a lot of work going on under the hood, and a large scale enterprise might have a zillion versions of Hadoop. They might have little Hortonworks here. They might have something over here, so there might be some diversity in the distributions. That's one thing. The other one is operational disruption. >> Right. >> What do you guys do there? Is it zero disruption, and how do you deal with multiple versions of the distro? >> Okay, so basically what we're doing, the simplest way to describe it is we're providing a common API across all of these different distributions, running on different storage platforms and so forth, so that the client applications are always interacting with us. They're not worrying about the nuances of the particular Hadoop APIs that these different things expose. So we're providing a layer of abstraction effectively. So we're transparent in effect, in that sense, operationally, once we're installed. The other thing is, and I mentioned this earlier, we come in, basically, you don't have to pre-sync clusters, you don't have to make sure they're all the same versions or the same distros or any of that, just install us, select the data that you want to replicate, we'll replicate it over initially to the target clusters, and then from that point on, you just go. It just works, and we talked about the core patent for active active replication. We've got other patents that have been approved, three patents now and seven applications pending, that allow this active active replication to take place while servers are being added and removed from implementations without disrupting user access or running applications and so forth. >> Final question for you, sum up the show this week. What's the vibe here? What's the aroma?
Is it really Hadoop next? What is the overall Big Data NYC story here in Strata Hadoop? What's the main theme that you're seeing coming out of the show? >> I think the main theme that we're starting to see, it's twofold. I think one is we are seeing more and more companies moving this into production. There's a lot of interest in Spark and the whole fast data concept, and I don't think that Spark is necessarily orthogonal to Hadoop at all. I think the two have to coexist. If you think about Spark streaming and the whole fast data concept, basically, Hadoop provides the historical data at rest. It provides the historical context. The streaming data provides the point in time information. What Spark together with Hadoop allows you to do is that real time analysis, do the real time informed decision-making, but do it within historical context instead of a single point in time vacuum. So I think what's happening, and you notice the vendors themselves aren't saying, oh it's all Spark, forget Hadoop. They're really talking about coexisting. >> Alright, Jim, from WANdisco, Chief Product Officer, really in the trenches, talking about what's under the hood and making it all scale in the infrastructure so his analysts can hit the scene. Great to see you again. Thanks for coming and sharing your insight here on The Cube. Live in New York City. We are here, day two of three days of wall-to-wall coverage of Big Data NYC in conjunction with Strata. We'll be right back with more live coverage in the moment here in New York City after this short break.

Published Date : Oct 6 2015

SUMMARY :

Brought to you by Horton New York City for the Cube. You guys have always had the patent. on the product side? and once a quorum of the servers These are the kind of things because that's one of the things back-up of the data. and is that happening? So how does that address the issues and the cluster that was and you can think of a folder really strikes me that the way you tell That's the question you always ask. But you also said that's on the floor and Right, and you're able to scale up in the mainframe days with SRDF. And there's no distance limitations one of the things it asks you born out of the fact and Cloudera and so forth. diversity in the distributions. so that the client applications What is the overall Big Data NYC story and the whole fast data concept, in the infrastructure

ENTITIES

Entity	Category	Confidence
David	PERSON	0.99+
Jim	PERSON	0.99+
Jim Campigli	PERSON	0.99+
Dave Vellante	PERSON	0.99+
Europe	LOCATION	0.99+
WANdisco	ORGANIZATION	0.99+
EMC	ORGANIZATION	0.99+
Asia	LOCATION	0.99+
U.S.	LOCATION	0.99+
New York	LOCATION	0.99+
John Furrier	PERSON	0.99+
Horton Works	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
John Furrier	PERSON	0.99+
New York City	LOCATION	0.99+
two locations	QUANTITY	0.99+
Strata Hadoop	TITLE	0.99+
first	QUANTITY	0.99+
Pivotal	ORGANIZATION	0.99+
one	QUANTITY	0.99+
two things	QUANTITY	0.99+
Hortonworks	ORGANIZATION	0.99+
Hadoop	TITLE	0.99+
One	QUANTITY	0.99+
two	QUANTITY	0.99+
two clusters	QUANTITY	0.99+
three days	QUANTITY	0.99+
Monday	DATE	0.99+
three patents	QUANTITY	0.98+
this week	DATE	0.98+
seven pending applications	QUANTITY	0.98+
two angles	QUANTITY	0.98+
two clusters	QUANTITY	0.98+
Spark	TITLE	0.97+
this week	DATE	0.97+
one cluster	QUANTITY	0.97+
00:00:30	DATE	0.95+
ECS	TITLE	0.95+
HDP	ORGANIZATION	0.94+
Cloudera Manager	TITLE	0.94+
single point	QUANTITY	0.94+
#BigDataNYC	EVENT	0.94+
each	QUANTITY	0.94+
Impala	TITLE	0.93+
NYC	LOCATION	0.93+
twofold	QUANTITY	0.93+
Strata	ORGANIZATION	0.92+
Flume	TITLE	0.92+
00:00:27	DATE	0.92+
Sqoop	TITLE	0.92+
Fusion	TITLE	0.91+
Isilon	ORGANIZATION	0.89+
Cloudera	ORGANIZATION	0.89+
midnight	DATE	0.89+
Sunday	DATE	0.88+
Isilon	TITLE	0.88+
single	QUANTITY	0.88+
HIVE	TITLE	0.87+
one thing	QUANTITY	0.83+
double	QUANTITY	0.83+

George Mathew, Alteryx - BigDataSV 2014 - #BigDataSV #theCUBE

>>theCUBE at Big Data SV 2014 is brought to you by headline sponsors WANdisco, we make Hadoop invincible, and Actian, accelerating big data 2.0. >>Okay. We're back here, live in Silicon Valley. This is Big Data SV. This is SiliconANGLE and Wikibon's theCUBE coverage of big data in Silicon Valley and all around the world, covering the Strata conference. All the latest news and analysis here in Silicon Valley. theCUBE is our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, the founder of SiliconANGLE, joined by my co-host and co-founder of Wikibon.org, Dave Vellante. George Mathew, CEO of Alteryx, is on theCUBE again, back from Big Data NYC just a few months ago, our two events. Welcome back. >>Great to be here. >>So, what fruit has dropped into the blender to change the colors of the big data space this time? We were in New York. We saw what happened there. A lot of talk about financial services, big business. Silicon Valley Kool-Aid is more about innovation. Partnerships are being formed, channel expansion. Obviously the market's hot, growth is still amazing, valuations are high. What's your take on the current state of the market? >>Yeah, great question. So John, when we see this market today, I remember even a few years ago when I first visited theCUBE, particularly when it came to Hadoop World and Strata a few years back, it was amazing that we talked about the early innings of a ballgame, right? We said, man, we're probably in the second or third inning of this ballgame. And what has progressed, particularly in these last few years, has been how much the actual productionization, the actual industrialization of this activity, particularly from a big data analytics standpoint, has emerged. And that's amazing, right? In a short span, two, three years, we're talking about technologies and capabilities that were kind of considered things that you play with.
And now these are things that are keeping the lights on and running, you know, major portions of how better decision-making and analytics are done inside of organizations. So I think that industrialization is a big shift forward. In fact, if you listen to guys like Narendra Mulani, who runs most of analytics at Accenture, he'll actually highlight that as one of the key elements of how not only is the transformation occurring among organizations, but even the people that are servicing large companies today are going through this big shift. And we're right in the middle of it. >>You mentioned Accenture. We look at CSC, which bought ServiceMesh on the cloud side. You're seeing the consulting firms really seeing build-out mandates, not just POCs; it's "let's go and lock this down now." For the vendors, that means people are looking for reference accounts right now. So to me, the tea leaves say, okay, who's going to knock down the reference accounts, and what is that going to look like? You know, how do you go in and say, I'm going to tune up this database against SAP, or this against that incumbent legacy vendor, with this new scale-out? All these things are in play. So we're seeing that focus of, okay, tire kicking is over: real growth, real referenceable deployments, not, you know, a POC on steroids, but full-on game-changing deployments. Do you see that? And if you do, what versions of that do you see happening, and what inning is that? Is it like the first pitch of the sixth inning? How would you benchmark that? >>Yeah, so I would say we're definitely in the fourth or fifth inning of a nine-inning ballgame now. And in these innings, what we're seeing is what I describe as a new analytic stack that's emerged, right? And that started years ago, particularly when the major Hadoop distro vendors started to rethink how data management was effectively being delivered.
And once that data management layer started to be rethought, particularly in terms of, you know, what schema-on-read was, what the ability to do MPP and scale-out was, and how much cheaper it is to bring storage and compute closer to data, what's now coming above that stack is, you know, how do I blend data? How do I give solutions to data analysts who can make better decisions off of what's being stored inside of that petabyte-scale infrastructure? So we're seeing this new stack emerge where, you know, Cloudera, Hortonworks, and MapR are kind of that underpinning, underlying infrastructure, and now you have the R-based analytics that Revolution provides, Alteryx for data blending and analytic work that's in the hands of data analysts, and Tableau for visual analysis and dashboarding. Those are basically the solutions that are moving forward as capabilities that are packaged and productized. >>Is that the game-changing feature right now, do you think, that integration of the stack? Is that the big game-changer this year? >>That's the hardening that's happening as we speak right now. If you think about the industrialization of big data analytics, which, you know, I think of as the fourth or fifth inning of the ballgame, it's that hardening: that ability to take solutions that either, you know, the Accentures, the KPMGs, the Deloittes of the world deliver to their clients, or that people build internally, right? They have much better solutions that work out of the box, as opposed to fumbling with, you know, things that aren't stitched together as well because of the baling wire and bubblegum that was involved for the last few years. >>Got it. I've got to ask you about one of the big trends you've seen, certainly in the tech world. You mentioned stacks, and that's the success of Amazon, the cloud.
You're seeing integrated stacks being a key part of the, kind of the formation of, as you said, the hardening of the stack. But "horizontally scalable" is a term that's used in a lot of these open source environments, where you have commodity hardware, you have open source software, so, you know, everything is horizontally scalable. Now, that's very easy to envision, but thinking about the implementation in an enterprise or a large organization, horizontally scalable is not a no-brainer. What's your take on that? And how does that hyperscale infrastructure mindset of scale-out, which is a big benefit of the current infrastructure, fit into big data? >>Well, I think it fits extremely well, right? Because when you look at the capabilities of the last stack, as we describe it, we almost think of it as vertical hardware and software that's actually built up. But right now, for anyone who's building scale in this world, it's all about scale-out and really being able to build that stack on a horizontal basis. So if you look at examples of this, right, say for instance what Cloudera recently announced with their enterprise data hub: when you look at that capability of the enterprise data hub, a lot of it is about taking what YARN has become as a resource manager, what HDFS has become as a scale-out storage infrastructure, and what the new plug-in engines that have emerged beyond MapReduce are, as a capability for engines to come into Hadoop. And that is a very horizontal description of how you can do scale-out, particularly for data management. >>We built a lot of the work that was announced at Strata a few years ago, particularly around how the analytics architecture for the Gallery emerged at Alteryx. Now we have hundreds of apps and thousands of users in that infrastructure.
And when we built that out, it was actually scaling out on Amazon, where the worker nodes and the capability for us to manage workload were built out very horizontally. If you look at servers today at any layer of that stack, it is really about that horizontal scale-out: less about throwing more hardware, more, you know, high-end infrastructure at it, and more about how commodity hardware can be leveraged and used up and down that stack very easily. >>So George, let me ask you a question: why is analytics so hard for so many companies? You've been in this big data space, and we've been talking to you since the beginning. When's it going to get easier? And what are you guys specifically doing, you know, to facilitate that? >>Sure. So a few things that we've seen to date: a lot of the analytics work that many people do, internal and external to organizations, is very rote, hand-driven coding, right? And I think that's been one of the biggest challenges, because the two endpoints in analytics have been that either you hard-code stuff into, you know, a C++ or a Java function and push it in-database, or you're doing lightweight analytics in Excel. And really there needs to be a middle ground where someone can do effective scale-out and have repeatability and ease of use in what's being done, so that you don't necessarily have to be a Java or C++ programmer to push an analytic function in-database, and you certainly don't have to deal with the limitations of Excel today. >>And really that middle ground is what Alteryx serves. We look at it as an opportunity for analysts to start work with a very repeatable, reusable workflow for how they would build their initial constructs around an analytic function that they would want to deploy.
And then the scale-out happens because all of the infrastructure works on that analyst's behalf, whether that be the infrastructure on Hadoop, whether that be the scale-out infrastructure for how we would publish an analytic function, or whether that be how the visualizations would occur inside of a product like Tableau. And so that, I think, Dave, is one of the biggest things that needs to shift, so that the only options in front of you for analytics aren't either Excel or hard-coding a bunch of code in C++ or Java and pushing it in-database. >>Yeah. And correct me if I'm wrong, but you seem to be building your partnerships and your ecosystem really around driving that solution, and really driving a revolution in the way in which people think about analytics. >>Ease of use. The idea is that ultimately, if you can't get data analysts to not only create work that they can actually self-describe, deploy, and deliver success with inside of an organization, but also scale that out to the petabyte-scale information that exists inside of most organizations, you fail. And it's the job of folks like ourselves to provide great software. >>Well, you mentioned Tableau. You guys have a strong partnership there, and Christian Chabot, I think, has a good vision. And you talked about, you know, the choices at the ends of the spectrum, and neither is good. Can you talk a little bit more about that partnership and the relationship, and what you guys are doing together? >>Yeah. I would say Tableau is our strongest and most strategic partner today. I mean, we were diamond sponsors of their conference; I think I was at their conference when I was on theCUBE the time before. And they are diamond sponsors of our conference. So our customers, and our users in particular, are one and the same. For Tableau, it really becomes an experience around how visual analysis and dashboarding can be very easily delivered by data analysts.
And we think of those same users, the same exact people that Tableau works with, as the ones who should be able to do data blending and advanced analytics. So that's why the two software products, the two companies, and our two customer bases are one and the same: because of that integrated experience. So, you know, Tableau is basically replacing Excel, and that's the mission that they're after. And we feel that anyone who wants to be able to do the first form of data blending, which I would think of as a VLOOKUP in Excel, should look at Alteryx as a solution for that. >>So you mentioned your conference. It's Inspire, right? >>It is. Inspire is coming up in June. >>June, yeah. How many years have you done Inspire? >>Inspire is now in its fifth year. >>And you're going to bring theCUBE this year. Yeah. >>That would be great. You guys, yeah, that would be fun. >>You should do it. So talk about the conference a little bit. I don't know much about it, but I mean, I know of it. >>Yeah. It's very centered around business users, particularly data analysts, from many organizations that cut across retail, financial services, and communications, where companies like Walmart, AT&T, Sprint, and Verizon bring a lot of their underlying data problems, the underlying analytic opportunities that they've wrestled with, and bring a community together. This year we're expecting somewhere in the neighborhood of 550 to 600 folks attending, largely to figure out how to bring this, you know, game forward, really to build out this next-generation analytic capability that's emerging for most organizations. And we think that starts, ultimately, with data analysts, right? We think that there are well over two and a half million data analysts that are underserved by the current big data tools in this space. And we've just been highly focused on targeting those users, and so far it's been pretty good to us.
>>It's obviously moving to the casual user at some level, and it will end up getting there, though not soon. But I want to ask you about the role of the cloud in all this, because what you have underneath the hood is a lot of leverage. You mentioned integrated stacks, so I want to get your perspective on the data cloud; not "data cloud" as in putting data in the cloud, but the role of cloud, the role of DevOps, and that intersection. You're seeing DevOps, you know, fueling a lot of that growth, certainly under the hood. Now at the top of the stack you have, I guess, this middle layer, for lack of a better description, to use an old metaphor, developing. So that's the enablement piece. Ultimately the end game is fully turnkey data science, personalization, all that; that's the holy grail, we all know. So how do you see that collision of cloud and big data? >>Yeah. So cloud has basically become three things for a lot of folks in our space. One is what we talked about: scale-up and scale-out is much more feasible when you can spin infrastructure up and down as needed, particularly on an elastic basis. Many of us who built our solutions leverage Amazon, one of the most de facto solutions for cloud-based deployment, because it just makes it easy to do the scale-out that's necessary. The second thing it actually enables us, and many of our friends and partners, to do is bring a lower cost basis to how infrastructure is stood up, right? Because at the end of the day, the challenge for the last generation of analytics and data warehousing in this space was that your starting conversation was $2 to $3 million in infrastructure alone, before you even bought software and services.
>>And so now, if you can rent everything that's involved with the infrastructure, and the software is actually working within days or hours of starting the effort, as opposed to a 14-month life cycle, it's really compressing the time to success and value that's involved. And so we see almost a similarity to how Salesforce really disrupted the market 10 years ago. I happened to be at Salesforce when that disruption occurred, and the analytics movement that is underway is really impacted by cloud. The ability to scale out in the cloud is really driving an economic basis that's unheard of. >>With that developer market, that's robust, right? I mean, you have easy, kind of turnkey development, right? >>It is, right, because there's a robust economy surrounding the APIs that are now available for cloud services. So it's not even just at the starting point of infrastructure; there are definitely higher-level services, all the way up to software as a service. >>How much growth do you see in that, that valley of wealth and opportunity that will be created from those costs, not only for the companies involved but for those companies' customers? They have a top-line focus. And with the movement we've seen in analytics, you're seeing the CIO kind of with less of a role; more and more the CEO or the chief data officer wants most of the top-line drivers to be app-focused. So are you seeing a big shift there? >>Yeah. I mean, one of the real advantages of the cloud is now the fact that there is an ability for business analysts, business users, and the line of business to make an impact on how decisions are made, faster, without the infrastructure underpinnings that were needed inside the four walls of an organization. So the decision maker and the buyer has effectively become, to your point, the chief analytics officer or the chief marketing officer, right, less so the chief information officer of an organization.
And so I think that is accelerating at a tremendous pace, right? Because even if you look at the statistics that are out there today, the buying power of the CMO has now outstripped the buying power of the CIO, probably by 1.2 to 1.3x, right? And that used to be a whole different calculus in front of us before. So I would see that... >>Getting faster. So yeah, let me just kind of pick this apart here in real time. So you've got IT, which we all know, right? I was in the IT world for a long time: service catalogs, self-service, you know, service-oriented architectures, whatever you want to call it, evolving into the modern era. That's good. But on the business side, there's still a need for this same kind of cataloguing of tooling, platforms, and analytics. So do you agree with that? I mean, do you see it happening that way, where there's still some connection, but it's not a complete dependency? That's what we're kind of rethinking in real time. Do you see that happening? >>Yeah, I think it's pretty spot on, because when you look at what businesses are doing today, they're selecting software that enables them to be more self-reliant. The reason why we have been growing as much among business analysts as we have is that we deliver self-reliance software, and in some ways that's what Tableau does. And so the winners in this space are going to be the ones that really help users get to results faster through self-reliance. And that's really what companies like Alteryx stand for today. >>So I want to ask you a follow-up on that CMO and CIO discussion. Given that CMOs are spending a lot more, who owns the data? I don't know if I asked you this before, but do you see the role of a chief data officer emerging? And is that individual part of the marketing organization? Is it part of IT? Is it a separate, parallel role?
What are you seeing? >>One of the things I will tell you is that as I've seen chief analytics and chief data officers emerge, and that is a real category and title, a real deal, of folks that have real responsibilities in the organization, the one place they're not is in IT, which is interesting to see, right? Because oftentimes those individuals are reporting straight to the CEO, or they have very close access to line-of-business owners, general managers, or the heads of marketing and the heads of sales. So I'm seeing that shift where, wherever that chief data officer is, whether reporting to CEOs or line-of-business managers or general managers of, you know, large strategic business units, it's not in the information office; it's not in the CIO's purview anymore. And that is kind of telling for how people are thinking about their data, right? Data is becoming much more of an asset and a weapon for how companies grow and build their scale, and less something that we just have to deal with. >>Yeah. And that role is clearly emerging in certain industry sectors, you know, clearly financial services, government, and healthcare, but slowly. But we have been saying that... >>Yeah, it's going to go across the board, right? And one of the reasons why I wrote the article at the end of last year, which I literally titled "Analytics Is Eating the World," is this exact idea, right? Because you have this notion that you are no longer locked down, with data and infrastructure kind of holding you back, right? This is now much more in the hands of people who are responsible for making better decisions inside their organizations, using data to drive those decisions. And it doesn't matter the size and shape of the data that's coming in. >>Yeah. Data is like the food that just spilled out all over from the truck, and analytics is the Pac-Man eating it up. Sorry. >>Okay.
Final question in this segment: summarize Big Data SV for us this year, from your perspective. Knowing what's going on now, what's the big game-changer? What should the folks who are watching know and take note of? What should they pay attention to? What's the big story here at this moment? >>There are definite swim lanes being created, as you can see. I mean, now the bigger distribution providers, particularly on the Hadoop side of the world, have started to call out what they each stand for, right? You can tell that MapR is definitely about creating a fast, slightly proprietary Hadoop distro for the enterprise. You can tell that the folks at Cloudera are focusing themselves on enterprise scale and really building out that hub for enterprise scale. And you can tell Hortonworks is basically embedding and enabling open source for anyone to be able to take advantage of. And certainly, you know, the previous announcements and some of the recent ones give you an indicator of that. So I see those swim lanes forming in that layer. And now what is going to happen is that focus and attention is going to move away from how that layer has evolved and into what I would think of as advanced analytics: being able to do the visual analysis and blending of information. That's where the next, you know, battle or turf war is going to be, particularly in the Strata space. So we're really looking forward to that, because it basically puts us in a great position as a company and a market leader, particularly in advanced analytics, to really serve customers in how this new battleground is emerging. >>Well, we really appreciate you taking the time. You're an awesome guest on theCUBE, obviously. You know, you have a company that you're running and a great team, and you come and share your great knowledge with our fans and audience. We appreciate it. What's next for you this year at the company? Share some of your goals. >>Yeah.
We have a few things. We mentioned Inspire coming up in June. There's a big product release: most of our product team is actually here, and we have a release coming up at the beginning of Q2, which is Alteryx 9.0. That has quite a bit involved in it, including expansion of connectivity, and introducing a fair degree of modeling capability so that the R-based modeling that we do scales out very well, with Revolution and Cloudera in mind, as well as being able to package and deploy analytic apps very quickly, with those data analysts in mind. So it's a release that's been almost a year in the works, and we're very much looking forward to a big launch at the beginning of Q2. >>George, thanks so much. You've got Inspire coming up. A lot of great success in a growing market. Valuations are high, and the good news is this is just the beginning: call it mid-innings in the industry, but for the customers, I'd call it the top of the first. A lot of build-out, real deployments, real budgets, real-deal big data. It's going to collide with cloud again, and we're going to get a lot of innovation, all happening right here at Big Data SV. All the Big Data Silicon Valley coverage is here on theCUBE. I'm John Furrier with Dave Vellante. We'll be right back with our next guest after this short break.

Published Date : Feb 15 2014


