Jeff Bettencourt, DataTorrent & Nathan Trueblood, DataTorrent - DataWorks Summit 2017
>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's The Cube. Covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live on day two of the DataWorks Summit, from the heart of Silicon Valley. I am Lisa Martin, my co-host is George Gilbert. We're very excited to be joined by our next guests from DataTorrent, we've got Nathan Trueblood, VP of Product, hey Nathan. >> Hi. >> Lisa: And, the man who gave me my start in high tech, 12 years ago, the SVP of Marketing, Jeff Bettencourt. Welcome, Jeff. >> Hi, Lisa, good to see ya. >> Lisa: Great to see you, too. So, tell us: who is DataTorrent, what do you guys do, what are you doing in the big data space? >> Jeff: So, DataTorrent is all about real time streaming. So, it's really taking a different paradigm to handling information as it comes from the different sources that are out there, so you think big IoT, you think all of these different new things that are creating pieces of information. It could be humans, it could be machines. Sensors, whatever it is. And taking that in realtime, rather than putting it traditionally just in a data lake and then later on coming back and investigating the data that you stored. So, we started about 2011, started by some of the early founders, people who came out of Yahoo. And, we're pioneers in Hadoop with Hadoop YARN. This is one of the guys here, too. And so we're all about building realtime analytics for our customers, making sure that they can get business decisions done in realtime, as the information is created. And, Nathan will talk a little bit about what we're doing on the application side of it, as well, building these hardened application pipelines for our customers to assist them to get started faster. >> Lisa: Excellent. >> So, alright, let's turn to those realtime applications.
Umm, my familiarity with DataTorrent started probably about five years ago, I think, where at the time, I don't think that there was so much talk about streaming, it was more like, you know, realtime data feeds, but now, I mean, streaming is sort of the center of gravity of big data. >> Nathan: Yeah. >> So, tell us how someone who's building apps should think about the two solution categories, how they complement each other, and what sort of applications we can build now that we couldn't build before? >> So, I think the way I look at it is not so much two different things that complement each other, but streaming analytics and realtime data processing and analytics is really just a natural progression of where big data has been going. So, you know, when we were at Yahoo and we were running Hadoop at scale, you know, the first thing on the scene was just simply the ability to produce insight out of a massive amount of data. But then there was this constant pressure, well, okay, now we've produced that insight in a day, can you do it in an hour? You know, can you do it in half an hour? And particularly at Yahoo, at the time that Amol, our CTO, and I were there, there was just constant pressure of can you produce insight from a huge volume of data more quickly? And so we kind of saw at that time two major trends. One was that we were kind of reaching a limit of where you could go with the Hadoop and batch architecture at that time. And so a new approach was required. And that's what really was sort of the foundation of the Apache Apex project and of DataTorrent the company, was simply realizing that a new approach was required, because the more that Yahoo or other businesses can take information from the world around them and take action on that as quickly as possible, that's going to make you more competitive. So I'd look at streaming as really just a natural progression.
Where now it's possible to get insight and take action on data as close to the time of data creation as possible, and if you can do that, then you're going to be competitive. And so we see this coming across a whole bunch of different verticals. So that's how I kind of look at it, sort of, it's not so much complementary as a trend in where big data is going. Now, the kinds of things that weren't possible before this are, you know, the kinds of applications where now you can take insight, whether it's from IoT or from sensors or from retail, all the things that are going on. Whereas before, you would land this in a data lake, do a bunch of analysis, produce some insight, maybe change your behavior, but ultimately, you weren't being as responsive as you could be to customers. So now what we are seeing, why I think the center of mass has moved into realtime and streaming, is that now it's possible to, you know, give the customer an offer the second they walk into a store, based on what you know about them and their history. This was always something that the internet properties were trying to move towards, but now we see that same technology is being made available across a whole bunch of different verticals, a whole bunch of different industries, and that's why, you know, when you look at Apex and DataTorrent, we're involved not only in things like adtech, but in industrial automation and IoT, and we're involved in, you know, retail and customer 360, because in every one of these cases, insurance, finance, security and fraud prevention, it's a huge competitive advantage if you can get insight and make a decision close to the time of the data creation. So, I think that's really where the shift is coming from. And then the other thing I would mention here is a big thrust of our company, and of Apache Apex, and this is, so, we saw streaming was going to be something that everyone was going to need.
The other thing we saw from our experience at Yahoo was that really getting something to work at a POC level, showing that something is possible with streaming analytics, is really only a small part of the problem. Being able to take and put something into production at scale and run a business on it is a much bigger part of the problem. And so, we put into both the Apache Apex project as well as into our product the ability to not only get insight out of this data in motion, but to be able to put that into production at scale. And so, that's why we've had quite a few customers who have put our product in production at scale and have been running that way, you know, in some cases for years. And so that's another sort of key area where we're forging a path, which is, it's not enough to do a POC and show that something is possible. You have to be able to run a business on it. >> Lisa: So, talk to us about where DataTorrent sits within a modern data architecture. You guys are kind of playing in, integrated in, a couple of different areas. Walk us through what that looks like? >> So, in terms of a modern data architecture, I mean part of it is what I just covered in that we're moving sort of from a batch to streaming world, where the notion of batch is not going away, but now when you have something, you know, a streaming application, that's something that's running all the time, 24/7, there's no concept of batch. Batch is really more the concept of how you are processing data through that streaming application. So, what we're seeing in the modern data architecture is that, you know, typically you have people taking data, extracting it and eventually loading it into some kind of a data lake, right? What we're doing is shifting left of the data lake. You know, analyzing information when it's created.
Produce insight from it, take action on it, and then, yes, land it in the data lake, but once you land it in the data lake, now all of the purposes of what you're doing with that data have shifted. You know, we're producing insight, taking action to the left of the data lake, and then we use that data lake to do things like train, you know, your machine learning model that we're then going to use to the left of the data lake. Use the data lake to do slicing and dicing of your data to better understand what kinds of campaigns you want to run, things like that. But ultimately, you're using the realtime portion of this to be able to take those campaigns and then measure the impacts you're having on your customers in realtime. >> So, okay, 'cause that was going to be my followup question, which is, there does seem to be a role for a historical repository for richer context. >> Nathan: Absolutely. >> And you're acknowledging that. Like, do the low-latency analytics happen first? Then store it up for a richer model, you know, later? >> Nathan: Correct. >> Umm. So, there are a couple things then that seem to be like requirements, next steps, which is, if you're doing the modeling, the research model, in the cloud, how do you orchestrate its distribution towards the sources of the realtime data, umm, and in other words, if you do training up in the cloud where you have the biggest data or the richest data, is DataTorrent or Apex a part of the process of orchestrating the distribution and coherence of the models that should be at the edge, or closer to where the data sources are? >> So, I guess there's a couple different ways we can think about that problem. So, you know, we have customers today who are essentially providing into the streaming analytics application, you know, the models that have been trained on the data from the data lake. And part of the approach we take in Apex and DataTorrent is that you can reload and be changing those models all of the time.
So, our architecture is such that it's fault tolerant, it stays up all the time, so you can actually change the application and evolve it over time. So, we have customers that are reloading models on a regular basis, so whether it's machine learning or even just a rules engine, we're able to reload that on a regular basis. The other part of your question, if I understood you, was really about the distribution of data. And the distribution of models, and the distribution of data, and where do you train that. And I think that you're going to have data in the cloud, you're going to have data on premises, you're going to have data at the edge, and again, what we allow customers to do is to be able to take and integrate that data and make decisions on it, regardless of kind of where it lives, so we'll see streaming applications that get deployed into the cloud. But they may be synchronized in some portion of the data to on premises, or vice versa. So, certainly we can orchestrate all of that as part of an overall streaming application. >> Lisa: I want to ask Jeff, now. Give us a cross section of your customers. You've got customers ranging from small businesses to Fortune 10. >> Jeff: Yep. >> Give us some, kind of, use cases that really stick out to you, that really showcase the great potential that DataTorrent gives. >> Jeff: So if you think about the heritage of our company coming out of the early guys that were in Yahoo, adtech is obviously one that we hit hard and it's something we know how to do really, really well. So, adtech is one of those things where they're constantly changing, so you can take that same model and say, if I'm looking at adtech and saying, if I applied that to a distribution of products in a manufacturing facility, it's kind of all the same type of activities, right? I'm managing a lot of inventory, I'm trying to get that inventory to the right place at the right time and I'm trying to fill that aspect of it.
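The pattern Nathan lays out, score each event to the left of the data lake, act on it, land it afterward, and swap in a retrained model without stopping the pipeline, can be sketched in a few lines of plain Python. This is a toy illustration with made-up names, not Apex's actual operator API (Apex expresses the same idea as a fault-tolerant DAG of checkpointed operators):

```python
import math

# A deliberately tiny "model": score an event with a weighted sum.
# In a real deployment this would be a trained model exported from
# the data lake; everything here is illustrative.
def make_model(weights):
    def score(event):
        z = sum(weights.get(k, 0.0) * v for k, v in event["features"].items())
        return 1.0 / (1.0 + math.exp(-z))  # squash to a 0..1 score
    return score

class StreamingPipeline:
    """Score events as they arrive ("left of the data lake"),
    act on high scores, then land every event in the lake."""

    def __init__(self, model, threshold=0.5):
        self.model = model
        self.threshold = threshold
        self.lake = []     # stand-in for HDFS / a data lake
        self.actions = []  # stand-in for a downstream action system

    def swap_model(self, model):
        # The hot-reload Nathan describes: the loop keeps running
        # while the model reference is replaced.
        self.model = model

    def process(self, event):
        s = self.model(event)
        if s >= self.threshold:
            self.actions.append((event["id"], round(s, 3)))
        self.lake.append(event)  # land it for later batch analysis
        return s
```

The point is only the shape: decide and act before the data lands, then use the lake afterward for training and exploration.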
So that's kind of where we started, but we've got customers in the financial sector, right, that are really looking at instantaneous types of transactions that are happening. And then how do you apply knowledge and information to that while you're bringing that source data in, so that you can make decisions. Some of those decisions have people involved with them and some of them are just machine based, right, so you take the people equation out. We kind of have this funny thing that Guy Churchward, our CEO, talks about, called the do loop, and the do loop is where the people come in, and how do we remove people out of that do loop and really make it easier for companies to act and prevent? So then if you take that aspect of it, we've got companies like in the publishing space. We've got companies in the IoT space, so they're doing inventory management, stuff like that, so we go from, you know, medium-sized customers all the way up to very, very large enterprises. >> Lisa: You're really turning a variety of industries into tech companies, because they have to be these days. >> Nathan: Right, well, and one other thing I would mention there, which is important, especially as we look at big data and a lot of customer concern about complexity. You know, I mentioned earlier the challenge of not just coming up with an idea but being able to put that into production. So, one of the other big areas of focus for DataTorrent as a company is that not only have we developed a platform for streaming analytics and applications, but we're starting to deliver applications that you can download and run on our platform that deliver an outcome to a customer immediately. So, increasingly, as we see different verticals, different applications, then we turn those into applications we can make available to all of our customers that solve business problems immediately.
One of the challenges for a long time in IT is simply how do you eliminate complexity, and there's no getting away from the fact that this is big data and these are complex systems. But to drive mass adoption, we're focused on how can we deliver outcomes for our customers as quickly as possible, and the way to do that is by making applications available across all these different verticals. >> Well you guys, this has been so educational. We wish you guys continued success, here. It sounds like you're really being quite disruptive in and of yourselves, so if you haven't heard of them, DataTorrent.com, check them out. Nathan, Jeff, thanks so much for giving us your time this afternoon. >> Great, thanks for the opportunity. >> Lisa: We look forward to having you back. You've been watching The Cube, live from day two of the DataWorks Summit, from the heart of Silicon Valley, for my co-host George Gilbert, I'm Lisa Martin, stick around, we'll be right back. (upbeat music)
Ron Bodkin, Teradata - DataWorks Summit 2017
>> Announcer: Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE. We are live at the DataWorks Summit on day two. We have had a great day and a half learning a lot about the next generation of big data, machine learning, artificial intelligence. I'm Lisa Martin, and my co-host is George Gilbert. We are next joined by a CUBE alumni, Ron Bodkin, the VP and General Manager of Artificial Intelligence for Teradata. Welcome back to theCUBE! >> Well, thank you, Lisa, it's nice to be here. >> Yeah, so talk to us about what you're doing right now. Your keynote is tomorrow. >> Ron: Yeah. >> What are you doing, what is Teradata doing in helping customers to be able to leverage artificial intelligence? >> Sure, yeah, so as you may know, I have been involved in this conference and the big data space for a long time as the founding CEO of Think Big Analytics. We were involved in really helping customers in the beginning of big data in the enterprise. And so, we are seeing a very similar trend in the space of artificial intelligence, right? The rapid advances in recent years in deep learning have opened up a lot of opportunity to really create value from all the data the customers have in their data ecosystems, right? So Teradata has a big role to play in having high quality products, the Teradata database, analytic ecosystem products such as Hadoop, such as QueryGrid for connecting these systems together, right? So what we're seeing is our customers are very excited by artificial intelligence, but what we're really focused on is how do they get to the value, right? What can they do that's really going to get results, right? And we bring this perspective of having this strong solutions approach inside of Teradata, and so we have Think Big Analytics consulting for data science, we now have been building up experts in deep learning in that organization, working with customers, right?
We've brought product functionality, so we're innovating around how do we keep pushing the Teradata product family forward with functionality around streaming with Listener. Functionality like, how do you take GPUs and start to think about how we can add that and make it deploy efficiently inside our customers' data centers. How can you take advantage of innovation in open source, with projects like TensorFlow and Keras becoming important for our customers. So what we're seeing is a lot of customers are excited about use cases for artificial intelligence. And tomorrow in the keynote I'm going to touch on a few of them, ranging from applications like preventative maintenance, anti-fraud in banking, to e-commerce recommendations, and we're seeing those are some of the examples of use cases where customers are saying hey, there's a lot of value in combining traditional machine learning, wide learning, with deep learning using neural nets to generalize. >> Help us understand if there's an arc where there's the mix of what's repeatable and what's packageable, or what's custom, how that changes over time, or whether it's just by solution. >> Yeah, it's a great question. Right, I mean I think there's a lot of infrastructure that any of these systems need to rest on. So having data infrastructure, having quality data that you can rely on is foundational, and so you need to get that installed and working well as a beginning point. Obviously having repeatable products that manage data with high SLAs, and supporting not just production use, but also how do you let data scientists analyze data in a lab and make that work well. So there's that foundational data layer. Then there's the whole integration of the data science into applications, which is critical, analytics ops, agile ways of making it possible to take the data and build repeatable processes, and those are very horizontal, right? There's some variation, but those work the same in a lot of use cases.
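The combination Ron mentions, traditional "wide" linear models plus deep neural nets, is the wide-and-deep idea: a linear part memorizes specific feature combinations, a neural part generalizes to unseen ones, and their logits are summed into one prediction. Here is a toy sketch in plain Python with hand-set weights; a real system would train both parts jointly, for example with TensorFlow/Keras, and all names and numbers here are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def wide_score(features, wide_weights):
    """Wide part: a linear model over sparse cross-features,
    good at memorizing combinations it has seen before."""
    return sum(wide_weights.get(f, 0.0) for f in features)

def deep_score(embedding, w1, w2):
    """Deep part: a tiny one-hidden-layer MLP over a dense embedding,
    good at generalizing to combinations it has never seen.
    Weights are fixed here for illustration; real ones are learned."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, embedding))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

def wide_and_deep(features, embedding, wide_weights, w1, w2):
    # Both logits are summed before the sigmoid, so both parts
    # contribute to a single prediction.
    return sigmoid(wide_score(features, wide_weights) + deep_score(embedding, w1, w2))
```

A memorized cross-feature (say, a country/item pair the wide part has a weight for) pushes the score up sharply, while an unseen pair still gets a reasonable score from the deep part alone.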
At this stage, I'd say, in deep learning, just like in machine learning generally, you still have a lot of horizontal infrastructure. You've got Spark, you've got TensorFlow, those support use cases across many industries. But then you get to the next level, you get specific problems, and there's a lot of nuance. What modeling techniques are going to work, what data sets matter? Okay, you've got time series data and a problem like fraud. What techniques are going to make that work well? And recommendations, you may have a long tail of items to think about recommending. How do you generalize across the long tail where you can't learn? For people who use some relatively obscure thing, go to an obscure website, or buy an obscure product, there's not enough data to say whether they're likely to buy something else or do something else, but how do you categorize them so you get statistical power to make useful recommendations, right? Those are things that are very specific, where there's a lot of repeatability within a specific solution area. >> This is, when you talk about the data assets that might be specific to a customer, and then I guess some third party or syndicated sources. If you have an outcome in mind, but not every customer has the same inventory of data, how do you square that circle? >> That's a great question. And I really think that's a lot of the opportunity in the enterprise of applying analytics, so this whole DataWorks Summit is about hey, the power of your data. What you can get by collecting your data in a well-managed ecosystem and creating value. So, there's always a nuance. It's like what's happening with your customers, what's your business process, what's special about how you interact, what's the core of your business?
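One common tactic for the long-tail problem Ron describes, getting statistical power for obscure items by categorizing them, is to estimate rates per item but back off to the item's category when the item itself has too few observations. A minimal sketch follows; the names and the fixed observation cutoff are hypothetical, not Teradata's product logic:

```python
from collections import defaultdict

class BackoffRate:
    """Estimate a conversion rate per item, backing off to the item's
    category when the item has too few observations to trust.
    A sketch of the 'statistical power' idea, not a production model."""

    def __init__(self, min_obs=5):
        self.min_obs = min_obs
        self.item = defaultdict(lambda: [0, 0])  # item -> [conversions, trials]
        self.cat = defaultdict(lambda: [0, 0])   # category -> [conversions, trials]

    def observe(self, item, category, converted):
        for bucket in (self.item[item], self.cat[category]):
            bucket[0] += int(converted)
            bucket[1] += 1

    def rate(self, item, category):
        conv, n = self.item[item]
        if n >= self.min_obs:
            return conv / n              # enough data: trust the item itself
        conv, n = self.cat[category]
        return conv / n if n else 0.0    # otherwise back off to the category
```

A bestseller with plenty of observations gets its own rate; an obscure item, or one never seen at all, inherits the category rate instead of a noisy estimate from one or two clicks.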
So I guess my view is that anybody that wants to be a winner in this new digital era, and have processes that take advantage of artificial intelligence, is going to have to use data as a competitive advantage and build on their unique data. And we see a lot of times enterprises struggle with this. There's a tendency to say hey, can we just buy a packaged, off-the-shelf SaaS solution and do that? And for context, for things that are the same for everybody in an industry, that's a great choice. But if you're doing that for the core differentiation of your business, you're in deep trouble in this digital era. >> And that's a great place, sorry George, really quickly. In this day and age, every company is a technology company. You mentioned a use case in banking, fraud detection, which is huge. There's tremendous value that can be gleaned from artificial intelligence, and there's also tremendous risk. I'm curious, maybe just kind of a generalization. Where are your customers on this journey? Are you going out to customers that have already embraced Hadoop and have a significant amount of data, that say, all right, we've got a lot of data here, we need to understand the context? Where are customers in that maturity evolution? >> Sure, so I'd say that we're really fast-approaching the slope of enlightenment for Hadoop, which is to say the enthusiasm of three years ago, when people thought Hadoop was going to do everything, has kind of waned, and there's now more of an appreciation, like, there's a lot of value in having a data warehouse for high value curated data for large-scale use. There's a lot of value in having a data lake of fairly raw data that can be used for exploration in the data science arena. And there's an emerging question, like, what is the best architecture for streaming and how do you drive realtime decisions, and that's still very much up in the air.
So I'd say that most of our customers are somewhere on that journey. I think that a lot of them have backed off from their initial ambitions, they bought a little too much of the hype of all that Hadoop might do, and they're realizing what it is good for, and how they really need to build a complementary ecosystem. The other thing I think is exciting, though, is that I see the conversation moving from the technology to the use cases. People are a lot more excited about how can we drive value with analytics, and let's work backwards from the analytics value to the data that's going to support it. >> Absolutely. >> So building on that, we talk about sort of what's core, and if you can't have something completely repeatable that's going to be core to your sustainable advantage, but if everyone is learning from data, how does a customer achieve a competitive advantage or even sustain a competitive advantage? Is it orchestrating learning that feeds, that informs processes all across the business, or is it just sort of a perpetual Red Queen effect? >> Well, that's a great question. I mean, I think there's a few things, right? There's operational excellence in every discipline, so having good data scientists, having the right data, collecting data, thinking about how do you get network effects, those are all elements. So I would say there's a table-stakes aspect that if you're not doing this, you're in trouble, but then if you are, it's like how do you optimize and lift your game and get better at it? So that's an important factor, and you see companies that say, how do we acquire data? Like one of the things that you see digital disruptors, like a Tesla, doing is changing the game by saying we're changing the way we work with our customers to get access to the data.
Think of the difference between every time you buy a Tesla you sign over the rights for them to collect and use all your data, whereas the traditional auto OEMs are struggling to get access to a lot of the data because they have intermediaries that control the relationship and aren't willing to share. And a similar thing in other industries, you see it in consumer packaged goods. You see a lot of manufacturers there saying how do we get partnerships, how do we get more accurate data? The old models of going out to the Nielsens of the world and saying give us aggregates, and we'll pay you a lot to give us a summary report, that's not working. How do we learn directly in a digital world about our consumers so we can be more relevant? So one of the things is definitely that control of data and access to data, as well as we see a lot of companies saying what are the acquisitions we can make? What are startups and capabilities that we can plug in and complement, to get data, to get analytic capability that we can then tailor for our needs? >> It's funny that you mention Tesla having more cars on the road collecting more data than pretty much anyone else at this point. But then there's like Stanford's sort of luminary for AI, Fei-Fei Li. She signed on I think with Toyota, because she said they sell 10 million cars a year, I'm going to be swimming in data compared to anyone else, possible exception of GM or maybe some Chinese manufacturer. So where does, how can you get around scale when using data at scale to inform your models? How would someone like a Tesla be able to get an end run around that? >> So that's the battle, the disruptor comes in, they're not at scale, but they maybe change the game in some way. Like having different terms that give them access to different kinds of data, more complete data. So that's sort of part of the answer, is to disrupt an industry you need a strategy for what's different, right, like in Tesla's case an electric vehicle.
And they've been investing in autonomous vehicles with AI, of course everybody in the industry is seeing that and is racing. I mean, Google really started that whole wave going a long time ago as another potential disruptor coming in with their own unique data asset. So, I think it's all about the combination of capabilities that you need. Disruptors often bring a commitment to a different business process, and that's a big challenge; a lot of times the hardest things are the business processes that are entrenched in existing organizations, and disruptors can say we're rethinking the way this gets done. I mean, the example of that in ride sharing, the Ubers and Lyfts and Didis of the world, where they are re-conceiving what does it mean to consume automobile services. Maybe you don't want to own a car at all if you're a millennial, maybe you just want to have access to a car when you need to go somewhere. That's a good example of a disruptive business model change. >> What are some things that are on the intermediate-term horizon that might affect how you go about trying to create a sustainable advantage? And here I mean things like where deep learning might help data scientists with feature engineering so there's less need for, you can make data scientists less of a scarce resource. Or where there's new types of training for models where you need less data? Those sorts of things might disrupt the practice of achieving an advantage with current AI technology. >> You know, that's a great question. So near-term, the ability to be more efficient in data science is a big deal. There's no surprise that there's a big talent gap, a big shortage of qualified data scientists in the enterprise, and one of the things that's exciting is that deep learning lets you get more information out of the data, so it learns more, so that you'd have to do less feature engineering.
It's not like a magic box where you just pour raw data into deep learning and out come the answers, so you still need qualified data scientists, but it's a force multiplier. There's less work to do in feature engineering, and therefore you get better results. So that's a factor. You're starting to see things like hyperparameter search, where people will create neural networks that search for the best machine learning model, and again get another level of leverage. Now, today doing that is very expensive. The amount of hardware to do that, very few organizations are going to spend millions of dollars to sort of automate the discovery of models, but things are moving so fast. I mean, even just in the last six weeks, we've had Nvidia and Google both announce significant breakthroughs in hardware. And I just had a colleague forward me a paper on recent research that says, hey, this technique could produce a hundred times faster results in deep learning convergence. So you've got rapid advances in investment in the hardware and the software. Historically, software improvements have outstripped hardware improvements throughout the history of computing, so it's quite reasonable to expect you'll have 10 thousand times the price performance for deep learning in five years. So things that today might cost a hundred million dollars and no one would do could cost 10 thousand dollars in five years, and suddenly it's a no-brainer to apply a technique like that to automate something instead of hiring more scarce data scientists that are hard to find, and make the data scientists more productive so they're spending more time thinking about what's going on and less time trying out different variations of how do I configure this thing, does this work, does this, right? >> Oh gosh, Ron, we could keep chatting away. Thank you so much for stopping by theCUBE again, we wish you the best of luck in your keynote tomorrow.
I think people are going to be very inspired by your passion, your energy, and also the tremendous opportunity that is really sitting right in front of us. >> Thank you, Lisa, it's a very exciting time to be in the data industry, and the emergence of AI in the enterprise, I couldn't be more excited by it. >> Oh, excellent, well your excitement is palpable. We want to thank you for watching. We are live on theCUBE at the DataWorks Summit day 2, #dws17. For my cohost George Gilbert, I'm Lisa Martin, stick around. We'll be right back. (upbeat electronic melody)
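Ron's point above about hyperparameter search, programs that automatically try many model configurations and keep the best one, can be sketched in a few lines. Everything here is invented for illustration: the "training" function is a toy gradient descent on a quadratic, standing in for a real and far more expensive training run.

```python
import random

def train(lr, steps):
    """Toy stand-in for a real training run: gradient descent
    on f(w) = (w - 3)^2, returning the final loss."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)
        w -= lr * grad
    return (w - 3) ** 2

def random_search(n_trials, seed=42):
    """Sample hyperparameters at random and keep the best configuration."""
    rng = random.Random(seed)
    best_loss, best_params = float("inf"), None
    for _ in range(n_trials):
        params = {"lr": rng.uniform(0.001, 0.5), "steps": rng.randint(5, 50)}
        loss = train(**params)
        if loss < best_loss:
            best_loss, best_params = loss, params
    return best_loss, best_params

loss, params = random_search(30)
print(loss, params)
```

Real systems replace the toy trainer with full model training, which is exactly why automating the search has historically cost the millions of dollars of compute Ron mentions.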
Raj Verma, Hortonworks - DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE, we are live, on day two of the DataWorks Summit. I'm Lisa Martin. #DWS17, join the conversation. We've had a great day and a half. We have learned from a ton of great influencers and leaders about really what's going on with big data, data science, how things are changing. My cohost is George Gilbert. We're joined by my old buddy, the COO of Hortonworks, Rajnish Verma. Raj, it's great to have you on theCUBE. >> It's great to be here, Lisa. Great to see you as well, it's been a while. >> It has, so yesterday on the customer panel, the Raj I know had a great conversation with customers from, Duke Energy was one. You also had Black Knight on the financial services side. >> Rajnish: And HSC. >> Yes, on the insurance side, and one of the things that, a couple things that really caught my attention, one was when Duke said, kind of, where they were using data and moving to Hadoop, but they are now a digital company. They're now a technology company that sells electricity and products, which I thought was fantastic. Another thing that I found really interesting was they all talked about how the need to leverage big data, and glean insights and monetize that, really requires this cultural shift. So I know you love customer interactions. Talk to us about what you're seeing. Those are three great industry examples. What are you seeing? Where are customers on this sort of maturity model where big data and Hadoop are concerned? >> Sure, happy to. So one thing that I enjoy the most about my job is meeting customers and talking to them about the art of the possible. And some of the stuff that they're doing, which was only science fiction, really, about two or three years ago.
And there are a couple of questions that you've just asked me, as to where they are on their journey, what they are trying to accomplish, et cetera. I remember, about five, seven, 10 years ago, Marc Andreessen said "Software is eating the world." And to be honest with you, it's now more like every company is a data company. I wouldn't say data is eating the world, but without effective monetization of your data assets, you can't be a force to reckon with as a company. So that is a common theme that we are seeing, irrespective of industry, irrespective of customer, irrespective of really the size of the customer. The only thing that sort of varies is the amount and complexity of data from one company to the other. Now, I'm new to Hortonworks, as you know. It's really my fifth month here. And one of the things that I've seen, and, Lisa, as you know, I am coming from TIBCO. So we've been dealing with data. I have been involved with data for over a decade and a half now, right. So the difference was, 15 years ago, we were dealing with really structured data, and we actually connected the structured data and gleaned insights from structured data. Now, today, a seminal challenge that every CIO or chief data officer is trying to solve is how do you get actionable insights into semi-structured and unstructured data. Now, getting insights into that data first requires the ability to aggregate data, right. Once you've aggregated data, you also need a platform to make sense of data in real time that is being streamed at you. Now once you do those two things, then you put yourself in a position to analyze that data. So in that journey, as you asked where our customers are: some are defining their data aggregation strategy. Others, having defined data aggregation, are talking about streaming analytics as a platform. And then others are talking about data science and machine learning and deep learning as a journey.
Now, you saw the customer panel yesterday. But the one point I'd like to make is, it's not only the Duke Energies and the Black Knights of the world, or the HSC, who I believe are big, large firms, that are using data. Even a company like, an old agricultural company, or I shouldn't say old, steeped in heritage is probably the right word, a 96-, 97-year-old agricultural company that's in the animal feed business. Animal feed. A multi-billion-dollar animal feed business. They use data to monetize their business model. What they say is, they've been feeding animals for the last 70 years. So now they go to a farmer, and they have enough data about how to feed animals that they can actually tell the farmer: this hog that you have right now, which is 17 pounds, I can guarantee you that I will have him or her on a nutrition plan such that, by four months, it'll be 35 pounds. How much are you willing to pay? So even in the animal feed business, data is being used to drive not only insights, but monetization models. >> Wow. >> So. >> That's outstanding. >> Thank you.
But as you heard the panel talk about yesterday, one of the biggest benefits that our customers see in using open source is, sure, the economics are good, but that's not the leading reason. Keeping up with innovation, very high up there. Avoiding vendor lock-in, again very, very high up there. But one of the biggest reasons that CIOs gave me for choosing open source as a business model is more to do with the fact that they can attract good talent, and without open source, you can't actually attract talent. And I can relate to that because I have a sophomore at home. She's 15 now, but she's been using open source since she was 11. On the iPhone, she downloads an application for free. She uses it, and if she stretches the limit of that, then she orders something more in a paid model. So the community helps people do a few things. Be able to fail fast if they need to. The second is, it lowers the barrier to entry, right. Because it's really free. You can have the same model. The third is, you can rely on the community for support and methodologies and best practices and lessons learned from implementations. The fourth is, it's a great hiring ground in terms of bringing people in and attracting Millennial talent, young talent, and sought-after talent. So that's really probably the answer that I would have for that. >> When you talk about the business model, the open-source business model and the attraction on the customer side, that sounded like there's this analogy with sort of the agro-business customer, in the sense that they are offering data along with their traditional product. If your traditional product is open-source data management, what we started hearing this morning was the machine learning that goes along with operating not only your own sort of internal workloads but customers', and being able to offer prescriptive advice on operations, essentially IT operations.
Is that the core, will that become the core of sort of value-add through data for an open-source business model like yours? >> I don't want to be speculative, but I'll probably answer it another way. I think our vision, which was set by our founder Rob Bearden, and he took you guys through that yesterday, was, way back when, we did say that our mission in life is to manage the world's data. So that mission hasn't changed. And the second was, we would do it as an open-source community, or as a big contributing part of that community. And that has really not changed. Now, we feel that machine learning and data science and deep learning are areas that we're very, very excited about, and our customers are very, very excited about. Now, the one thing that we did cover yesterday, and I think earlier today as well, I'm a computer science engineer. And when I was in college, way back when, 25 years ago, I was interested in AI and ML. And it has existed for 50 years. The reason why it hasn't been available to the common man, so to speak, is because of two reasons. One is, it did not have a source of data that it could sit on top of that makes machine learning and AI effective. Or at least not a commercially viable option to do so. Now, there is one. The second is, the compute power required to run some of the large algorithms that really give you insights into machine learning and AI. So we've become the platform on which customers can take advantage of excellent machine learning and AI tools to get insights. Now, those are two independent sort of categories. One is the open source community providing the platform. And then what tools the customer uses to apply data science and machine learning, so. >> So, all right. I'm thinking something that is slightly different, and maybe the nuance is making it tough to articulate.
But it's how can Hortonworks take the data platform and data science tools that you use to help understand how to operate Hortonworks, whether it's on a customer's premises, or in the cloud. In other words, how can you use machine learning to make it a sort of a more effective and automated managed service? >> Yeah, and I think the nuance is not lost on me. I think what I'm trying to sort of categorize is, for that to happen, you require two things. One is data aggregation across on-prem and cloud. Because when you have data which is multi-tenant, you have a lot of issues with data security, data governance, all the rest of it. Now, that is what we plan to manage for the world, so to speak. Now, on top of that, for customers who need data science or deep learning, we provide that platform. Now, whether that is used as a service by the customer, which we would be happy to provide, or it is used in-house, on-prem, or on various cloud models, that's more a customer decision. We don't want to force that decision. However, from the art-of-the-possible perspective, yes, it's possible. >> I love the mission to manage the world's data. >> Thank you. >> That's a lofty goal, but yesterday's announcements with IBM were pretty, pretty transformative. In your opinion as chief operating officer, how do you see this extension of this technology and strategic partnership helping Hortonworks on the next level of managing the world's data? >> Absolutely, it's game-changing for us. We're very, very excited. Our colleagues are very, very excited about the opportunity to partner. It's also a big validation of the fact that we now have a pretty large open-source community that contributes to this cause. So we're very excited about that. The opportunity is in actually our partnering with a leader in data science, machine learning, and AI, a company that is steeped in heritage and known for game-changing, next technology moves.
And the fact that we're powering it from a data perspective is something that we're very, very excited and pleased about. And the opportunities are limitless. >> I love that, and I know you are a game-changer, in your fifth month. We thank you so much, Raj, for joining us. It was great to see you. Continued success, >> Thank you. >> at managing the world's data and being that game-changer, yourself, and for Hortonworks as well. >> Thank you Lisa, good to see you. >> You've been watching theCUBE. Again, we're live, day two of the DataWorks Summit, #DWS17. For my cohost, George Gilbert, I'm Lisa Martin. Stick around guys, we'll be right back with more great content. (jingle)
Josh Klahr & Prashanthi Paty | DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. Day two of the DataWorks Summit, I'm Lisa Martin with my cohost, George Gilbert. We've had a great day and a half so far, learning a ton in this hyper-growth, big data world meets IoT, machine learning, data science. George and I are excited to welcome our next guests. We have Josh Klahr, the VP of Product Management from AtScale. Welcome Josh, welcome back.
And we were moving data off cluster, and making it small. In fact, you did a lot of that. >> Well, yeah, at the end of the day, we were using Hadoop as a staging layer. So we would process a whole bunch of data there, and then we would scale it back, and move it into, again, relational stores or cubes, because basically we couldn't afford to give any accessibility to BI tools or to our end users directly on Hadoop. So while we surely did large-scale data processing in the Hadoop layer, we failed to turn on the insights right there. >> Lisa: Okay. >> Maybe there's a lesson in there for folks who are getting slightly more mature versions of Hadoop now, but can also learn from some of the experiences you've had. Were there issues in terms of having cleaned and curated data, were there issues for BI with performance and the lack of proper file formats like Parquet? What was it where you hit the wall? >> It was both. You have to remember, we were probably one of the first teams to put a data warehouse on Hadoop. So we were dealing with Pig versions of, like, 0.5, 0.6, so we were putting a lot of demand on the tooling and the infrastructure. Hadoop was still in a very nascent stage at that time. That was one. And I think a lot of the focus was on, hey, now we have the ability to do clickstream analytics at scale, right. So we did a lot of the backend stuff. But the presentation is where I think we struggled. >> So would that mean that you did do, the idea is that you could do full resolution without sampling on the backend, and then you would extract and presumably sort of denormalize so that you could essentially run data marts for subject matter interests. >> Yeah, and that's exactly what we did: we took all of this big data, but we had to make it work for BI, which meant two things. One was performance. It was really, can you get interactive query response times. And the other thing was the interface.
Can a Tableau user connect and understand what they're looking at. You had to make the data small again. And that was actually the genesis of AtScale, which is where I am today: we were frustrated with this big data platform and having to then make the data small again in order to support BI. >> That's a great transition, Josh. Let's actually talk about AtScale. You guys saw BI on Hadoop as this big white space. How have you succeeded there, and then let's talk about what GoDaddy is doing with AtScale and big data. >> Yeah, I think that we definitely learned, we took the learnings from our experience at Yahoo, and we really thought about, if we were to start from scratch, and solve the problem the way we wanted it to be solved, what would that system look like. And it was a few things. One was an interface that worked for BI. I don't want to date myself, but my experience in the software space started with OLAP. And I can tell you OLAP isn't dead. When you go and talk to an enterprise, a Fortune 1000 enterprise, and you talk about OLAP, that's how they think. They think in terms of measures and dimensions and hierarchies. So one important thing for us was to project an OLAP interface on top of data that's Hadoop native. It's Hive tables, Parquet, ORC, you kind of talk about all of the mess that may sit underneath the covers. So one thing was projecting that interface, the other thing was delivering performance. So we've invested a lot in using the Hadoop cluster natively to deliver performing queries. We do this by creating aggregate tables and summary tables and being smart about how we route queries. But we've done it in a way that makes a Hadoop admin very happy. You don't have to buy a bunch of AtScale servers in addition to your Hadoop cluster. We scale the way the Hadoop cluster scales. So we don't require separate technology. So we fit really nicely into that Hadoop ecosystem. >> So how do you make, making the Hadoop admin happy is a good thing.
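The query routing Josh describes above, answering a BI query from the smallest pre-built aggregate whose dimensions cover it, can be sketched roughly like this. The table names, dimension sets, and row counts are invented for illustration; this is not AtScale's actual implementation.

```python
# Each aggregate pre-groups the fact data by a set of dimensions.
# A query can be served by any table whose dimensions are a superset
# of the query's dimensions; we route to the smallest such table.
TABLES = {
    "agg_day_country": ({"day", "country"}, 1_000_000),
    "agg_day_country_device": ({"day", "country", "device"}, 8_000_000),
    "fact_events": ({"day", "country", "device", "user_id"}, 5_000_000_000),
}

def route(query_dims):
    """Return the cheapest table that can answer the query."""
    candidates = [
        (rows, name)
        for name, (dims, rows) in TABLES.items()
        if query_dims <= dims  # set containment: the table covers the query
    ]
    if not candidates:
        raise ValueError("no table covers dimensions %r" % (query_dims,))
    return min(candidates)[1]

print(route({"day"}))            # -> agg_day_country
print(route({"day", "device"}))  # -> agg_day_country_device
print(route({"user_id"}))        # -> fact_events
```

A production router would rank candidates by estimated scan cost rather than raw row count, but the covering-set check is the heart of the idea.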
How do you make the business user happy, who needs now, as we heard here yesterday, to kind of merge more with the data science folks to be able to understand, or even have the chance to articulate, "These are the business outcomes we want to look for and we want to see." How do you guys at AtScale, maybe under the hood, if you will, make the business guys and gals happy? >> I'll share my opinion and then Prashanthi can comment on her experience, but, as I've mentioned before, the business users want an interface that's simple to use. And so that's one thing we do, is, we give them the ability to just look at measures and dimensions. If I'm a business user, I grew up using Excel to do my analysis. The thing I like most as an analyst is a big fat wide table. And so that's what we do: we make an underlying Hadoop cluster, and what could be tens or hundreds of tables, look like a single big fat wide table for a data analyst. You talk to a data scientist, you talk to a business analyst, that's the way they want to view the world. So that's one thing we do. And then, we give them response times that are fast. We give them interactivity, so that you could really quickly start to get a sense of the shape of the data. >> And allowing them to get that time to value. >> Yes. >> I can imagine. >> Just a follow-up on that. When you have to prepare the aggregates, essentially like the cubes, instead of the old BI tools running on a data mart, what is the additional latency that's required from data coming fresh into the data lake and then transforming it into something that's consumption-ready for the business user? >> Yeah, I think I can take that. So again, if you look at the last 10 years, in the initial period, certainly at Yahoo, we just threw engineering resources at that problem, right. So we had teams dedicated to building these aggregates. But the whole premise of Hadoop was the ability to do unstructured optimizations.
And by having a team find the new data coming in and then integrate it into your pipeline, we were adding a lot of latency. And so we needed to figure out how we can do this in a more seamless way, in a more real-time way. And get to, you know, the real premise of Hadoop. Get it into the hands of our business users. I mean, I think that's where AtScale is doing a lot of the good work, in terms of dynamically being able to create aggregates based on the design that you put in the cube. So we are starting to work with them on our implementation. We're looking forward to the results. >> Tell us a little bit more about what you're looking to achieve. So GoDaddy is a customer of AtScale. Tell us a little bit more about that. What are you looking to build together, and kind of, where are you in your journey right now? >> Yeah, so the main goal for us is to move beyond predefined models, dashboards, and reports. So we want to be more agile with our schema changes. Time to market is one. And performance, right. The ability to put BI tools directly on top of Hadoop is one. And also to push as much of the semantics as possible down into the Hadoop layer. So those are the things that we're looking to do. >> So that sounds like a classic business intelligence component, but sort of rethought for a big data era. >> I love that quote, and I feel it. >> Prashanthi: Yes. >> Josh: Yes. (laughing) >> That's exactly what we're trying to do. >> But that's also, some of the things you mentioned are non-trivial. You want to have this, time goes into the pre-processing of data so that it's consumable, but you also want it to be dynamic, which is sort of a trade-off, which means, you know, that takes time. So is that a sort of a set of requirements, a wishlist for AtScale, or is that something that you're building on your own? >> I think there's a lot happening in that space.
They are one of the first people to come out with a product which solves a real problem that we tried to solve for a long time. And I think as we start using them more and more, we'll surely be pushing them to bring in more features. I think the algorithm that they have to dynamically generate aggregates is something that we're giving quite a lot of feedback to them on. >> Our last guest, from Pentaho, was talking about, there was, in her keynote today, the quote from, I think, a McKinsey report that said, "40% of machine learning data is either not fully exploited or not used at all." So, tell us, kind of, where is GoDaddy regarding machine learning? What are you seeing? What are you seeing at AtScale, and how are you guys going to work together to maybe venture into that frontier? >> Yeah, I mean, I think one of the key requirements we're placing on our data scientists is, not only do you have to be very good at your data science job, you have to be a very good programmer too, to make use of the big data technologies. And we're seeing some interesting developments, like very workload-specific engines coming into the market now, for search, for graph, for machine learning as well. Which is supposed to put the tools right into the hands of data scientists. I personally haven't worked with them to be able to comment. But I do think that the next realm of big data is these workload-specific engines, coming on top of Hadoop, and realizing more of the insights for the end users. >> Curious, can you elaborate a little more on those workload-specific engines? That sounds rather intriguing. >> Well, I think for interacting with Hadoop on a real-time basis, we see search-based engines like Elasticsearch, Solr, and there is also Druid. At Yahoo, we were quite a big Druid shop, actually. And we were using it as an interactive query layer directly with our applications, BI applications. These were our JavaScript-based BI applications, on Hadoop.
So I think there are quite a few means to realize insights from Hadoop now. And that's the space where I see workload-specific engines coming in. >> And you mentioned earlier, before we started, that you were using Mahout, presumably for machine learning. And I guess I thought the center of gravity for that type of analytics has moved to Spark, and you haven't mentioned Spark yet. >> We are not using Mahout, though. I mentioned it as something that's in that space. But yeah, I mean, Spark is pretty interesting. Spark SQL, doing ETL with Spark, as well as using Spark SQL for queries, is something that looks very, very promising lately. >> Quick question for you, from a business perspective, so you're the Head of Engineering at GoDaddy. How do you interact with your business users? The C-suite, for example, where data science, machine learning, they understand, we have to have, they're embracing Hadoop more and more. They need to really be embracing big data and leveraging Hadoop as an enabler. What's the conversation like, or maybe even the influence of the GoDaddy business C-suite on engineering? How do you guys work collaboratively? >> So we do have very regular stakeholder meetings. And these are business stakeholders. So we have representatives from our marketing teams, finance, product teams, and data science team. We consider data science as one of our customers. We take requirements from them. We give them a peek into the work we're doing. We also let them be part of our agile team, so that when we have something released, they're the first ones looking at it and testing it. So they're very much part of the process. I don't think we can afford to just sit back and work on this monolithic data warehouse and at the end of the day say, "Hey, here is what we have" and ask them to go get the insights from it. So it's a very agile process, and they're very much part of it.
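The Spark SQL pattern Prashanthi mentions above, doing ETL with SQL and then serving queries from the cleaned result, can be sketched with the standard library's sqlite3 standing in for Spark SQL. The table names and data are made up for illustration; none of this is GoDaddy's actual pipeline.

```python
import sqlite3

# sqlite3 stands in for Spark SQL here purely to show the shape of the
# pattern: land raw events, transform them into a query-ready table,
# then run the kind of interactive query a BI tool would issue.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (ts TEXT, user TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [("2017-06-14", "alice", "10.5"),
     ("2017-06-14", "bob", "3.0"),
     ("2017-06-15", "alice", "7.25")],
)

# ETL step: cast the string amounts and aggregate into a clean table.
conn.execute("""
    CREATE TABLE daily_spend AS
    SELECT ts AS day, user, SUM(CAST(amount AS REAL)) AS total
    FROM raw_events GROUP BY ts, user
""")

# Query step: an interactive rollup over the transformed table.
rows = conn.execute(
    "SELECT user, SUM(total) FROM daily_spend GROUP BY user ORDER BY user"
).fetchall()
print(rows)  # -> [('alice', 17.75), ('bob', 3.0)]
```

In Spark the same two steps would be a DataFrame (or `spark.sql`) transformation writing a curated table, followed by SQL queries against it; the division into an ETL pass and a serving pass is the part that carries over.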
>> One last question for you, sorry George, is, you guys mentioned you are sort of early in your partnership, unless I misunderstood. What has AtScale helped GoDaddy achieve so far, and what are your expectations, say, in the next six months? >> We want the world. (laughing) >> Lisa: Just that. >> Yeah, but the premise is, I mean, so Josh and I, we were part of the same team at Yahoo, where we faced the problems that AtScale is trying to solve. So the premise of being able to solve those problems, which is, like their name, basically delivering data at scale, that's the premise that I'm very much looking forward to from them. >> Well, excellent. Well, we want to thank you both for joining us on theCUBE. We wish you the best of luck in attaining the world. (all laughing) >> Josh: There we go, thank you. >> Excellent, guys. Josh Klahr, thank you so much. >> My pleasure. >> Prashanthi, thank you for being on theCUBE for the first time. >> No problem. >> You've been watching theCUBE live at day two of the DataWorks Summit. For my cohost George Gilbert, I am Lisa Martin. Stick around guys, we'll be right back with more great content. (jingle)
Arun Murthy, Hortonworks | DataWorks Summit 2017
>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today. I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multi-time CUBE alum, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back. So yesterday, great energy at the event. You could see and hear behind us, great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beaten the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, there are these four mega-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing. You've been here from the beginning, as a co-founder as I've mentioned; you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and then, I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now, you know, and the last six of it has been as a vendor selling Hadoop to enterprises.
They started off with this notion of a data lake, and as people have adopted that vision of a data lake, you know, you bring all the data in, and now you're starting to get governance and security, and all of that. Obviously, one of the best ways to get value out of the data is the notion of, you know, can you sort of predict what is going to happen in your world, with your customers, and, you know, whatever it is with the data that you already have. So that notion of, you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and the analytics and data science are obviously key to that. We could talk about, and there's so many applications of it, something as simple as, you know, a demo we did last year of how we're working with a freight company, and we're starting to show them, you know, predictions of which drivers and which routes are going to have issues as they're trying to move, alright? Four years ago we did the same demo, and we would say, okay, this driver had an issue on this route, but now we can actually predict it and let you know to take preventive measures up front. Similarly internally, you know, you can take things from, you know, machine learning, and log analytics, and so on. We have an internal problem, you know, where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem. We have to support 10 operating systems, seven databases; like, if you multiply that matrix, it's, you know, tens of thousands of options. So, if you do all that testing, we now use machine learning internally to look through the logs, and kind of predict where the failures were, and help our own, sort of, software engineers understand where the problems were, right?
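The log-driven failure prediction described here can be illustrated with a deliberately tiny sketch. Everything below is hypothetical, not Hortonworks' actual pipeline: the failure signatures and matching rule are invented placeholders for what, in practice, would be a learned model over far richer log features.

```python
# Hypothetical sketch: triage raw test-log lines against known failure
# signatures, the way log-driven failure prediction might start out
# before a real learned model replaces the hand-written rules.
import re
from collections import Counter

# Assumed failure signatures, keyed by a label an engineer would recognize.
SIGNATURES = {
    "disk": {"ioexception", "space", "device"},
    "network": {"connection", "refused", "timeout", "unreachable"},
    "memory": {"outofmemoryerror", "heap", "gc"},
}

def triage(log_lines):
    """Rank failure labels by how often their signature words appear in the logs."""
    words = Counter()
    for line in log_lines:
        words.update(re.findall(r"[a-z]+", line.lower()))
    scores = {label: sum(words[w] for w in sig) for label, sig in SIGNATURES.items()}
    # Keep only labels with at least one matching word, strongest match first.
    return sorted((label for label, s in scores.items() if s > 0),
                  key=lambda label: -scores[label])

logs = [
    "java.io.IOException: No space left on device",
    "WARN DataNode: slow flush, no space left",
]
print(triage(logs))  # ['disk']
```

Across tens of thousands of test-matrix runs, even a crude ranking like this narrows where an engineer looks first; a trained model simply learns the signatures instead of hard-coding them.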
An extension of that has been, you know, the work we've done in Smartsense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications or even tune their hardware, right? We have this example I really like, where at a really large enterprise Financial Services client, they had literally, you know, hundreds and thousands of machines on HDP, and using Smartsense, we actually found that there were 25 machines which had a bad NIC configuration, and we proved to them that by fixing those, we got 30% throughput back on their cluster. At that scale, it's a lot of money; it's a lot of capex, it's a lot of opex. So, as a company, we try to do this ourselves as much as we, kind of, try to help our customers adopt it; does that make sense? >> Yeah, let's drill down on that even a little more, 'cause it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you, sort of, move up the stack, the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah so, we're really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of a data lake; we actually run a Smartsense data lake where we actually get data across, you know, hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake, right? And to your point about how we go up the stack, this is kind of where we feel like we have a natural advantage, because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond the hardware. So, as we build these models, we understand that we need more, or different, telemetry, right?
And we put that back into the product, so the next version of HDP will have the metrics that we wanted. And now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank; obviously something we always get better at, but I feel like, compared to where we were a couple of years ago when Smartsense first came out, it's actually matured quite a lot from that perspective. >> So, there's a couple different aspects you can add to this, which is, customers might want, as part of their big data workloads, some non-Hortonworks, you know, services or software when it's on-prem, and then can you also extend this management to the Cloud if they want a hybrid setup where, in the not too distant future, the Cloud vendor will also be a provider for this type of management? >> So absolutely, in fact it's true today. You know, Microsoft's a great partner of ours. We work with them to enable Smartsense on HDI, which means we can actually get the same telemetry back, whether you're running the data on an on-prem HDP, or you're running this on HDI. Similarly, we shipped a version of our Cloud product, our Hortonworks Data Cloud, on Amazon, and again Smartsense is plumbed in there, so whether you're on Amazon, or Microsoft, or on-prem, we get the same telemetry, we get the same data back. So if you're a customer using many of these products, we can actually give you that telemetry back. Similarly, as you guys probably know, you were probably there at the analyst event when they announced the Flex Support subscription, which means that now we can actually take the support subscription you have from Hortonworks, and you can actually use it on-prem or on the Cloud.
>> So in terms of transforming HDP, for example, just want to make sure I'm understanding this: you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in Microsoft Azure, it can be in AWS? >> Exactly. HDP can be running in any of these; we will actually pull all of them into our data lake, and we actually do the analytics there and then present it back to the customers. So, in our support subscription, the way this works is we do the analytics in our lake, and it pushes it back, in fact, to our support team tickets, and our Salesforce, and all the support mechanisms. And they get a set of recommendations saying, Hey, we know these are the workloads you're running, we see these are the opportunities for you to do better, whether it's tuning hardware, tuning an application, tuning the software. We sort of send the recommendations back, and the customer can go and say, Oh, that makes sense; they accept that, and, you know, we'll update the recommendation for them automatically. Or they can say, Maybe I don't want to change my kernel parameters, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that sort of back and forth with the customer. >> One thing that just pops into my mind is, we talked a lot yesterday about data governance; are there particular, and also yesterday on stage were >> Arun: With IBM. >> Yes, exactly. When we think of, you know, really data-intensive industries, retail, financial services, insurance, healthcare, manufacturing, are there particular industries where you're really leveraging this kind of bi-directional flow, because there's no governance restrictions, or maybe I shouldn't say none, but. Give us a sense of which particular industries are really helping to fuel the evolution of the Hortonworks data lake. >> So, I think healthcare is a great example.
You know, when we started off this open-source project, Atlas, you know, a couple of years ago, we got a lot of traction in the healthcare and insurance industries. You know, folks like Aetna were actually founding members of that, you know, sort of consortium doing this, right? And we're starting to see them get a lot of leverage out of all of this. Similarly now, as we go into, you know, Europe and expand there, things like GDPR are really, really prominent, right? And, you guys know GDPR is a really big deal. Like, if you're not compliant by, I think it's like March of next year, you pay a portion of your revenue as fines. That's, you know, big money for everybody. So, I think that's why we're really excited about the partnership with IBM, because we feel like the two of us can help a lot of customers, especially in countries which are significantly more highly regulated than the United States, to actually get leverage out of our, sort of, giant portfolio of products. And IBM's been a great contributor to Atlas; they've adopted it wholesale, as you saw, you know, in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things. You're giving the Keynote on Data Lake 3.0; walk us through the evolution. Data Lakes 1.0, 2.0, 3.0: where are you now, and what can folks expect to hear and see in your Keynote? >> Absolutely. So as we've, kind of, continued to work with customers and we see the maturity model of customers, you know, initially people are standing up a data lake, and then they want, you know, sort of security, basic security, what it covers, and so on. Now they want governance, and as we're starting to go through that journey, clearly our customers are pushing us to help them get more value from the data. It's not just about standing up the data lake, and obviously managing data with governance; it's also about, Can you help us, you know, do machine learning, Can you help us build other apps, and so on.
So, as we look at that, there's a fundamental evolution that, you know, the Hadoop ecosystem had to go through. With the advent of technologies like, you know, Docker, it's really important first to help the customers bring in more than just the workloads which are sort of native to Hadoop. You know, Hadoop started off with MapReduce; obviously Spark's been great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. For mass-market data science, obviously, you know, people want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built for this; you know, these predate Hadoop by a long way, right. So now, as we bring these applications in, having a technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want not just R, but specific versions of libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually be off to the races, you know, doing data science. Which is really key for technologies like DSX, right? Because with DSX, if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin, which is a front-end, but they also have Jupyter, which is going to serve the mass-market users for Python and R, right?
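The library-isolation point here can be illustrated with a small sketch. Everything below is invented for the example (the base image name, the install commands, the label); it is not how DSX or HDP actually builds images, just one way to express "George's library set and mine as two independent container images":

```python
# Illustrative only: per-tenant R library sets expressed as separate container
# images, so two users' libraries never collide on the same cluster.
# The base image and install commands are made-up placeholders.

def dockerfile_for(tenant, packages=(), base="r-base:3.4"):
    """Render a minimal Dockerfile baking one tenant's R library set into an image."""
    lines = [f"FROM {base}"]
    for pkg in packages:
        # Each package is baked into this tenant's image only.
        lines.append(f'RUN R -e "install.packages(\'{pkg}\')"')
    lines.append(f"LABEL tenant={tenant}")
    return "\n".join(lines)

george = dockerfile_for("george", packages=("forecast", "dplyr"))
mine = dockerfile_for("arun", packages=("data.table",))

# Two images, two independent library sets; scheduling them side by side is
# exactly the multi-tenancy point being made above.
print(george.splitlines()[-1])  # LABEL tenant=george
```

The design point is that the conflict is resolved at image-build time rather than on a shared host, so the cluster scheduler never has to reason about library versions at all.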
So we want to make sure there's no friction, whether it's, sort of, the guys using Spark or the guys using R, and equally importantly, DSX, you know, in the near term will also support things like, you know, the classic IBM portfolio, SPSS and so on. So bringing all of those things together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your Keynote's going to be very educational for the folks that are attending tomorrow, so last question for you. One of the themes that occurred in the Keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here, finally, right? There's not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you again for sharing time with us on theCUBE. You've been watching theCUBE live on day 2 of the DataWorks Summit, hashtag DWS17. For my co-host George Gilbert, I am Lisa Martin. Stick around, we've got great content coming your way.
Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017
>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE; my co-host is George Gilbert. We're very excited to be joined by our next two guests. We're going to be talking about a lot of the passion and the energy that came from the keynote this morning, and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome, guys! >> Thank you. >> Glad to be here. >> First time on theCUBE; George and I are thrilled to have you. So, in the last six to eight months, doing my research, there have been announcements between IBM and Hortonworks. You guys have been partners for a very long time, with announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks, Jamie, a great opportunity to tap into IBM's enterprise install base. But boy, today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that; or sorry, Madhu, I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? >> Oh my God, what an exciting, exciting day, right? We've been working towards this one. So, three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM's data science and machine learning. As you heard in the announcement, we brought machine learning to our mainframe, where the most trusted data is. Now bringing that to open source, big data on Hadoop: great, right? Amazing.
Number two is obviously the whole aspect around our Big SQL, which brings complex-query analytics, pulling all the data together from the various sources, and makes that available on HDP and Hadoop, with Hortonworks really adopting it. An amazing announcement. Number three, what we gain out of this hugely, obviously from an IBM perspective, is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPi, and we've been all champions in open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome. Jamie, from your perspective on the product management side, what is this? What's the impact and potential downstream? Great implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. With Hortonworks and IBM partnering on this, number one, it brings a much bigger community to bear to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So number one is the community interest. The second thing is, when you look at Hadoop adoption, we're seeing that people want to get more and more value out of it, and they want to access more and more data sets to, number one, get more and more value. We're seeing the data science platform become really fundamental to that. They're also seeing the extension to say, not only do I need data science to go and add new insights, but I need to aggregate more data.
So we're also seeing the notion of, how do I use Big SQL on top of Hadoop, but then I can federate data from my mainframe, which has got some very valuable data on it, DB2 instances, and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come to me together and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. Even if you build models, there's the challenge of integrating them with operational applications. So what do the two of you tell me, the Telco customer? >> Yeah, so maybe I'll go first. So for the Telco, the main use case or the main application, as I've been talking to many of the largest Telco companies here in the U.S. and even outside the U.S., is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition, and such? There's so much data. The data is just streaming, and they want to understand that. I think if you bring the Data Science Experience and machine learning to that data, then, as I said, it doesn't matter where the data resides. Hadoop, mainframes, wherever, we can bring that data in. You can do a transformation of that, clean up the data. The quality of the data is there, so that you can start feeding that data into the models, and that's when the models learn. The more data there is, the better it is, so they train, and then you can really drive the insights out of it. Now, the data science framework which is available makes it like a team sport.
You can bring in many other data scientists into the organization, who could have different analysts to render reports for or provide results to. So being a team sport, being a collaboration, bringing it together with that clean data, I think it's going to change the world. I think the business side can get instant value from the data they're going to see. >> Let me just test the edge conditions on that. Some of that data is streaming, and you might apply the analytics in real time. Some of it is, I think as you were telling us before, sort of locked up as dark data. The question is, how much of that data, the streaming stuff and the dark data, how much do you have to land in a Hadoop repository, versus how much do you just push the analytics out to it and have it inform a decision? >> Maybe I can take a first thought on it. I think there's a couple things in that. There's the learnings, and then, how do I execute the learnings? I think the first step of it is, I tend to land the data, and going to the Telecom churn model, I want to see all the touchpoints. So I want to see the person that came through the website; he went into the store, he called into us. So I need to aggregate all that data to get a better view of the chain of steps that happened for somebody to churn. Once I end up diagnosing that, I go through the data science of that, to learn the models that are being executed on that data, and that's the data at rest. What I want to do is build the model out so that now I can take that model, and I can prescriptively run it on this stream of data. So I know that that customer just hung up the phone, now he walked into the store, and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose, from those two activities, that this is a high churn risk, so notify the rep in the store that there's a chance of him rolling off.
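The scenario just walked through, a model trained on data at rest and then run prescriptively over the event stream, can be sketched roughly as follows. All names, thresholds, and the toy scoring rule below are invented for illustration; a real deployment would score with a learned model inside a streaming engine, not a Python loop:

```python
# Hedged sketch: offline-trained churn score + a streaming rule that correlates
# two recent touchpoints (support call, then store visit) inside one window.
import time

CHURN_THRESHOLD = 0.8        # invented cutoff for "high churn risk"
WINDOW_SECONDS = 15 * 60     # correlate touchpoints this close together

def churn_score(customer):
    """Stand-in for the offline-trained model; real scoring would use learned weights."""
    return 0.9 if customer["dropped_calls"] > 3 else 0.2

recent = {}  # customer_id -> (event_type, timestamp): the stream's short-term memory

def on_event(customer, event_type, now):
    """Return a retention action when two risky touchpoints land in one window."""
    prev = recent.get(customer["id"])
    recent[customer["id"]] = (event_type, now)
    if (prev and prev[0] == "support_call" and event_type == "store_visit"
            and now - prev[1] <= WINDOW_SECONDS
            and churn_score(customer) >= CHURN_THRESHOLD):
        return f"offer_discount:{customer['id']}"
    return None

cust = {"id": "c42", "dropped_calls": 5}
t0 = time.time()
on_event(cust, "support_call", t0)             # first touchpoint: remembered, no action
print(on_event(cust, "store_visit", t0 + 60))  # offer_discount:c42
```

The split mirrors the point made in the conversation: the model itself is learned on landed data, while the streaming side only evaluates it against live events, which is what makes the real-time reaction cheap.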
If you look at that, it required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening; let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to-market side. You guys have to work very, very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll speak up first, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on that path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated and fostered a really strong relationship. I think at the end of the day, we both look at what's going to be the outcome for the customer and work back from that, and we tend to really engage at that level. So what's the outcome, and then how do we make a better product to get to that outcome? So I think there is a lot of natural synergy in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will join that over time. I think we're already starting with the Data Science Experience; a bunch of integration touchpoints there. I think you're going to see it in the information governance space, with Atlas being a key underpinning and the information governance catalog on top of that, ultimately moving up to IBM's unified governance; we'll start getting more synergies there as well, and on the Big SQL side.
I think when you look at the different parts, there's a lot of synergies that our customers will be driving; those are the driving factors, and the organizations are very well aligned. >> And from an engineering standpoint, there's a lot of integration points which were already identified, and Big SQL is already working really well on the Hortonworks HDP platform. We've got good integration going, but more and more on the data science side. I think at the end of the day we end up talking to very similar clients, so going in with a joint go-to-market strategy, it's a win-win. Jamie and I were talking earlier; I think in this type of a partnership, our community is winning and our clients are too, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. We were talking to Rob Thomas and Rob Bearden earlier on in the program today, and they talked about how the data science conversation is at the C-suite. So walk us through an example, whether it's a Telco or maybe a healthcare organization: what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take 'em? >> Maybe I'll start. When we look at a Telco, I think there's a natural evolution when we start looking at that problem of, how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-and-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an art today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give them the tooling. So there's step one, which is we need to start learning and training the right teams and the right approach. Step two is to start giving them access to the right data, etcetera, to work through that.
We're starting to see them co-developing stuff with us and becoming the committers. >> Lisa: You have a question? >> Well, if I were just to jump in: how do you see, over time, the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written down at the metal in MapReduce? For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective. Right now I think IBM has got really good synergies on what I'll call vertical solutions to vertical organizations, financial, etcetera. I would say Hortonworks has taken a more horizontal approach; we're more of a platform solution. An example of one where it's kind of marrying the two is if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution. One of the examples that we've invested heavily in is cybersecurity, in an Apache project called Metron. It's less about Metron and more about cybersecurity. People want to solve a problem. They want to defend against an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there is investing in some of the data science and pre-packaged models to identify attack vectors and then try to resolve that, or at least notify you that there's a concern. It's an example of pre-packaging the data science behind it to solve a specific problem. That's in the cybersecurity space, and that case happens to be horizontal, where Hortonworks' strength is. I think in the IBM case, there's a lot more vertical apps that we can apply this to: fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here, with the potential.
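To make the "out-of-the-box models to detect a lot of common patterns" idea concrete, here is a minimal sketch of a pre-packaged, rule-based detector. It is purely illustrative: the event fields, thresholds, and weights below are invented for this sketch and are not Metron's actual schema or rules.

```python
# Hedged sketch of a pre-packaged, rule-based attack detector.
# Field names ("failed_logins", "bytes_out", "port") are hypothetical.

def score_event(event):
    """Return a suspicion score in [0, 1] for one normalized log event."""
    score = 0.0
    if event.get("failed_logins", 0) >= 5:      # brute-force pattern
        score += 0.5
    if event.get("bytes_out", 0) > 10_000_000:  # possible data exfiltration
        score += 0.4
    if event.get("port") in {23, 445}:          # commonly abused services
        score += 0.1
    return min(score, 1.0)

def flag_suspicious(events, threshold=0.5):
    """Notify on events whose score crosses the alert threshold."""
    return [e for e in events if score_event(e) >= threshold]
```

A real pre-packaged model would be trained rather than hand-weighted, but the shape is the same: common attack patterns are encoded once, and the user gets notified the moment an event matches.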
We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you. >> And for my co-host George Gilbert, I am Lisa Martin. You're watching us live on theCUBE, from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back. (digitalized music)
George Chow, Simba Technologies - DataWorks Summit 2017
>> (Announcer) Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi everybody, this is George Gilbert, Big Data and Analytics Analyst with Wikibon. We are wrapping up our show on theCUBE today at DataWorks 2017 in San Jose. It has been a very interesting day, and we have a special guest to help us do a survey of the wrap-up, George Chow from Simba. We used to call him Chief Technology Officer, now he's Technology Fellow, but when he was explaining the difference in titles to me, I thought he said Technology Felon. (George Chow laughs) But he's since corrected me. >> Yes, very much so >> So George and I have been, we've been looking at both Spark Summit last week and DataWorks this week. What are some of the big advances that really caught your attention? >> What's caught my attention actually is how much manufacturing has really, I think, caught on to streaming data. I think last week was very notable that both Volkswagen and Audi actually had case studies for how they're using streaming data. And I think just before the break now, there was also a similar session from Ford, showcasing what they are doing around streaming data. >> And are they using the streaming analytics capabilities for autonomous driving, or is it other telemetry that they're analyzing? >> The, what is it, I think the Volkswagen study was production, because I still have to review the notes, but the one for Audi was actually quite interesting because it was for managing paint defects. >> (George Gilbert) For paint-- >> Paint defects. >> (George Gilbert) Oh. >> So what they were doing, they were essentially recording the environmental condition that they were painting the cars in, basically the entire pipeline-- >> To predict when there would be imperfections. >> (George Chow) Yes. >> Because paint is an extremely high-value sort of step in the assembly process.
Yes, what they are trying to do is to essentially make a connection between downstream defects, like future defects, and somewhat trying to pinpoint the causes upstream. So the idea is that if they record all the environmental conditions early on, they could turn around and hopefully figure it out later on. >> Okay, this sounds really, really concrete. So what are some of the surprising environmental variables that they're tracking, and then what's the technology that they're using to build the model and then anticipate if there's a problem? >> I think the surprising findings they mentioned were actually, I think it was humidity or fan speed, if I recall, at the time when the paint was being applied, because essentially, paint has to be... Paint is very sensitive to the conditions under which it is being applied to the body. So my recollection is that one of the findings was that it was a narrow window during which the conditions were, like, ideal, in terms of having the least amount of defects. >> So, had they built a digital twin style model, where it's like a digital replica of some aspects of the car, or was it more of a predictive model that had telemetry coming at it, and when it's outside certain bounds they know they're going to have defects downstream? >> I think they're still working on the predictive model, or actually the model is still being built, because they are essentially trying to build that model to figure out how they should be tuning the production pipeline. >> Got it, so this is sort of still in the development phase? >> (George Chow) Yeah, yeah >> And can you tell us, did they talk about the technologies that they're using? >> I remember the... It's a little hazy now because after a couple weeks of conference, so I don't remember the specifics because I was counting on the recordings to come out in a couple weeks' time. So I'll definitely share that. It's a case study to keep an eye on.
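The "narrow ideal window" idea described above can be sketched in a few lines: record the environmental conditions at paint time and flag bodies painted outside the band associated with the fewest defects. The windows and field names below are invented for illustration; they are not Audi's actual parameters or thresholds.

```python
# Hypothetical sketch of window-based paint-defect risk scoring.
# IDEAL_WINDOWS values are assumptions, not measured production numbers.

IDEAL_WINDOWS = {
    "humidity_pct": (45.0, 55.0),    # assumed band with fewest defects
    "fan_speed_rpm": (1800, 2200),
}

def defect_risk(conditions):
    """Count how many recorded conditions fall outside their ideal window."""
    out_of_window = 0
    for key, (lo, hi) in IDEAL_WINDOWS.items():
        value = conditions.get(key)
        if value is not None and not (lo <= value <= hi):
            out_of_window += 1
    return out_of_window

def likely_defective(conditions):
    """Flag a body for downstream inspection if anything was out of window."""
    return defect_risk(conditions) > 0
```

The eventual predictive model would learn these windows from the recorded upstream conditions and the downstream defect outcomes, rather than hard-coding them.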
So tell us, were there other ones where this use of real-time or near real-time data had some applications that we couldn't do before because we now can do things with very low latency? >> I think that's the one that I was looking forward to with Ford. That was the session just earlier, I think about an hour ago. The session actually consisted of a demo that was being done live, you know. It was being streamed to us where they were showcasing the data that was coming off a car that's been rigged up. >> So what data were they tracking and what were they trying to anticipate here? >> They didn't give enough detail, but it was basically data coming off of the CAN bus of the car, so if anybody is familiar with the-- >> Oh that's right, you're a car guru, and you and I compare, well our latest favorite is the Porsche Macan >> Yes, yes. >> SUV, okay. >> But yeah, they were looking at streaming the performance data of the car as well as the location data. >> Okay, and... Oh, this sounds more like a test case, like can we get telemetry data that might be good for insurance or for... >> Well they've built out the system enough using the Lambda Architecture with Kafka, so they were actually consuming the data in real-time, and the demo was actually exactly seeing the data being ingested and being acted on. So in the case they were doing a simplistic visualization of just placing the car on the Google Map so you can basically follow the car around. >> Okay so, what were the technical components in the car, and then, how much data were they sending to some, or where was the data being sent to, or how much of the data? >> The data was actually sent, streamed, all the way into Ford's own data centers. So they were using NiFi with all the right proxy-- >> (George Gilbert) NiFi being from Hortonworks there. >> Yeah, yeah >> The Hortonworks data flow, okay >> Yeah, with all the appropriate proxies and firewall to bring it all the way into a secure environment.
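The realtime leg of the Lambda-style pipeline just described, where each telemetry record is acted on as it arrives and reduced to a map marker, can be sketched like this. The record fields are assumptions for illustration, not Ford's actual CAN-bus schema.

```python
# Minimal sketch of the speed layer: consume telemetry as it arrives and
# act on each record immediately (here, producing live-map markers).
# Field names ("vehicle_id", "lat", "lon") are hypothetical.

def to_map_marker(record):
    """Reduce one telemetry record to the point a live map visualization needs."""
    return {"vehicle_id": record["vehicle_id"],
            "lat": record["lat"],
            "lon": record["lon"]}

def consume(stream, sink):
    """Process record-by-record instead of batching, so the map stays current."""
    for record in stream:
        sink.append(to_map_marker(record))

markers = []
consume([{"vehicle_id": "demo-1", "lat": 37.33, "lon": -121.89, "speed_mph": 42}],
        markers)
```

In the real system the `stream` would be a Kafka consumer fed through NiFi, and the batch layer of the Lambda architecture would run richer analytics over the same data at rest.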
>> Wow >> So it was quite impressive from the point of view of, it was live data coming off of the 4G modem, well actually being uploaded through the 4G modem in the car. >> Wow, okay, did they say how much compute and storage they needed in the device, in this case the car? >> I think they were using a very lightweight platform. They were streaming apparently from a Raspberry Pi. >> (George Gilbert) Oh, interesting. >> But they were very guarded about what was inside the data center because, you know, for competitive reasons, they couldn't share much about how big or how large a scale they could operate at. >> Okay, so Simba has been doing ODBC and JDBC drivers to standard APIs, to databases for a long time. That was all about, that was an era where either it was interactive or batch. So, how is streaming, sort of big picture, going to change the way applications are built? >> Well, one way to think about streaming is that if you look at many of these APIs, into these systems, like Spark is a good example, where they're trying to harmonize streaming and batch, or rather, to take away the need to deal with it as a streaming system as opposed to a batch system, because it's obviously much easier to think about and reason about your system when it is traditional, like in the traditional batch model. So, the way that I see it also happening is that streaming systems will, you could say will adapt, will actually become easier to build, and everyone is trying to make it easier to build, so that you don't have to think about and reason about it as a streaming system. >> Okay, so this is really important. But they have to make a trade-off if they do it that way. So there's the desire for leveraging skill sets, which were all batch-oriented, and then, presumably SQL, which is a data manipulation language everyone's comfortable with, but then, if you're doing it batch-oriented, you have a portion of time where you're not sure you have the final answer.
And I assume if you were in a streaming-first solution, you would explicitly know whether you have all the data or don't, as opposed to late arriving stuff, that might come later. >> Yes, but what I'm referring to is actually the programming model. All I'm saying is that more and more people will want streaming applications, but more and more people need to develop them quickly, without having to build them in a very specialized fashion. So when you look at, let's say, the example of Spark, when they focus on structured streaming, the whole idea is to make it possible for you to develop the app without having to write it from scratch. And the comment about SQL is actually exactly on point, because the idea is that you want to work with the data without, you could say, being mindful of, without a lot of work to account for, the fact that it is actually streaming data that could even arrive out of order. So the whole idea is that if you can build applications in a more consistent way, irrespective of whether it's batch or streaming, you're better off. >> So, last week even though we didn't have a major release of Spark, we had like a point release, or a discussion about the 2.2 release, and that's of course very relevant for our big data ecosystem since Spark has become the compute engine for it. Explain the significance where the reaction time, the latency for Spark, went down from several hundred milliseconds to one millisecond or below. What are the implications for the programming model and for the applications you can build with it? >> Actually, hitting that new threshold, the millisecond, is actually a very important milestone because when you look at a typical scenario, let's say with AdTech where you're serving ads, you really only have, maybe, on the order of about 100 or maybe 200 milliseconds max to actually turn around. >> And that max includes a bunch of things, not just the calculation.
Yeah, and that, let's say 100 milliseconds, includes transfer time, which means that in your real budget, you only have allowances for maybe under 10 to 20 milliseconds to compute and do any work. So being able to actually have a system that delivers millisecond-level performance actually gives you the ability to use Spark right now in that scenario. >> Okay, so in other words, now they can claim, even if it's not per event processing, they can claim that they can react so fast that it's as good as per event processing, is that fair to say? >> Yes, yes that's very fair. >> Okay, that's significant. So, what type... How would you see applications changing? We've only got another minute or two, but how do you see applications changing now that Spark has been designed for people that have traditional, batch-oriented skills, but who can now learn how to do streaming, real-time applications without learning anything really new. How will that change what we see next year? >> Well I think we should be careful to not pigeonhole Spark as something built for batch, because I think the idea is that, you could say, the originators of Spark know that it's all about the ease of development, and it's the ease of reasoning about your system. It's not the fact that the technology is built for batch, so the fact that you could use your knowledge and experience and an API that actually is familiar, you should leverage it for something that you can build for streaming. That's the power, you could say. That's the strength of what the Spark project has taken on. >> Okay, we're going to have to end it on that note. There's so much more to go through. George, you will be back as a favorite guest on the show. There will be many more interviews to come. >> Thank you. >> With that, this is George Gilbert. We are at DataWorks 2017 in San Jose. We had a great day today. We learned a lot from Rob Bearden and Rob Thomas up front about the IBM deal.
We had Scott Gnau, CTO of Hortonworks on several times, and we've come away with an appreciation for a partnership now between IBM and Hortonworks that can take the two of them into a set of use cases that neither one on its own could really handle before. So today was a significant day. Tune in tomorrow, we have another great set of guests. Keynotes start at nine, and our guests will be on starting at 11. So with that, this is George Gilbert, signing out. Have a good night. (energetic, echoing chord and drum beat)
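As a back-of-the-envelope footnote to the ad-serving latency budget discussed in this segment: of a roughly 100 millisecond round-trip allowance, transfer time eats most of it, leaving only a small compute budget. The specific split below is illustrative, not a measured figure.

```python
# Illustrative latency-budget arithmetic for realtime ad serving.
# The 80 ms transfer figure is an assumption chosen to leave ~20 ms of compute.

TOTAL_BUDGET_MS = 100
TRANSFER_MS = 80                                    # network + serialization

compute_budget_ms = TOTAL_BUDGET_MS - TRANSFER_MS   # ~20 ms left to do any work

def fits_budget(engine_latency_ms):
    """Can an engine with this per-request latency serve ads in time?"""
    return engine_latency_ms <= compute_budget_ms
```

Under these assumptions a millisecond-level engine fits comfortably inside the compute budget, while an engine needing several hundred milliseconds cannot be used on this path at all, which is why the latency milestone discussed above matters.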
Sri Raghavan, Teradata - DataWorks Summit 2017
>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. (electronic music fading) >> Hi everybody, this is George Gilbert. We're watching theCUBE. We're at DataWorks 2017 with my good friend Sri Raghavan from Teradata, and Sri, let's kick this off. Tell us, bring us up to date with what Teradata's been doing in the era of big data and advanced analytics. >> First of all, George, it's always great to be back with you. I've done this before with you, and it's a pleasure coming back, and I always have fun doing this. So thanks for having me and Teradata on theCUBE. So, a lot of things have been going on at Teradata. As you know, we are the pioneer in the enterprise data warehouse space. We've been so for the past 25 plus years, and, you know, we've got an incredible amount of goodwill in the marketplace with a lot of our key customers and all that. And as you also know, in the last, you know, five or seven years or so, between five and seven years, we've actually expanded our portfolio significantly to go well beyond the enterprise data warehouse into advanced analytics. We've got solutions for the quote-unquote the big data, advanced analytics space. We've acquired organizations which have significant amount of core competence with enormous numbers of years of experience of people who can deliver us solutions and services. So it's fair to say, as an understatement, that we have, we've come a long way in terms of being a very formidable competitor in the marketplace with the kinds of, not only our core enterprise data warehouse solutions, but also advanced analytics solutions, both as products and solutions and services that we have developed over time. 
>> So I was at the Influencer Summit, not this year but the year before, and the thing, what struck me was you guys articulated very consistently and clearly the solutions that people build with the technology as opposed to just the technology. Let's pick one, like Customer Journey that I remember that was used last year. >> Sri: Right. >> And tell us, sort of, what are the components in it, and, sort of, what are the outcomes you get using it? >> Sure. First of all, thanks for picking on that point because it's a very important point that you mentioned, right? It's not- in today's world, it can't just be about the technology. We just can't go on and articulate things around our technology and the core competence, but we also have to make a very legitimate case for delivering solutions to the business. So, our, in fact, our motto is: Business solutions that are technology-enabled. We have a strong technology underpinning to be able to deliver solutions like Customer Journey. Let me give you a view into what Customer Journey is all about, right? So the idea of the Customer Journey, it's actually pretty straightforward. It's about being able to determine the kind of experience a customer is having as she or he engages with you across the various channels that they do business with you at. So it could be directly they come into the store, it could be online, it could be through snail mail, email, what have you. The point is not to look at Customer Journey as a set of disparate channels through which they interact with you, but to look at it holistically. Across the various areas of encounters they have with you and engagements they have with you, how do you determine what their overall experience is, and, more importantly, once you determine what their overall experience is, how can you have certain kinds of treatments that are very specific to the different parts of the experience and make their experience and engagement even better? 
>> Okay, so let me jump in for a second there. >> We've seen a lot of marketing automation companies come by and say, you know, or come and go having said over many generations, "We can help you track that." And they all seem to, like, target either ads or email. >> Correct. >> There's like, the touchpoints are constrained. How do you capture a broader, you know, a broader journey? >> Yeah, to me it's not just the touchpoints being constrained, although all the touchpoints are constrained. To me, it's almost as if those touchpoints are looked at very independently, and it's very orthogonal too, right? I look at only my online experience versus a store experience versus something else, right? And the assumption in most cases is that they're all not related. You know, sometimes, I may not come directly to the store, right, but the reason why I'm not coming to the store is because, to buy things, because, you know, I have seen an advertisement somewhere which says, "Look, go online and purchase a product." So whatever the case might be, the point is each part of the journey is very interrelated, and you need to understand this is as well. Now, the question that you asked is, "How do you, for instance, collect all this information? "Where do you store it?" >> George: And how do you relate it ... >> And, exactly, and how do you connect the various points of interaction, right? So for one thing, and let me just, sort of, go a little bit tangential and go into some architecture, the marchitecture, if you will that allows us to be able to, first of all, access all of this data. As you can imagine, the types and the sources of data are quite a bit, are pretty disparate, particularly as the number of channels by which you can engage with me as an organization has expanded, so do the number of sources. 
So, you know, we have to go to place A, where there's a lot of CRM information for instance, or place B, where it's a lot of online information, weblogs and web servers and what have you, right? So, we have to go to, for instance, some of these guys would have put all this information in a big data lake. Or they could have stored it in an EDW, in an enterprise data warehouse. So we've put in place a technology, an architecture, which allows us to be able to connect to all these various sources, be it Teradata products or non-Teradata, third-party sources, we don't care. We have the capability to connect to all these different data sources to be able to access information. So that's number one. Number two is how do you normalize all of this information? So as you can well imagine, right, web log servers are very different in their data makeup as opposed to CRM solutions, highly structured information. So we need a way to be able to bring them together, to connect a singular user ID across the different sources, so we have filtering, you know, data filters in place that extract information from weblogs, let's say it's an XML file. So we extract all that information, and we connect it. Ultimately, all of that information comes to you in a structured manner. >> And can it, can it be realtime reactive? In other words when- >> Sri: Absolutely. >> someone comes to- >> Sri: Absolutely. >> you know, a channel where you need to anticipate and influence. >> Very good question. In fact, I think we will be doing a big disservice to our customers if we did not have realtime decisioning in place.
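The normalization step just described, extracting a record from a weblog XML entry and connecting it to a structured CRM record through a single user ID, can be sketched with nothing but the standard library. The XML layout and field names below are assumptions for illustration, not Teradata's actual filter schema.

```python
# Hedged sketch of source normalization and a user-ID join.
# The <entry>/<userId>/<event> layout is invented for this example.
import xml.etree.ElementTree as ET

def parse_weblog(xml_text):
    """Extract a flat, structured record from one weblog XML entry."""
    root = ET.fromstring(xml_text)
    return {"user_id": root.findtext("userId"),
            "event": root.findtext("event")}

def join_sources(weblog_record, crm_by_user):
    """Connect the singular user ID across the two sources."""
    merged = dict(weblog_record)
    merged.update(crm_by_user.get(weblog_record["user_id"], {}))
    return merged

crm = {"u42": {"name": "Pat", "segment": "gold"}}
journey_row = join_sources(
    parse_weblog("<entry><userId>u42</userId><event>view_coupon</event></entry>"),
    crm)
```

At scale the same idea runs as data filters inside the platform rather than in client code, but the output is the same: one structured row per interaction, keyed by the user, ready for the journey analytics.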
I mean, the whole idea is for us to be able to provide certain treatments based on what we anticipate your reactions are going to be to certain, let's say if it's a retail store, let's say to certain product coupons we've placed, which say, you know, come online, and basically, based on your behavior, we think there's a 90% chance that tomorrow morning you're going to come back, you know, through our online portal and buy the products. And because of the fact that our analytics allows us to be able to predict your behavior tomorrow morning, as soon as you land on the online portal, we will be able to provide certain treatment to you that takes advantage of that. Absolutely. >> Techy question: because you're anticipating, does that mean you've done the prediction runs, batch, >> Sri: Absolutely. >> And so you're just serving up the answer. >> Yeah, the business level answer is absolutely. In fact, we have, as part of our advanced analytics solution, we have pre-built algorithms that take all this information that I've talked to you about, where we've connected all that information across the different sources, and we apply algorithms on top of that to be able to deliver predictive models. Now, these models, once they are actually applied as and when the data comes in, you know, you can operationalize them. So the thing to be very clear here, a key part of the Teradata story, is that not only are we in a position to be able to provide the infrastructure which allows you to be able to collect all the information, but we provide the analytic capabilities to be able to connect all of the data across the various sources and at scale, to do the analytics on top of all that disparate data, to deliver the model, and, as an important point, to operationalize that model, and then to connect it back in the feedback loop. We do the whole thing. >> That's, there's a lot to unpack in there, and I called our last guest dense.
What I was actually trying to say, we had to unpack a dense answer, so it didn't come out quite that, quite right. So I won't make that mistake. >> Sri: That's a very backhanded compliment there. (George laughing) >> So, explain to me though, the, I know from all the folks who are trying to embed predictive analytics in their solutions, the operationalizing of the model is very difficult, you know, to integrate it with the system of record. >> Yeah, yeah, yeah. >> How do, you know, how do you guys do that? >> So a good point. There are two ways by which we do it. One is we have something called the AppCenter. It's called Teradata AppCenter. The AppCenter is a core capability of some of the work we've done so far, in fact we've had it for the last, I don't know, four years or so. We've actually expanded it to include a lot of the apps. So the idea behind the AppCenter is that it's a framework for us to be able to develop very specific apps for us to be able to deliver the model so that next time, as and when realtime data comes in, when you connect to a database for instance. So the way the app works is that you set up the app. There's code that we've created, it's all prebuilt code that we put behind that app, and it runs, the app runs. Every time the data is refreshed, you can run the app, and it automatically comes up with visualizations which allow you to be able to see what's happening with your customers in realtime. So that's one way to operationalize. In fact, you know, if you come by to our booth, we can show you a demo as to how the AppCenter works. The other way by which we've done it is to develop a software development kit where we actually have created an operationalization. So I'll give you an example, right?
We developed an app, a realtime operationalization app where the folks in the call center are assessing whether you should be given a loan to buy a certain kind of car, a used car, brand new car, whatever the case might be. So what happens is the call center person takes information from you, gets information about, you know, what your income level is, you know, how long you've been working in your existing job, what have you. And those are parameters that are passed into the screen- >> By the way, I should just say, on the income level, it's way too low for my taste. >> Those are, um, those are comments I'll take, uh, later. >> Off slide. >> But, I mean, you got a brand new Armani suit, so you're not doing badly. But, uh, so what happens is, you know, as and when the data goes into the parameters, right, the call center person just clicks on the button, and the model which sits behind the app picks up all the parameters, runs it, and spews out a likelihood score saying that this person is 88% likely- >> So an AppCenter is not just a full end-to-end app, it also can be a model. >> AppCenter can include the model which can be used to operationalize as and when the data comes in. >> George: Okay. >> It's a very core part of our offering. In fact, I can't stress enough how important AppCenter is to our ability to operationalize our various analytic models. >> Okay, one more techy question in terms of how that's supported. Is the AppCenter running on Aster or the models, are they running on Aster, uh, the old Aster database or Teradata? >> Well, just to be clear, right, so the Aster solution is called Aster Analytics, of which one form factor contains a database, but you have Aster which is in Hadoop, you have Aster in the Cloud, you have Aster software only, so there's a lot of difference between these two, right? So AppCenter sits on Aster, but right now, it's not just the Aster AppCenter.
It's called the Teradata AppCenter, with the idea that it will sit on Teradata products as well. >> George: Okay. >> So again, it's a really core part of our evolution that we've come up with. We're very proud of it. >> On that note, we have to wrap it up for today, but to be continued. >> Sri: Time flies when you're having fun. >> Yes. So this is George Gilbert. I am with Sri Raghavan from Teradata. We are at DataWorks 2017 in San Jose, and we will be back tomorrow with a whole lineup of exciting new guests. Tune in tomorrow morning. Thanks. (electronic music)
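The scoring flow Sri walks through — parameters from the call-center screen passed on a button click to a model behind the app, which returns a likelihood — can be sketched in miniature. Everything here is hypothetical: the feature names, weights, and intercept are invented for illustration, and a real AppCenter app would front a trained model rather than this toy logistic scorer.

```python
import math

def loan_likelihood(annual_income, years_in_job, loan_amount):
    """Toy logistic scorer: returns the percent likelihood that the
    applicant repays. All weights are invented for illustration."""
    z = (0.00004 * annual_income   # higher income raises the score
         + 0.3 * years_in_job      # longer job tenure raises it
         - 0.00005 * loan_amount   # a bigger loan lowers it
         - 1.0)                    # intercept
    return round(100 / (1 + math.exp(-z)), 1)

# What the call-center button click would pass to the model:
score = loan_likelihood(annual_income=80000, years_in_job=5, loan_amount=20000)
```

With these made-up weights the applicant above scores in the low 90s — the same kind of "88% likely" figure mentioned in the conversation, surfaced instantly on the agent's screen.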
Scott Gnau, Hortonworks & Tendü Yogurtçu, Syncsort - DataWorks Summit 2017
>> Man's Voiceover: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE, we are live at Day One of the DataWorks Summit, we've had a great day here, I'm surprised that we still have our voices left. I'm Lisa Martin, with my co-host George Gilbert. We have been talking with great innovators today across this great community, folks from Hortonworks, of course, IBM, partners, now I'd like to welcome back to theCube, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau, welcome back Scott! >> Great to be here yet again. >> Yet again! And we have another CTO, we've got CTO corner over here, with CUBE Alumni and the CTO of Syncsort, Tendü Yogurtçu. Welcome back to theCUBE, both of you. >> Pleasure to be here, thank you. >> So, guys, what's new with the partnership? I know that Syncsort, you have 87%, or 87 of the Fortune 100 companies are customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with Syncsort, what's new, what's going on there? >> You know there's always something new in our partnership. We launched our partnership, what, a year and a half ago or so? >> Yes. And it was really built on the foundation of helping our customers get time to value very quickly, right, and leveraging our mutual strengths. And we've been back on theCUBE a couple of times and we continue to have new things to talk about, whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that had a great foundation in value and continues to grow.
And, ya know, with some of the latest moves that I'm sure Tendü will bring us up to speed on that Syncsort has made, customers who have jumped on the bandwagon with us together are able to get much more benefit than originally they even intended. >> Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you Scott. And the Trillium acquisition has been transformative for us, really. We have achieved quite a lot within the last six months. Delivering joint solutions between our data integration, DMX-h, and Trillium data quality and profiling portfolio, and that was kind of our first step, very much focused on the data governance. We are going to have data quality for Data Lake product available later this year, and this week actually we will be announcing our partnership with the Collibra data governance platform, basically making business rules and technical meta-data available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization and the bundle that we launched early February of this year, that's in production, large complex production deployments have already happened. Our customers access all their data, all enterprise data, including legacy data warehouse, new data sources as well as legacy mainframe, in the data lake. So we will be announcing again in a week or so change data capture capabilities from legacy data stores into Hadoop, keeping that data fresh and giving more choices to our customers in terms of populating the data lake, as well as use cases like archiving data into the cloud.
(laughter) >> So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database >> Tendü Voiceover: Yes. and you've got the data quality at work as well. Help us picture how much faster and higher fidelity the data flow might be relative to >> Sure, absolutely. So, our bundle and our joint solution with Hortonworks really focuses on business use cases. And one of those use cases is enterprise data warehouse optimization, where we make all data, all enterprise data, accessible in the data lake. Now, if you are an insurance company managing claims or you are building a data as a service, Hadoop as a service architecture, there are multiple ways that you can keep that data fresh in the data lake. And you can have change data capture by basically taking snapshots of the data and comparing them in the data lake, which is a viable method of doing it. But, as the data volumes are growing and the realtime analytics requirements of the business are growing, we recognize our customers are also looking for alternative ways that they can actually capture the change in realtime when the change is just like less than 10% of the original data set, and keep the data fresh in the data lake. So that enables faster analytics, realtime analytics, as well as, in the case that you are doing something from on-premise to the cloud or archiving data, it also saves on resources like the network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities around profiling that data, getting a better understanding of that data.
So we will be focused on delivering products around that, because as we understand data we can also help our customers to create the business rules, to cleanse that data, and preserve the fidelity and integrity of the data. >> So, with the change data capture it sounds like near real time, you're capturing changes in near real time, could that serve as a streaming solution that then is also populating the history as well? >> Absolutely. We can go through streaming or message queues. We also offer more efficient proprietary ways of streaming the data to Hadoop. >> So the, I assume the message queues refers to, probably Kafka, and then your own optimized solution for sort of maximum performance, lowest latency. >> Yes, we can do either: true Kafka queues, which is very efficient as well. We can also go through proprietary methods. >> So, Scott, help us understand then now the governance capabilities that, um I'm having a senior moment (laughter) I'm getting too many of these! (laughter) Help us understand the governance capabilities that Syncsort's adding to the, sort of, mix with the data warehouse optimization package and how it relates to what you're doing. >> Yeah, right. So what we talked about even again this morning, right, the whole notion of the value of open squared, right, open source and open ecosystem. And I think this is clearly an open ecosystem kind of play. So we've done a lot of work since we initially launched the partnership and through the different product releases, where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies, so that the Syncsort tool can exploit those horizontal core services like YARN for multi-tenancy and workload management, and of course Atlas for data governance. So as the Syncsort team adds feature functionality on the outside of that tool, that simply accretes to the benefit of what we've built together.
And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem, that they can plug in additional feature functionality. And at the same time we're really thrilled because, and we've talked about this many times, right, the whole notion of governance and meta-data management in the big data space is a big deal. And so the fact that we're able to come to the table with an open source solution to create common meta-data tagging that then gets utilized by multiple different applications I think creates extreme value for the industry and frankly for our customers, because now, regardless of the application they choose, or the applications that they choose, they can at least have that common trusted infrastructure where all of that information is tagged and it stays with the data through the data's life cycle. >> So your partnership sounds very very symbiotic, that there's changes made on one side that reflect the other. Give us an example of where is your common customer, and this might not be, well, they're all over the place, who has got an enterprise data warehouse. Are you finding more customers that are looking to modernize this? That have multi-cloud, core edge, IOT devices, that's a pretty distributed environment, versus customers that might be still more on prem? What's kind of the mix there? >> Can I start and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us, and in terms of the partnership that you mentioned, I think one of the strengths of our partnership is at many different levels. It's not just executive level, it's cross functional, and also very close field teams, marketing teams and engineering teams working together. And in terms of our customers, it's really that organizations are trying to move toward a modern data architecture.
And as they are trying to build the modern data architecture, there is the data in motion piece I will let Scott talk about, the data at rest piece, and as we have so much data coming from cloud, originating through mobile and web, in the enterprise, especially the Fortune 500 that we talk to, the Fortune 100 we talked about, insurance, health care, telco, financial services and banking have a lot of legacy data stores. So our joint solution and the couple of first use cases, business use cases, we targeted were around that. How do we enable these data stores and data in the modern data architecture? I will let Scott-- >> Yeah, I agree. And so certainly we have a lot of customers already who are joint customers and so they can get the value of the partnership kind of cuz they've already made the right decision, right. I also think, though, there's a lot of green field opportunity for us because there are hundreds if not thousands of customers out there who have legacy data systems where their data is kind of locked away. And by the way, it's not to say the systems aren't functioning and doing a good job, they are. They're running business facing applications and all of that's really great, but that is a source of raw material that belongs also in the data lake, right, and can certainly enhance the value of all the other data that's being built there. And so the value, frankly, of our partnership is really creating that easy bridge to kind of unlock that data from those legacy systems and get it in the data lake, and then from there, the sky's the limit, right. Is it reference data that can then be used for consistency of response when you're joining it to social data and web data? Frankly, is it an online archive, and optimization of the overall data fabric and off-loading some of the historical data that may not even be used in legacy systems and having a place to put it where it actually can be accessed? And so, there are a lot of great use cases.
You're right, it's a very symbiotic relationship. I think there's only upside because we really do complement each other and there is a distinct value proposition, not just for our existing customers but frankly for a large set of customers out there that have, kind of, the data locked away. >> So, how do you see the data warehouse optimization sort of solution set continuing to expand its functional footprint? What are some things to keep pushing out the edge conditions, the realm of possibilities? >> Some of the areas that we are jointly focused on: we are liberating that data from the enterprise data warehouse or legacy architectures. Through the Syncsort DMX-h we actually understand the path that data travels from; the meta-data is something that we can now integrate into Atlas and publish into Atlas, and have Atlas as the open data governance solution. So that's an area that definitely we see an opportunity to grow and also strengthen that joint solution. >> Sure, I mean extended provenance is kind of what you're describing, and that's a big deal when you think about some of these legacy systems where frankly 90% of the costs of implementing them originally was actually building out those business rules and that meta-data. And so being able to preserve that and bring it over into a common or an open platform is a really big deal. I'd say inside of the platform, of course, as we continue to create new performance advantages in, ya know, the latest releases of Hive, as an example, where we can get low latency query response times, there's a whole new class of workloads that now is appropriate to move into this platform, and you'll see us continue to move along those lines as we advance the technology from the open community.
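The common meta-data tagging Scott describes — classify data once and have that classification stay with the data through its life cycle as it moves between systems — can be sketched as a tiny lineage-aware registry. This is not the Atlas data model or its API; the structure and dataset names below are invented stand-ins for the idea.

```python
class MetadataRegistry:
    """Minimal stand-in for an Atlas-style governance catalog:
    classifications attached to a dataset propagate down its lineage."""

    def __init__(self):
        self.tags = {}  # dataset name -> set of classifications

    def register(self, name, tags=(), derived_from=()):
        # A derived dataset inherits every tag of its upstream sources.
        inherited = set()
        for parent in derived_from:
            inherited |= self.effective_tags(parent)
        self.tags[name] = set(tags) | inherited

    def effective_tags(self, name):
        return self.tags.get(name, set())

reg = MetadataRegistry()
reg.register("mainframe.claims", tags={"PII", "FINANCE"})
reg.register("lake.claims_clean", derived_from=["mainframe.claims"])
print(sorted(reg.effective_tags("lake.claims_clean")))  # ['FINANCE', 'PII']
```

The point is the one Scott makes: tag once at the source, and any application reading the derived data set still sees the same trusted classifications.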
I'd loved how you said liberating data so that companies can really unlock its transformational value. We want to thank both of you for Scott coming back on theCUBE >> Thank you. twice in one day. >> Twice in one day. Tendu, thank you as well >> Thank you. for coming back to theCUBE. >> Always a pleasure. For both of our CTO's that have joined us from Hortonworks and Syncsort and my co-host George Gilbert, I am Lisa Martin, you've been watching theCUBE live from day one of the DataWorks summit. Stick around, we've got great guests coming up (upbeat music)
David Lyle, Informatica - DataWorks Summit 2017
>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's the Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to the Cube, I'm Lisa Martin with my co-host, Peter Burris. We are live on day one of the DataWorks Summit in Silicon Valley. We've had a great day so far, talking about innovation across different companies, different use cases, it's been really exciting. And now, please welcome our next guest, David Lyle from Informatica. You are driving business transformation services. >> Yes. >> Lisa: Welcome to the Cube. >> Well thank you, it's good to be here. >> It's great to have you here. So, tell us a little about Informatica World, Peter you were there with the Cube. Just recently some of the big announcements that came out of there: Informatica getting more aggressive with cloud movement, extending your master data management strategy, and you also introduced a set of AI capabilities around meta-data. >> David: Exactly. >> So, looking at those three things, and your customer landscape, what's going on with Informatica customers, where are you seeing these great new capabilities come to fruition? >> Absolutely, well one of the areas that is really wonderful, that we're using in every other aspect of our life, is using the computer to do the logical things it should, and could, be doing to help us out. So, in this announcement at Informatica World, we talked about the central aspect of meta-data finally being the true center of Informatica's universe. So bringing in meta-data-- >> And customer's universes. >> Well, and customers' universes, so not seeing it as something that sits over here that's not central, but truly the thing that is where you should be focusing your attention on.
And so Informatica has some card-carrying PhD artificial intelligence and machine learning engineers, scientists, that we have hired, that have been working for several years, that have built this new capability called CLAIRE. That's the marketing term for it, but really what it is, it's helping to apply artificial intelligence against that meta-data, to use the computer to do things for the developer, for the analyst, for the architect, for the business people, whatever, that are dealing with these complex data transformation initiatives that they're doing. Where in the past what's been happening is, whatever product you're using, the product is basically keeping track of all the things that the scientist or analyst does, but isn't really looking at that meta-data to help suggest things that maybe have already been done before. Or domains of data. Why, how come you have to tell the system that this is an address? Can't the system identify that when data looks like this, it's an address already? We think about Shazam and all these other apps that we have on our phones that can do these fantastic things with music. How come we can't do those same things with data? Well, that's really what CLAIRE can actually do now: discover these things and help. >> Well, I want to push back a little bit. >> David: Sure, sure. >> So, historically meta-data was the thing that you created in the modeling activity. >> David: Right. >> And it wasn't something that you wanted to change, or was expected to change frequently. >> In fact, in the world of transaction processing, you didn't want it to change. >> Oh, yeah. And especially you get into finance apps, and things like that, you want to keep that slow. >> Exactly. >> Yeah. >> And meta-data became one of those things that often had to be secured in a different way, and was one of those reasons why IT was always so slow. >> Yeah. >> Because of all these concerns about what's the impact on meta-data. >> Yeah.
>> We move into this big data world, and we're bringing forward many of the same perspectives on how we should treat meta-data, and what you guys are doing is saying "that's fine, keep the meta-data of that data, but do a better job of revealing it, and how it connects-- >> David: Exactly. >> and how it could be connected." And we talked about this with Bill Schmarzo just recently-- >> Good friend of mine. >> Yeah, the data that's in that system can also be applied to that system. >> Yeah. >> It doesn't have to be a silo. And what CLAIRE is trying to do is remove some of the artificial barriers-- >> Exactly. >> Of how we get access to data that are founded by organization, or application, or system. >> David: Right. >> And make it easier to find that data, use that data, and trust the data. >> Exactly. >> Peter: I got that right? >> You've totally got that right. So, if we think about all these systems in an organization as this giant complex hairball, that in the past we may have had pockets of meta-data here and there that weren't really exposed, or controlled in the right way in the first place. But now bringing it together. >> But also valuable in the context of the particular database or system-- >> Yep. >> that was running. It wasn't the meta-data that was guarded as valuable-- >> Right. that just provided documentation for what was in the data. >> Exactly, exactly. So, but now with this ability to see it, really for the first time, and understand how it connects and impacts with other systems that are exchanging data with this, or viewing data with this. We can understand then, if I need, occasionally, to make a change to the general ledger, or something, I can now understand what impact that has on different KPIs, and the calculation streams of Tableau, Business Objects, Cognos, MicroStrategy, Qlik, whatever. What else do I need to change? What else do I need to test? That's something computers are good at.
Something that humans have had to do manually up to this point. And that, that's what computers are for. >> Right. >> So questions for you on the business side. Since we look at-- >> Yeah. >> Businesses are demanding real time access to data to make real time decisions, manage costs, be competitive, and that's driving cloud, it's driving IoT, it's driving big data and analytics. You talked about CLAIRE, and the implications of it across different people within an organization. >> Right. Meta-data, how does the C-suite, or a senior manager, care-- >> David: Good point. >> About meta-data? >> They don't, and that's why we don't talk about the word architecture. Typically with C-suite folks we don't use the word meta-data. With C-suite folks, instead, we talk about things like solving the problem of time to get the application, or information, that you need, reducing that time by being able to see and change and retest the things that need to be. So we just change the discussion to either dollars, or time, or of course those are really equivalent. >> But really facilitated by this-- >> Exactly. >> Artificial intelligence. >> It's facilitated by this artificial intelligence. It can also then lead to, when we get into data lakes, ensuring that those data lakes are understood better, trusted better, that people are able to see what other people are actually using. And in other words we kind of bring, somewhat, the Amazon.com website model to the data lake, so that people know, okay, if I'm looking for a product, or data set, that looks like this for my, our, processing data science utility, or what I want to do. Then these are the data sets that are out there that may be useful. This is how many people have used them, or who those other people are, and are those people kind of trusted, valid people that have done similar stuff to what I want to do before?
Anyway, all that information we're used to when we buy products from Amazon, we bring that now to the data lake that you're putting together, so that you can actually prevent it, kind of, from being a swamp and actually get value out of it. Once again, it's the meta-data that's the key to that, of getting the value out of that data. >> Have you seen historically that, you're working with customers that have, or are already using, Hadoop. >> David: That's right. >> They've got data lakes. >> Oh yeah. >> Have you seen that historically they haven't really thought about meta-data as driving this much value before? Is this sort of, not a new problem, but are you seeing that it's not been part of their-- >> It's a new. >> strategic approach. >> That's right, it's a new solution. I think you talk to anybody, and they knew this problem was coming. That with a data lake, and the speed that we're talking about, if you don't back that up with the corresponding information that you need to really digest, you can create a new mess, a new hairball, faster than you ever created the original hairball you're trying to fix in the first place. >> Lisa: Nobody likes a hairball. >> Nobody likes a hairball, exactly. >> Well it also seems as though, for example at the executive level: do I have a question? Can I get this question answered? How do I get this question answered? How can I trust the answer that I get? In many respects that's what you guys are trying to solve. >> David: Exactly, exactly. >> So, it's not, hey, what you need to do is invest a whole bunch in the actual data, or copying data, or moving a bunch of data around. You're just starting with the observation, with the proposition: yes, you can answer this question, here's how you're going to do it, and you can trust it because of this trail-- >> David: Exactly. >> Of activities based on the meta-data. >> Exactly, exactly.
So, it's about helping to, hate to use the phrase again, but "detangle" that hairball, or at least manage it a bit, so that we can begin to move faster and solve these problems with a hell of a lot more confidence. So we have-- >> Can we switch gears? >> Absolutely. >> Certainly. >> Let's switch gears and talk about transformations. >> Yeah. >> I know that's something that is near and dear to your heart, and something you're spending a lot of time with clients on. >> Yeah. >> How do you approach, when a customer comes to you, how are they approaching the transformation, and what's the conversation that you're having with them? >> Well, it's interesting, that phrase, and I'm even thinking of changing our group's title to digital transformation services, not just because it's hot, but because, frankly, the fluid or the thing, the glue, that really makes that happen is data in these different environments. But the way that we approach it is by, well, understanding what the business capabilities are that are affected by the transformation that is being discussed. Looking at and prioritizing those capabilities based upon the strategic relevance of that capability, along with the opportunity to improve, and multiplying those together, we can then take those and rank those capabilities, and look at it in conjunction with what we call a business view of the company. And from that we can understand what the effects are on the different parts of the organization, and create the corresponding plans, or roadmaps, that are necessary to do this digital transformation. We actually made a little stealth acquisition of a company two years ago, that's kind of the underpinnings of what my team does, that is extremely helpful in being able to drive these kinds of complex transformations.
In fact, big companies, a lot, several in this room in a way, are going through the transformation of moving from a traditional software license sale transaction with the customer to a subscription, monthly transaction. That changes marketing. That changes sales. That changes customer support. That changes R&D. Everything Changes. >> Everything, yeah. >> How do you coordinate that? What is the data that you need in order to calculate a new KPI for how I judge how well I'm doing in my company? Annual recurring revenue, or something. It's a, these are all, they get into data governance. You get into all these different aspects, and that's what our team's tool and approach is actually able to credibly go in, and lay out this road map for folks that is shocking, kind of, in how it's making complex problems manageable. Not necessarily simple. Actually it was Bill Schmarzo, on the, he told me this 15 years ago. Our problem is not to make simple problems mundane, our problem, or what we're trying to do, is make complex problems manageable. I love that. >> Sounds like something-- >> I love that. >> Bill would say. >> That's an important point though about not saying "we're going to make it simple-- >> No. >> we're going to make it manageable." >> David: Exactly. >> Because that's much more realistic. >> David: Right. >> Don't you think? >> David: Exactly, exactly. The fact-- >> I dunno, if we can make them simple, that's good too. >> That would be nice. >> Oh, we'd love that >> Yeah. >> Oh yeah. >> When it happens, it's beautiful. >> That's art. >> Right, right. >> Well, your passion and your excitement for what you guys have just announced is palpable. So, obviously just coming off that announcement, what's next? We look out the rest of the calendar year, what's next for Informatica and transforming digital businesses? 
>> I think it is, you could say the first 20 years, almost, of Informatica's existence was building that meta-data center of gravity, and allowing people to put stuff in, I guess you could say. So going forward, the future is getting value out. It's continually finding new ways to use, in the same way, for instance, Apple is trying to improve Siri, right? And each release they come out with more capabilities. Obviously Google and Amazon seem to be working a little better, but nevertheless, it's all about continuous improvement. Now, I think, the things that Informatica is doing is moving that power of using that meta-data also towards helping our customers more directly with the business aspect of data in a digital transformation. >> Excellent. Well, David, thank you so much for joining us on the Cube. We wish you continued success, I'm sure the Cube will be back with Informatica in the next round. >> Excellent. >> Thanks for sharing your passion and your excitement for what you guys are doing. Like I said, it was very palpable, and it's always exciting to have that on the show. So, thank you for watching. I'm Lisa Martin, for my co-host Peter Burris, we thank you for watching the Cube again. And we are live on day one of the DataWorks Summit from San Jose. Stick around, we'll be right back.
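The capability-ranking step David describes earlier in the conversation (score each business capability on its strategic relevance and on its opportunity to improve, multiply the two, and rank the results) can be sketched as a small exercise. The capability names and 1-to-5 scores below are invented for illustration; they are not Informatica's actual model or data.

```python
# Toy sketch of the capability-ranking approach: score each business
# capability by strategic relevance and opportunity to improve,
# multiply the two, and rank highest first. Names and scores are
# hypothetical placeholders, not real assessment data.

def rank_capabilities(capabilities):
    """Return capabilities sorted by relevance * opportunity, highest first."""
    return sorted(capabilities,
                  key=lambda c: c["relevance"] * c["opportunity"],
                  reverse=True)

capabilities = [
    {"name": "billing",          "relevance": 5, "opportunity": 4},
    {"name": "customer support", "relevance": 4, "opportunity": 2},
    {"name": "marketing",        "relevance": 3, "opportunity": 5},
]

for cap in rank_capabilities(capabilities):
    print(cap["name"], cap["relevance"] * cap["opportunity"])
```

The ranked list then feeds the roadmap: the top-scoring capabilities are the ones where the transformation effort concentrates first.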
David Hseih, Qubole - DataWorks Summit 2017
>> Announcer: Live from San Jose in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. We are live on day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with my co-host Peter Burris. Just chatting with our next guest about the Warriors win yesterday; we're also pretty excited about that. David Hseih, the SVP of Marketing from Qubole, hi David. >> David: Hey, thanks for having me. >> Welcome to theCUBE, we're glad you still have a voice after no doubt cheering on the home team last night. >> It was a close call 'cause I was yelling pretty loud yesterday. >> So talk to us about you, the SVP of Marketing for Qubole. Big data platform in the cloud. You guys just had a big announcement a few weeks ago. >> David: Right. >> What are your thoughts, what's going on with Qubole? What's going on with big data? What are you seeing in the market? >> So you know we're a cloud-native data platform and you know when we talk to customers, we're really, you know, they're really complaining about how they're just struggling with complexity and the barriers to entry and you know, they're really crying out for help. And the good news I suppose is we're in an industry that has a very high pace of innovation. That's great right. Spark has had eight versions now in two years, but that pace of innovation is, you know, making the complexity even harder. I was watching Cloudera bragging about how their new product is a combination of 24 open source projects. You know that's tough stuff, right. So if you're a practitioner trying to get big data operationalized in your company. And trying to scale the use of data and analytics across the company. The nature of open source is it's designed for flexibility. Right, the source code's public, you have all these options, configuration settings et cetera.
But moving those into production and then scaling them in a reliable way is just crushing practitioners. And so data teams are suffering, and I think frankly it's bad for our industry, because, you know, Gartner's talking about a, you know, 80% failure rate of big data projects by 2018. Think about that, what industry can survive when 70 or 80% of the projects fail? >> Well I think-- let me push on that a little bit. Because I think that the concern is about, is not about 70 to 80% of the efforts to reach an answer in a complex big data thing, it's going to fail. We can probably accommodate that, but what we can't accommodate is failure in the underlying infrastructure. >> David: Absolutely. >> So the research we've done suggests something as well, that we are seeing an enormous amount of time spent on the underlying infrastructure. And there's a lot of failures there. People would say, I have a question, I want to know if there's an answer and try to get to that answer, and not getting the answer they want, >> David: Yep. or getting a different answer. That kind of failure is still okay. >> David: Right. >> Because that's experience, you get more and more and more. >> David: Absolutely. >> So it's not the failure in the data science side or the application side. >> Actually I would say getting to an answer you don't like, is a form of success. Like you have an idea, you try it out, that's all great. >> So what Gartner is really saying it's failure in the implementation of the infrastructure. >> That's exactly right. >> So it's the administrative and operational sides. >> Correct, it's a project that didn't deliver the end result. If the end result is what you hoped, great.
We've been carrying a thesis at Wikibon for a while that it looks like open source is proving that it's very good at mimicking, and not quite as good at inventing. >> David: Right. >> So by that I mean if you put an operating, drop an operating system in front of Linus Torvalds he can look at that and say I can do that. >> David: Right. >> And do a great job of it. If you put a development tool same kind of thing. But big data is very complex, a lot of it, an enormous number of use cases. >> David: Correct. >> And open source has done a good job at a tool level and it looks as though the tools are being built to make other tools more valuable, >> David: Ha, right. >> As opposed to making it easy for a business to operationalize data science and the use of big data in their business. Would you agree or disagree with that? >> I yeah, I think that's sort of like fundamentally the philosophy of open source. You know I'm going to do my work, something I need for me, but I'm going to share it with everybody else. And they can contribute. But at the end of the day, you know, unlike commercial software, there's sort of no one throat to choke. Right and there's nobody who is going to guarantee the interoperability and the success of the piece of software that you're trying to deploy. >> There's not even a real coherent vision in many respects. >> David: No, absolutely not. >> What the final product's going to end up looking like. >> So what you have is a lot of really great cutting edge technology that a lot of really smart people, sort of poured their hearts and souls into. But that's a little different than trying to get to an end result. And, you know. Like it or not, commercial software packages are designed to deliver the result you pay for. Open source being sort of philosophically, very different I think breeds you know inherent complexity. And that complexity right now, is I think the root of the problem in our industry.
>> So give us an example David, you know, you're a marketing guy, I'm a marketing gal. >> Sure. >> Give us an example of a customer, maybe one of your favorite examples, where are you helping them? They're struggling here, they've made significant investments from an infrastructure perspective. They know there's value in the data, >> David: Yup. varying degrees as we've talked about before. How does Qubole get in there and start helping this use case customer start to optimize, and really start making this big data project successful? >> That's a great question. So there's really two things, number one is that we are a SaaS-based platform in the cloud and what we do basically is make big data into more of a turnkey service. So actually the other day, I was sort of surfing the internet, and we have a customer from Sonic Drive-In. You know they do hamburgers and stuff. >> Lisa: Oh yeah. >> And they're doing a bunch of big data, and this guy was at a data science meetup, talking about it. We didn't put him up to this, he just volunteered. He was talking about how we made his life so much easier. Why, because all of the configurations stuff, the settings, and you know, how to manage costs, was basically filling out a form and setting policy and parameters. And not having to write scripts and figure out all these configuration settings. If I set this one this way and that one that way, what happens. You know, we have a sort of more curated environment that makes that easy. But the thing that I'm really excited about is we think this is the time to really look at having data platforms that can you know, build or run autonomously. Today companies have to hire really expensive, really highly skilled, super smart data engineers, and data ops people to run their infrastructure. And you know, if you look at studies, we're about 180,000 people short of the number of data engineers, data ops this industry needs. So trying to scale by adding more smart people is super hard.
Right but instead if you could start to get machines to do what people are doing. Just faster, cheaper, more reliably. Then you can scale your data platform. So we basically made an announcement a couple weeks ago, kind of about the industry's first autonomous data platform. And what we're building are software agents that can take over certain types of data management tasks so that data engineers don't have to do it. Or don't have to be up at three in the morning making sure everything is going right. >> And from a market segmentation perspective where's your sweet spot for that? Enterprise, SMB, somewhere in the middle? >> The bigger you have to scale. It's not about company size it's really about sort of the scope and scale of your big data efforts. So you know, the more people you have using it, then the more data you have. The more you want automation to make things easier. It's sort of true of any industry, it's certainly going to be true of the big data industry. >> Peter: Yeah more complexity in the question set, >> Correct. >> The more complexity-- >> Or the more users you have, the more it gives. Adds more data sources. >> Which presumably is going to be correlated. >> Absolutely correct. >> Which is we can use a big data project to ascertain that. >> Well in fact that's sort of what we're doing. Because we're a SaaS platform we take in the metadata from what our customers are doing. What users, what clusters, what queries, which tables, all that stuff. We basically use machine learning and artificial intelligence to analyze how you're using your data platform. And tell you what you could do better or automate stuff that you don't have to do anymore. >> So we've presumed that the industry at some point of time, the big data industry at some point of time, is going to start moving its attention to things like machine learning and A.I., you know, up into applications. >> David: Yep.
>> Are we going to see the big data industry basically move pretty rapidly into more of an in-service or application conversation, or is it going to kind of are we going to see a rebirth, as folks try to bring a more coherent approach to the existing, many of the tools that are here right now. >> David: Right. >> What do you think? >> Well I think, we're going to see some degree of industry consolidation, and you're going to see vendors, you know, and you're seeing it today. Try to simplify and consolidate. Right so some of that is moving up the stack towards applications some of that is about repackaging their offerings and adding simplicity. It's about using artificial intelligence to make the operational platform itself easier. I think you'll see a variety of those things, because you know, companies have too many places where they can stumble in their deployment. And you know, it's going to be, you know, the vendor community has to step in and simplify those things to basically gain greater adoption. >> So if you think about it, what is, I mean I have my own idea, but what do you think the metric that businesses should be using as they conceive of how to source different tools and invest in different tools, put things together. I think it's increasingly we're going to talk about time to value. What do you think? >> I think time to value is one. I think another one you could look at is the number of people who have access to the data to create insights. Right so you know, you can say 100% of my company has access to the data and analytics that they need to help their function run better. Whatever it is, that's a pretty awesome accomplishment. And you know, there's a bunch of people who may or may not have 100% but they're pretty close, right. And they've really become a data driven enterprise. And then you have lots of companies that are sort of stuck with, okay we have this use case running, thank goodness.
Took us two years and a couple million bucks and now they're trying to figure out how to get to the next step. And so they have five users who are able to use their data platform successfully. That's you know, I think that's a big measure of success. >> So I want to talk quickly about, if I may about the cloud. >> David: Yeah. >> Because it's pretty clear there are a number of, that there are some very, very large shops. >> David: Yep. >> That are starting to conceive of important parts of their overall approach to data. >> David: Right. >> And putting things into the cloud. There's a lot of advantages of doing it that way. At the same time they're also thinking about, and how I'm going to integrate, the models that I generate out of big data back into applications that might be running in a lot of different places. >> Right. >> That suggests there's going to be a new challenge on the horizon. Of how do we think about end to end bringing applications together with predictable data movement and control and other types of activities. >> David: Yeah. >> Do you agree that's on the horizon of how we think about end to end performance across multiple different clouds? >> I think that's coming, you know, I think I'm still surprised at how many people have not figured out that the economic and agility advantages of cloud, are so great, that you'd be honestly foolish not to, you know, consider cloud and have that proactive way to migrate there. And so there is just you know a shocking amount of companies that are still plodding away, you know, and building their own on-prem infrastructures et cetera. And they still have hesitancy and questions about the cloud. I do think that you're right, but I think what you're talking about is, you know, three to five years out for the mainstream in the industry. Certainly there are early adopters you know, who have sort of gotten there. They're talking about that now.
But as sort of a mainstream phenomenon I think that's a couple years out. >> Excuse me Peter, one of the things that just kind of made me think of was, you know, these companies as what you're saying, that still had hesitancy regarding cloud. >> Right. >> And kind of vendor lock in popped into my head. And that kind of brought me back to one of the things that you were mentioning in the beginning. Open source, complexity there. >> David: Yep. >> Are you seeing, or are you helping companies to go back to more of that commercialized proprietary software. Are you seeing a shift in enterprises being less concerned about lock-in because they want simplicity? >> You know that's a great question. I think in the big data space it's hard to avoid, you know, sort of going down the open source path. I think what people are getting concerned about is getting locked into a single cloud vendor. So more and more of the conversations we have are about, what are your multi-cloud and eventually cross-cloud capabilities? >> Peter: That's the question I just asked, right. >> Exactly so I think more and more of that's coming to the front. I was with a large, very large healthcare company a week ago, and I said, what's your cloud strategy? And they said we have a no vendor left behind policy. So you know our, we're standardized on Azure, we've got a bunch of pilots on AWS, and we're planning to move from a data warehousing vendor to Oracle in the cloud. Ha so, I think for large companies a lot of them can't control the fact that different divisions, departments, whatever will use different clouds. So architecturally, they're going to have to start to think about using these multi-cloud, cross-cloud you know, scenarios. And you know, most large companies, given a choice, will not bet the farm on a single cloud provider. And you know, we're great partners and we love Amazon, but every time they have you know, an S3 outage like they had a few months ago.
You know, it really makes people think carefully about what their infrastructure is and how they're dealing with reliability. >> Well in fairness they don't have that many, >> They don't, it only takes one. >> That's right, that's right, and there's reasons to suspect that there will be increased specialization of services in the cloud. >> David: Correct. >> So I mean it's going to get more complex as we go as well. >> David: Oh absolutely correct. >> Not less. >> Well David Hseih, SVP of Marketing at Qubole. Thank you so much for joining, >> Thank you. >> And sharing your insights with Peter and myself. It's been very insightful. >> Right. >> So this is another great example of how we've been talking about the Warriors and food, Sonic was brought up into play here. >> David: Exactly, go Sonic. Very exciting, you never know what's going to happen on theCUBE. So for David and Peter, I am Lisa Martin. You're watching Day One of the DataWorks Summit, in the heart of Silicon Valley. But stick around because we've got more great content coming your way.
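David's point earlier about replacing hand-written configuration scripts with "filling out a form and setting policy and parameters" can be illustrated with a toy sketch: operational behavior is derived from a few declared policy fields layered over safe defaults, rather than scripted by hand. Everything below — the policy fields, the defaults, the derived settings — is a hypothetical illustration, not Qubole's actual product interface.

```python
# Toy illustration of policy-driven configuration: the user fills in a
# small "form" (a dict of policy fields), and the platform derives the
# operational settings from it. All field names here are made up.

DEFAULTS = {"min_nodes": 2, "max_nodes": 10, "idle_timeout_min": 30}

def cluster_settings(policy):
    """Merge a user's policy 'form' over safe defaults and derive settings."""
    merged = {**DEFAULTS, **policy}
    if merged["max_nodes"] < merged["min_nodes"]:
        raise ValueError("max_nodes must be >= min_nodes")
    # Terminate idle capacity sooner when the user caps spend tightly.
    if merged.get("max_hourly_cost", float("inf")) < 50:
        merged["idle_timeout_min"] = min(merged["idle_timeout_min"], 10)
    return merged

settings = cluster_settings({"max_nodes": 6, "max_hourly_cost": 20})
print(settings["idle_timeout_min"])  # tight budget -> idle nodes terminate sooner
```

The point of the sketch is the shape of the interaction: a declarative policy plus curated defaults, instead of per-cluster scripts that each engineer writes and debugs by hand.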
Bill Schmarzo, Dell EMC | DataWorks Summit 2017
>> Voiceover: Live from San Jose in the heart of Silicon Valley, it's The Cube covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to The Cube. We are live on day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with my co-host Peter Burris. Not only is this day one of the DataWorks Summit, this is the day after the Golden State Warriors won the NBA Championship. Please welcome our next guest, the CTO of Dell EMC, Bill Schmarzo. And Cube alumni, clearly sporting the pride. >> Did they win? I don't even remember. I just was-- >> Are we breaking news? (laughter) Bill, it's great to have you back on The Cube. >> The Division III All-American from-- >> Coe College. >> 1947? >> Oh, yeah, yeah, about then. They still had the peach baskets. You make a basket, you have to climb up this ladder and pull it out. >> They're going rogue on me. >> It really slowed the game down a lot. (laughter) >> All right so-- And before we started they were analyzing the game, it was actually really interesting. But, kick things off, Bill, as the volume and the variety and the velocity of data are changing, organizations know there's a tremendous amount of transformational value in this data. How is Dell EMC helping enterprises extract and maximize that as the economic value of data's changing? >> So, the thing that we find is most relevant is most of our customers don't give a hoot about the three V's of big data. Especially on the business side. We like to jokingly say they care about the four M's of big data, make me more money. So, when you think about digital transformation and how it might take an organization from where they are today to sort of embed digital capabilities around data and analytics, it's really about, "How do I make more money?" What processes can I eliminate or reduce? How do I improve my ability to market and reach customers?
How do I, ya know-- All the things that are designed to drive value from a value perspective. Let's go back to, ya know, Tom Peters kind of thinking, right? I guess Michael Porter, right? His value creation processes. So, we find that when we have a conversation around the business and what the business is trying to accomplish that provides the framework around which to have this digital transformation conversation. >> So, well, Bill, it's interesting. The volume, velocity, variety; three V's, really say something about the value of the infrastructure. So, you have to have infrastructure in place where you can get more volume, it can move faster, and you can handle more variety. But, fundamentally, it is still a statement about the underlying value of the infrastructure and the tooling associated with the data. >> True, but one of the things that changes is not all data is of equal value. >> Peter: Absolutely. >> Right? So, what data, what technologies-- Do I need to have Spark? Well, I don't know, what are you trying to do, right? Do I need to have Kafka or Ioda, right? Do I need to have these things? Well, if I don't know what I'm trying to do, then I don't have a way to value the data and I don't have a way to figure out and prioritize my investment and infrastructure. >> But, that's what I want to come to. So, increasingly, what business executives, at least the ones who we're talking to all the time, are saying is make me more money. >> Right. >> But, it really is, what is the value of my data? And, how do I start pricing data and how do I start thinking about investing so that today's data can be valuable tomorrow? Or the data that's not going to be valuable tomorrow, I can find some other way to not spend money on it, etc. >> Right. >> That's different from the variety, velocity, volume statement which is all about the infrastructure-- >> Amen. >> --and what an IT guy might be worried about.
So, I've done a lot of work on data value, you've done a lot of work in data value. We've coincided a couple times. Let's pick that notion up of, ya know, digital transformation is all about what you do with your data. So, what are you seeing in your clients as they start thinking this through? >> Well, I think one of the first times it was sort of an "aha" moment to me was when I had a conversation with you about Adam Smith. The difference between value in exchange versus value in use. A lot of people when they think about monetization, how do I monetize my data, are thinking about value in exchange. What is my data worth to somebody else? Well, most people's data isn't worth anything to anybody else. And the way that you can really drive value is not data in exchange or value in exchange, but it's value in use. How am I using that data to make better decisions regarding customer acquisition and customer retention and predictive maintenance and quality of care and all the other oodles of decisions organizations are making? The valuation of that data comes from putting it into use to make better decisions. If I know then what decision I'm trying to make, now I have a process not only in deciding what data's most valuable but, you said earlier, what data is not important but may have liability issues with it, right? Do I keep a data set around that might be valuable but if it falls into the wrong hands through cyber security sort of things, do I actually open myself up to all kinds of liabilities? And so, organizations are rushing from this EVD conversation, not only from a data valuation perspective but also from a risk perspective. 'Cause you've got to balance those two aspects. >> But, this is not a pure-- This is not really doing an accounting in a traditional accounting sense. We're not doing double-entry bookkeeping with data. What we're really talking about is understanding how your business uses its data.
Number one today, understand how you think you want your business to be able to use data to become a more digital corporation and understand how you go from point "a" to point "b". >> Correct, yes. And, in fact, the underlying premise behind driving economic value of data, you know people say data is the new oil. Well, that's a BS statement because it really misses the point. The point is, imagine if you had a barrel of oil; a single barrel of oil that can be used across an infinite number of vehicles and it never depleted. That's what data is, right? >> Explain that. You're right but explain it. >> So, what it means is that data-- You can use data across an endless number of use cases. If you go out and get-- >> Peter: At the same time. >> At the same time. You pay for it once, you put it in the data lake once, and then I can use it for customer acquisition and retention and upsell and cross-sell and fraud and all these other use cases, right? So, it never wears out. It never depletes. So, I can use it. And what organizations struggle with, if you look at data from an accounting perspective, accounting tends to value assets based on what you paid for it. >> Peter: And how you can apply them uniquely to a particular activity. A machine can be applied to this activity and it's either that activity or that activity. A building can be applied to that activity or that activity. A person's time to that activity or that activity. >> It has a transactional limitation. >> Peter: Exactly, it's an oar. >> Yeah, so what happens now is instead of looking at it from an accounting perspective, let's look at it from an economics and a data science perspective. That is, what can I do with the data? What can I do as far as using the data to predict what's likely to happen? To prescribe actions and to uncover new monetization opportunities. So, the entire approach of looking at it from an accounting perspective, we just completed that research at the University of San Francisco. 
Where we looked at, how do you determine the economic value of data? And we realized that using an accounting approach grossly undervalued the data's worth. So, instead of using an accounting approach, we started with an economics perspective. The multiplier effect, marginal propensity to consume, all that kind of stuff that we all forgot about once we got out of college really applies here, because now I can use that same data over and over again. And if I apply data science to it to really try to predict, prescribe, and monetize, all of a sudden the economic value of your data just explodes. >> Precisely because you're connecting a source of data, which has a particular utilization, to another source of data that has a particular utilization, and you can combine them, create new utilizations that might in and of themselves be even more valuable than either of the original cases. >> They genetically mutate. >> That's exactly right. So, think about-- I think it's right. So, congratulations, we agree. Thank you very much. >> Which is rare. >> So, now let's talk about this notion of, as we move forward with data value, how does an organization have to start translating some of these new ways of thinking about the value of data into investments in data, so that you have the data where you want it, when you want it, and in the form that you need it? >> That's the heart of why you do this, right? If I know what the value of my data is, then I can make decisions regarding what data am I going to try to protect, enhance? What data am I going to get rid of and put on cold storage, for example? And so we came up with a methodology for how we tie the value of data back to use cases. Everything we do is use case based, so if you're trying to increase same-store sales at a Chipotle, one of my favorite places; if you're trying to increase it by 7.1 percent, that's worth about 191 million dollars.
And the use cases that support that, like increasing local event marketing, or increasing new product introduction effectiveness, increasing customer cross-sell or upsell. If you start breaking those use cases down, you can start tying financial value to those use cases. And if I know what data sets, what three, five, seven data sets are required to help solve that problem, I now have a basis against which I can start attaching value to data. And as I look across at a number of use cases, now the value of that data starts to increment. It grows exponentially; not exponentially, but it does increment, right? And it gets more and more-- >> It's non-linear, it's superlinear. >> Yeah, and what's also interesting-- >> Increasing returns. >> From an ROI perspective, what you're going to find is that as you go down these use cases, the financial value of that use case may not be really high. But when the denominator of your ROI calculation starts approaching zero, because I'm reusing data at zero cost; I can reuse data at zero cost. When the denominator starts going to zero, ya know what happens to your ROI? It goes to infinity; it explodes. >> Last question, Bill. You mentioned the University of San Francisco, and you've been there a while teaching business students how to embrace analytics. One of the things that was talked about this morning in the keynote was Hortonworks' dedication to the open-source community from the beginning. And they kind of talked about there, with kids in college these days, they have access to this open-source software that's free. I'd just love to get, kind of the last word, your take on what are you seeing in university life today where these business students are understanding more about analytics? Do you see them as kind of helping to build the next generation of data scientists, since that's really kind of the next leg of the digital transformation? >> So, the premise we have in our class is we probably can't turn business people into data scientists.
In fact, we don't think that's valuable. What we want to do is teach them how to think like a data scientist. What happens is, if we can get the business stakeholders to understand what's possible with data and analytics, and then you couple them with a data scientist that knows how to do it, we see exponential impact. We just did a client project around customer attrition. The industry benchmark in customer attrition was published, I won't name the company, but they had a 24 percent identification rate. We had 59 percent. We two-X'd the number. Not because our data scientists are smarter or our tools are smarter, but because our approach was to leverage and teach the business people how to think like a data scientist, and they were able to identify variables and metrics they wanted to test. And when our data scientists tested them, they said, "Oh my gosh, that's a very highly predictive variable." >> And trust what they said. >> And trust what they said, right. So, how do you build trust? On the data science side, you fail. You test, you fail, you test, you fail; you're never going to get 100 percent accuracy. But have you failed enough times that you feel comfortable and confident that the model is good enough? >> Well, what a great spirit of innovation that you're helping to bring there. Your keynote, we should mention, is tomorrow. >> That's right. >> So, if you're watching the livestream or you're in person, you can see Bill's keynote. Bill Schmarzo, CTO of Dell EMC, thank you for joining Peter and me. Great to have you on the show. A show where you can talk about the Warriors and Chipotle in one show. I've never seen it done, this is groundbreaking. Fantastic. >> Psycho Donuts too. >> And Psycho Donuts, and now I'm hungry. (laughter) Thank you for watching this segment. Again, we are live on day one of the DataWorks Summit in San Jose with Bill Schmarzo and Peter Burris, my co-host. I am Lisa Martin. Stick around, we will be right back. (music)
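The economics Bill walks through in this segment, paying for data once and reusing it across use cases so that ROI explodes as the marginal cost approaches zero, can be sketched with a toy calculation (the figures below are illustrative assumptions, not numbers from the interview):

```python
# Toy sketch of the economic value of data: acquisition is paid once,
# then each additional use case reuses the data at a small marginal cost.
def roi(value, cost):
    """Simple ROI: net return divided by cost."""
    return (value - cost) / cost

# First use case bears the full acquisition cost of the data set.
first_use = roi(value=250_000, cost=100_000)

# Later use cases reuse the same data; only marginal costs remain,
# and as those approach zero the ROI grows without bound.
later_uses = [roi(value=250_000, cost=c) for c in (10_000, 1_000, 100)]

print(first_use)    # 1.5
print(later_uses)   # [24.0, 249.0, 2499.0]
```

The barrel-of-oil point holds in the arithmetic: the data set is not consumed by the first use case, so every subsequent use improves the aggregate return.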
Joe Goldberg, BMC Software - DataWorks Summit 2017
>> Announcer: Live from San Jose in the heart of Silicon Valley, it's The Cube covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hi. Welcome back to The Cube. We are live at day one of the DataWorks Summit in San Jose, in the heart of Silicon Valley, hosted by Hortonworks. We've had a great day so far. Lots of innovation. Lots of great announcements. We're very excited to be joined by one of this week's keynote speakers and a Cube alumnus, Joe Goldberg, Innovation Evangelist at BMC Software. Welcome back to The Cube. >> Thank you very much. Always a pleasure to be here. >> Exactly, and we're happy to have you back. So, talk to us, what's happening with BMC? What are you guys doing there? What are people going to learn in your keynote on Thursday? >> So BMC has been really working with all of our customers to modernize, not only our tool chain, but the way automation is used and deployed throughout the organization. We actually did a survey recently, The State of Automation. We got pretty much the kind of results we would've expected, but this let us really sort of make tangible what we have sort of always felt was, you know, the state of this kind of approach to how critical automation is in the enterprise. We had a response from leaders and CXOs that 93% thought that automation was key to helping them make that digital transformation that everyone is involved in today. So, that's been one of the key elements that has really kind of driven everything that we've been doing at BMC today. >> Now, BMC's known especially for handling workflows that operate as more than just simple batch work. >> Joe Goldberg: Yes. So, high certainty, very much predictability in terms of when things are going to happen, how long it's going to take, what actions are going to take place. Very, very complex types of processing take place.
>> I'm always fascinated, and I've talked to other customers that are wondering about this, when you come back to the State of Automation: we want to move, everybody wants to move, to interactive. >> Joe Goldberg: Yeah. >> But often the jump to interactive takes place well in advance of predictability of how the data's actually being constructed and put together and aggregated in the back end. Talk a little bit about the priorities. How does one...? 'Cause it's really not a chicken-and-egg kind of a problem. How does one anticipate excellence in the other? >> So what we've been hearing, and actually I think at the previous Hortonworks, or DataWorks, Summit, we had one of our customers talk about their approach to what was a fundamental data architecture for them, which was the separation between the speed and batch layer. And I think you hear an awful lot of that kind of conversation. And they run in parallel, and from our perspective, managing the batch layer really underpins the kind of real actionable insights that you can extract from the speed layer, which is focusing on capturing that very small percentage of what is really the signal in the data, but then being able to take that and enrich it with what you've been collecting and managing using the batch layer. I think that that's the kind of approach that we've seen from a lot of customers, where certainly all of the cool stuff and the focus is on the interactive and the realtime and streaming. But in order to really be able to be predictive, because you know there's no magic, we still don't know how to tell the future, the only way to be able to do that is by making sure that you are basing yourself on history that is well, sort of, collected, curated; make sure that you have actually captured it, that you've enriched it from a variety of different sources. And that's where we come in. What we have been focusing on is providing a set of facilities for managing batch that is...
I talk about hyper heterogeneity, I know that's a mouthful, but that's really what the new enterprise environment is like. So you add, you know, a layer on top of your conventional applications and your conventional data, all of these new data formats and data sources now arriving in real time in high volume. I think that taking that kind of an approach is really the only way that you can ensure that you are capturing all of your... Ingesting all of the data that's coming in from all of your endpoints, including, you know, IoT applications, and really being able to combine it with all of the corporate sort of knowledge that you've accumulated through your traditional sources. >> So, batch has historically meant, again, a lot of precise code that had to be written to handle complex jobs, and it scared off a lot of folks into thinking about interactive. In the last 10 years, there have been some pretty significant advances in how we think about putting together batch workflows; they've become much more programmable. How does Control-M and some of the other tool sets that BMC provides, how does it fit in? How does it look more like the types of application development tasks and methods that are becoming increasingly popular, as you think about delivering the outcomes of big data processing to other applications or to other segments? >> So, you know, that's a great question. It's almost like, thanks for the setup. So, you can see. >> Well, let's not ask it then. (laughs) >> You can see the shirt that I'm wearing, and of course this is very intentional, but our history has been that we've come from the data center, operations focus. And the transition in the marketplace today has been that really the focus has shifted, whether you talk about shift left or everything as code, where the new methods of building and delivering applications really look at everything manual that is done, coding to create an application, that's done upfront.
And then the rigor for enterprise operations is built in through this automated delivery pipeline. And so, obviously, you have to invert this kind of approach that we've had in terms of layering management tools on at the very end, and instead you have to be able to inject them into your application early. So, we feel that certainly it's true for all applications, and it's, I think, doubly true in data applications, that the automation and the operational instrumentation is an equal partner to your business logic and the code that you write, and so it needs to be created right upfront and then moved together with all of the rest of your application components through that delivery pipeline in a CI/CD fashion. And so that is what we have done. And again, that's what the concept of jobs-as-code is. >> So, as you think about what the next step is, is batch going to, presumably batch will be sustained as a mode of operation. How is it going to become even more comfortable to a lot of the development methodologies as we move forward? How do you think it's going to be evolved as a tool for increasing the amount of predictability in that back end? >> So, I think that the key to continuing to evolve this jobs-as-code approach is to enable developers to be able to build and work with that operational plumbing in the same way they work with their business logic. >> Or any other resource? >> Exactly. So, you know, you think about what are the tools that developers have today when they build, whether you're writing in Java or C or R or Scala: there are development environments, there are these tools that let you test, that let you step through your logic to be able to identify and find any flaws, you know, sort of bugs in your code. And in order for jobs-as-code to really meet the test of being code, we are working on providing the same kind of capabilities to work with our objects that developers expect to have for programming languages.
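The concept Joe describes here, treating batch job definitions and their operational plumbing as code that moves through the same delivery pipeline as business logic, can be sketched roughly like this. The job format and validation rule below are a made-up illustration of the idea, not BMC's actual syntax:

```python
# Hypothetical jobs-as-code sketch: the batch workflow lives in source
# control and gets CI-style validation before it is ever deployed.
import json

jobs = {
    "ingest_events": {"command": "ingest.sh", "depends_on": []},
    "enrich":        {"command": "enrich.sh", "depends_on": ["ingest_events"]},
    "publish":       {"command": "publish.sh", "depends_on": ["enrich"]},
}

def validate(job_map):
    """CI check: every dependency must refer to a defined job."""
    errors = []
    for name, spec in job_map.items():
        for dep in spec["depends_on"]:
            if dep not in job_map:
                errors.append(f"{name}: unknown dependency {dep!r}")
    return errors

# A pipeline step would fail the build if this returns any errors.
print(validate(jobs))  # []
print(json.dumps(jobs["publish"]))  # the artifact is plain, diffable text
```

Because the definition is plain data checked into source control, it can be reviewed, diffed, and promoted through environments exactly like the application code it schedules.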
>> So Joe, I'm going to shift us now, last question here. Kind of looking at more of a business and industry level, to do big data right, to bring Hadoop to an enterprise successfully, what are some of the mission-critical elements that the C-suite really needs to embrace in order to be successful across big industries, like healthcare, financial services, telco? >> So, I think they have to be able to apply the same requirements and the test for how a big data application moves into their enterprise in terms of, not only how it's operated, but how it's made accessible to all of the constituents that need to use it. One of the key elements we hear frequently is that, and I think there's a danger that when technicians solely create what is the end deliverable tool, it frequently is very technical, and it has to be consumable by the people that actually need to use it. And so you have to strike this balance between providing sufficient technical sophistication and business usability, and I think that that's kind of a goal for being successful in implementing any kind of technology, and certainly big data. >> Excellent. Well, Joe Goldberg, thank you so much for coming back to the Cube and joining my cohost, Peter Burris, and me for this great chat. And people can watch your keynote on Thursday. >> Yes. >> This week, on the 15th of June. So again, for my cohost Peter Burris, I am Lisa Martin. Thanks so much for watching the Cube live, again at day one of the DataWorks Summit. Stick around. We'll be right back. (upbeat music)
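Earlier in the segment, Joe described customers separating the speed layer from the batch layer, with curated history underpinning the actionable insight extracted from the realtime signal. That serving pattern can be sketched as a small merge function (the sensor names, numbers, and anomaly threshold are illustrative assumptions, not customer code):

```python
# Illustrative batch/speed split: the batch layer holds curated history,
# the speed layer holds the small realtime signal, and serving merges them.
batch_view = {  # precomputed over the full historical data set
    "sensor-1": {"avg_temp": 70.2},
    "sensor-2": {"avg_temp": 65.0},
}
speed_view = {  # incrementally updated from streaming events
    "sensor-1": {"avg_temp": 91.5},
}

def serve(sensor_id):
    """Enrich the realtime reading with historical context."""
    history = batch_view.get(sensor_id, {}).get("avg_temp")
    recent = speed_view.get(sensor_id, {}).get("avg_temp")
    anomaly = (
        history is not None and recent is not None and recent > history * 1.2
    )
    return {"historical_avg": history, "recent_avg": recent, "anomaly": anomaly}

print(serve("sensor-1"))  # the fresh spike is flagged against history
print(serve("sensor-2"))  # no recent signal, so nothing to flag
```

The design choice matches Joe's point: the speed layer alone carries only the fresh signal; it is the well-managed batch history that makes the signal interpretable.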
Linton Ward, IBM & Asad Mahmood, IBM - DataWorks Summit 2017
>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE! Covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE. I'm Lisa Martin with my co-host George Gilbert. We are live on day one of the DataWorks Summit in San Jose, in the heart of Silicon Valley. Great buzz at the event, I'm sure you can see and hear behind us. We're very excited to be joined by a couple of fellows from IBM, a very longstanding Hortonworks partner that announced a phenomenal suite of four new levels of that partnership today. Please welcome Asad Mahmood, Analytics Cloud Solutions Specialist at IBM, and a medical doctor, and Linton Ward, Distinguished Engineer, Power Systems OpenPOWER Solutions from IBM. Welcome guys, great to have you both on theCUBE for the first time. So, Linton, software has been changing; companies, enterprises all around are really looking for more open solutions, really moving away from proprietary. Talk to us about the OpenPOWER Foundation before we get into the announcements today, what was the genesis of that? >> Okay sure, we recognized the need for innovation beyond a single chip, to build out an ecosystem, an innovation collaboration with our system partners. So, ranging from Google, to Mellanox for networking, to Hortonworks for software, we believe that system-level optimization and innovation is what's going to bring the price-performance advantage in the future. That traditional seamless scaling doesn't really bring us there by itself, but that partnership does. >> So, from today's announcements, a number of announcements that Hortonworks is adopting IBM's data science platforms, so really the theme of the keynote this morning was data science, right; it's the next leg in really transforming an enterprise to be very much data-driven and digitalized. We also saw the announcement about Atlas for data governance. What does that mean from your perspective on the engineering side?
Very exciting, you know. In terms of building out solutions of hardware and software, the ability to really harden the Hortonworks Data Platform with servers, and storage, and networking, I think, is going to bring simplification to on-premises, like people are seeing with the cloud. I think the ability to create the analyst workbench, or the cognitive workbench, using the Data Science Experience to create a pipeline of data flow and analytic flow, is going to be very strong for innovation. Beyond that, most notable for me is the fact that they're all built on open technologies, leveraging communities that universities can pick up and contribute to; I think we're going to see the pace of innovation really pick up. >> And on that front, on pace of innovation, you talked about universities. One of the things I thought was really a great highlight in the customer panel this morning that Raj Verma hosted was you had health care, insurance companies, financial services; there was Duke Energy there. And they all talked about one of the great benefits of open source: kids in universities have access to the software for free. So from a talent-attraction perspective, they're really kind of fostering that next generation who will be able to take this to the next level, which I think is a really important point as we look at data science being kind of the next big driver or transformer. And also, you know, there are not a lot of really skilled data scientists; how can that change over time? And this is one, the open source community that Hortonworks has been very dedicated to since the beginning; it's really a great outcome of that.
Definitely. I think the ability to take the risk out of a new analytical project is one benefit, and the other benefit is there's a tremendous, not just from young people, a tremendous amount of interest among programmers, developers of all types, to create data science skills, data engineering and data science skills. >> If we leave aside the skills for a moment and focus on the, sort of, operationalization of the models once they're built, how should we think about a trained model? Or, I should break it into two pieces. How should we think about training the models, where the data comes from and who does it? And then, the orchestration and deployment of them: Cloud, Edge gateway, Edge device, that sort of thing. >> I think it all comes down to exactly what your use case is. You have to identify what use case you're trying to tackle, whether that's applicable to clinical medicine, whether that's applicable to finance, to banking, to retail or transportation. First you have to have that use case in mind, then you can go about training that model, developing that model, and for that you need to have a good, potent, robust data set to allow you to carry out that analysis; and whether you want to do exploratory analysis or you want to do predictive analysis, that needs to be very well defined in your training stage. Once you have that model developed, then we have certain services, such as Watson Machine Learning within Data Science Experience, that will allow you to take that model that you just developed, just moments ago, and deploy that as a RESTful API that you can then embed into an application and into your solution, and that solution you can basically use across industries.
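The flow Asad describes, training a model and then exposing it behind a RESTful endpoint that applications can call, can be sketched in miniature. The scoring rule and payload shape below are illustrative stand-ins, not the actual Watson Machine Learning API:

```python
# Minimal sketch: a "trained model" exposed through an HTTP handler.
import json
from http.server import BaseHTTPRequestHandler

def predict(features):
    """Stand-in scoring rule in place of a real trained model."""
    score = 0.8 if features.get("symptom_count", 0) >= 3 else 0.2
    return {"risk": "high" if score > 0.5 else "low", "score": score}

class PredictHandler(BaseHTTPRequestHandler):
    """POST a JSON feature payload; receive a JSON prediction back."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        features = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(predict(features)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

print(predict({"symptom_count": 4}))  # {'risk': 'high', 'score': 0.8}
```

Once the model sits behind an endpoint like this, any web or mobile application can embed predictions with a single HTTP call, which is the point of deploying a model as a service rather than shipping the model itself.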
Are there some use cases where you have almost like a tiering of models, where, you know, there're some that are right at the edge, like, you know, a big device like a car, and then, you know, there's sort of the fog level, which is the, say, cell towers or other buildings nearby, and then there's something in the Cloud that's sort of like a master model or an ensemble of models? I don't assume that's, like, Evel Knievel would say, you know, "Don't try that at home," but sort of, is the tooling being built to enable that? >> So the tooling is already in existence right now. You can actually go ahead right now and be able to build out prototypes, even full-level, full-range applications, right on the Cloud, and you can do that, you can do that thanks to Data Science Experience, you can do that thanks to IBM Bluemix. You can go ahead and do that type of analysis right there, and not only that, you can allow that analysis to actually guide you along the path from building a model to building a full-range application, and this is all happening on the Cloud level. We can talk more about it happening on the on-premise level, but on the Cloud level specifically, you can have those applications built on the fly, on the Cloud, and have them deployed for web apps, for mobile apps, et cetera. >> One of the things that you talked about is use cases in certain verticals. IBM has been very strong and vertically focused for a very long time, but you kind of almost answered the question that I'd like to maybe explore a little bit more, about building these models, training the models, in say, health care or telco, and being able to deploy them. Where are the horizontal benefits there that IBM would be able to deliver faster to other industries?
Definitely, I think the main thing is that IBM, first of all, gives you that opportunity, that platform to say that, hey, you have a data set, you have a use case; let's give you the tooling, let's give you the methodology to take you from data, to a model, to ultimately that full-range application. And specifically, I've built some applications specific to federal health care, specifically to address clinical medicine and behavioral medicine, and that's allowed me to actually use IBM tools and some open source technologies as well to actually go out and build these applications on the fly as a prototype to show not only the realm, the art of the possible when it comes to these technologies, but also to solve problems, because ultimately, that's what we're trying to accomplish here. We're trying to find real-world solutions to real-world problems. >> Linton, let me redirect something towards you about, a lot of people are talking about how Moore's law is slowing down or even ending, well, at least in terms of speed of processors. But if you look at, not just the CPU but the FPGA or ASIC or the tensor processing unit, which, I assume, is an ASIC, and you have the high-speed interconnects; if we don't look at just, you know, what can you fit on one chip, but you look at, you know, in 3D, what's the density of transistors in a rack or in a data center, is that still growing as fast or faster, and what does it mean for the types of models that we can build? >> That's a great question.
One of the key things that we did with the OpenPOWER Foundation is to open up the interfaces to the chip. So with NVIDIA we have NVLink, which gives us a substantial increase in bandwidth; we have created something called OpenCAPI, which is a coherent protocol, to get to other types of accelerators. So we believe in hybrid computing in that form; you saw NVIDIA on stage this morning, and we believe, especially for deep learning, the acceleration provided by GPUs is going to continue to drive substantial growth. It's a very exciting time. >> Would it be fair to say that we're on the same curve, if we look at it, not from the point of view of, you know, what can we fit on a little square, but if we look at what can we fit in a data center, or the power available to model things? You know, Jeff Dean at Google said, "If Android users talk into their phones for two to three minutes a day, we need two to three times the data centers we have." Can we grow that price performance faster and enable sort of things that we did not expect? >> I think the innovation that you're describing will, in fact, put pressure on data centers. The ability to collect data from autonomous vehicles or other endpoints is really going up. So, we're okay for the near-term, but at some point we will have to start looking at other technologies to continue that growth. Right now we're in the throes of what I call fast data versus slow data, so keeping the slow data cheaply and getting the fast data closer to the compute is a very big deal for us; so NAND flash and other non-volatile technologies for the fast data are where the innovation is happening right now. But you're right, over time we will continue to collect more and more data, and it will put pressure on the overall technologies. >> Last question as we get ready to wrap here, Asad, your background is fascinating to me.
Having a medical degree and working in federal healthcare for IBM, you talked about some of the clinical work that you're doing and the models that you're helping to build. What are some of the mission-critical needs that you're seeing in health care today that are really kind of driving, not just health care organizations to do big data right, but to do data science right? >> Exactly, so I think one of the biggest questions that we get, and one of the biggest needs that we get from the healthcare arena, is patient-centric solutions. There are a lot of solutions that are hoping to address problems that are being faced by physicians on a day-to-day level, but there are not enough applications that are addressing the concerns, the pain points, that patients are facing on a daily basis. So the applications that I've started building out at IBM are all patient-centric applications that basically put their data, their symptoms, their diagnosis in their hands alone, and allow them to actually find out more or less what's going wrong with their body at any particular time during the day, and then find the right healthcare professional or the right doctor that is best suited to treating that condition, treating that diagnosis. So I think that's the big thing that we've seen from the healthcare market right now. The big need that we have, that we're currently addressing with our Cloud analytics technology, which is just becoming more and more advanced and sophisticated, and is trending towards some of the other health trends or technology trends that we have currently right now on the market, including the blockchain, which is tending towards more of a decentralized focus for these applications. So they're actually putting more of the data in the hands of the consumer, in the hands of the patient, and even in the hands of the doctor. >> Wow, fantastic. Well, you guys, thank you so much for joining us on theCUBE.
Congratulations on your first time being on the show, Asad Mahmood and Linton Ward from IBM, we appreciate your time. >> Thank you very much. >> Thank you. >> And for my co-host George Gilbert, I'm Lisa Martin, you're watching theCUBE live on day one of the Data Works Summit from Silicon Valley but stick around, we've got great guests coming up so we'll be right back.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
George Gilbert | PERSON | 0.99+ |
Lisa Martin | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Jeff Dean | PERSON | 0.99+ |
Duke Energy | ORGANIZATION | 0.99+ |
two | QUANTITY | 0.99+ |
Asad Mahmood | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Raj Verma | PERSON | 0.99+ |
NVIDIA | ORGANIZATION | 0.99+ |
Asad | PERSON | 0.99+ |
Mellanox | ORGANIZATION | 0.99+ |
San Jose | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Evel Knievel | PERSON | 0.99+ |
OpenPOWER Foundation | ORGANIZATION | 0.99+ |
two pieces | QUANTITY | 0.99+ |
Linton | PERSON | 0.99+ |
Linton Ward | PERSON | 0.99+ |
three times | QUANTITY | 0.99+ |
Data Works Summit | EVENT | 0.99+ |
one | QUANTITY | 0.98+ |
first time | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
one chip | QUANTITY | 0.98+ |
one benefit | QUANTITY | 0.97+ |
One | QUANTITY | 0.96+ |
Android | TITLE | 0.96+ |
three minutes a day | QUANTITY | 0.95+ |
both | QUANTITY | 0.94+ |
day one | QUANTITY | 0.94+ |
Moore | PERSON | 0.93+ |
this morning | DATE | 0.92+ |
OpenCAPI | TITLE | 0.91+ |
first | QUANTITY | 0.9+ |
single chip | QUANTITY | 0.89+ |
Data Works Summit 2017 | EVENT | 0.88+ |
telco | ORGANIZATION | 0.88+ |
DataWorks Summit 2017 | EVENT | 0.85+ |
NVLink | COMMERCIAL_ITEM | 0.79+ |
NVIDIDA | TITLE | 0.76+ |
IBM Bluemix | ORGANIZATION | 0.75+ |
Watson Machine Learning | TITLE | 0.75+ |
Power Systems OpenPOWER Solutions | ORGANIZATION | 0.74+ |
Edge | TITLE | 0.67+ |
Edge Gateway | TITLE | 0.62+ |
couple | QUANTITY | 0.6+ |
Covering | EVENT | 0.6+ |
Narrator | TITLE | 0.56+ |
Atlas | TITLE | 0.52+ |
Linton | ORGANIZATION | 0.51+ |
Ward | PERSON | 0.47+ |
3D | QUANTITY | 0.36+ |
Scott Gnau, Hortonworks - DataWorks Summit 2017
>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of the keynote speakers today, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture and one of the things that I thought was really interesting is that now where Hortonworks is, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the next, the important thing, kind of as a natural evolution for us as a company and as a community is, and I've seen this time and again in the tech industry, we've kind of moved from really cool breakthrough tech, more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it into value really quickly and in a repeatable fashion. So, the notion that I launched today is really making these three personas really successful. If you can focus on combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right?
They've had SQL and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in the Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, combined with math and heavy compute, with the data scientists and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on the successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models train and more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case. You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable.
So bringing together the data, but the tool set so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and managing how that data is used, alongside our corporate mission, is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data over to the business analyst and to the data scientist, and be confident that the company's mission and the regulations that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger, and it really makes for an enterprise-grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, helping to spread data science across these personas to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, I think the numbers are astronomical, right?
You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90% working on the problem, that's, like, an order of magnitude more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models into the business analyst's desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run and inferred with traditional business process, which means turning that into better decision making, turning that into better value for the business, just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation. Where do the cloud vendors fit in in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just next-gen UI-related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion of open, where open community, open source, and open ecosystem, I think it's now open to the third power, right, and it's talking about open models and algorithms.
And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone's also pointing out, besides the cloud experience, is also the ability to operationalize the models and get them into production either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease with which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management, or, petabytes. That's upkeep.
But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push, from a model perspective, actually out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer in the right time, and is that offer relevant. It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IOT network that you're working in. What does that mean from your original question? That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Hortonworks DataFlow and the combination of Apache NiFi and Apache Kafka and Storm to really combine that, "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery could actually, from our human being perception, arrive before I thought about it. And isn't that really cool, in a way.
I'm thinking about, I need to go do X, Y, Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer, but before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky. >> Right. >> If it's irrelevant, then you deal with all of the other downstream impact. So that, again, points to more and more and more data and more and more and more accurate and sophisticated models to make sure that that relevance exists. >> Exactly. Well, Scott Gnau, CTO of Hortonworks, thank you so much for stopping by The Cube once again. We appreciate your conversation and insights. And for George Gilbert, I am Lisa Martin. You're watching The Cube live, from day one of the DataWorks Summit in the heart of Silicon Valley. Stick around, though, we'll be right back.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Lisa Martin | PERSON | 0.99+ |
George Gilbert | PERSON | 0.99+ |
Scott | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
80% | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
10% | QUANTITY | 0.99+ |
90% | QUANTITY | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
IBMs | ORGANIZATION | 0.99+ |
Python | TITLE | 0.99+ |
two aspects | QUANTITY | 0.99+ |
five seconds | QUANTITY | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
One | QUANTITY | 0.99+ |
DataWorks Summit 2017 | EVENT | 0.98+ |
Horton Works | ORGANIZATION | 0.98+ |
Hadoop | TITLE | 0.98+ |
one | QUANTITY | 0.98+ |
DataWorks Summit | EVENT | 0.98+ |
today | DATE | 0.98+ |
each | QUANTITY | 0.98+ |
five years | QUANTITY | 0.97+ |
third | QUANTITY | 0.96+ |
second thing | QUANTITY | 0.96+ |
Apache Kafka | ORGANIZATION | 0.95+ |
three personas | QUANTITY | 0.95+ |
this morning | DATE | 0.95+ |
Apache NiFi | ORGANIZATION | 0.95+ |
this morning | DATE | 0.94+ |
three categories | QUANTITY | 0.94+ |
CTO | PERSON | 0.93+ |
The Cube | TITLE | 0.9+ |
Sequel | PERSON | 0.89+ |
Apache Ranger | ORGANIZATION | 0.88+ |
two things | QUANTITY | 0.86+ |
hundred times | QUANTITY | 0.85+ |
Portworks | ORGANIZATION | 0.82+ |
earlier today | DATE | 0.8+ |
Data Science Experience | TITLE | 0.79+ |
The Cube | ORGANIZATION | 0.78+ |
Apache Atlas | ORGANIZATION | 0.75+ |
Storm | ORGANIZATION | 0.74+ |
day one | QUANTITY | 0.74+ |
wave | EVENT | 0.69+ |
one of the keynotes | QUANTITY | 0.66+ |
lots | QUANTITY | 0.63+ |
years | QUANTITY | 0.53+ |
Hortonworks | EVENT | 0.5+ |
lots of data | QUANTITY | 0.49+ |
Sequel | ORGANIZATION | 0.46+ |
Flow | ORGANIZATION | 0.39+ |
Christoph Streubert, SAP - DataWorks Summit Europe 2017 - #DWS17 - #theCUBE
>> Announcer: Live from Munich, Germany, it's The CUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Okay, welcome back everyone, we are here live in Munich, Germany for DataWorks 2017, the DataWorks Summit, formerly Hadoop Summit. I'm John Furrier with SiliconANGLE's theCUBE, my co-host Dave Vellante, wrapping up day two of coverage here with Christoph Streubert, who's the Senior Director of SAP Big Data, handles all the go-to-market for SAP Big Data, @sapbigdata is the Twitter handle. You have a great shirt there, Go Live >> Go Live or go home. (Laughs) >> John: You guys are a part. Welcome to theCUBE. >> Christoph: Thank you, I appreciate it. >> Thanks for joining us and on the wrap up. You and I have known each other, we've known each other for a long time. We've been in many Sapphires together, we've had many conversations around the role of data, the role of architecture, the role of how organizations are transforming at the speed of business, which is SAP, it's a lot of software that powers business, under transformation right now. You guys are no stranger to analytics, we have the HANA Cloud Platform now. >> Christoph: We know a thing or two about that, yeah. (laughs) >> You know a little bit about data and legacy as well. You guys power pretty much most of the Fortune 100, if not all of them. What's your thoughts on this? >> Yeah, good point. On the topic of some numbers, about 75% of the world GDP runs through SAP systems eventually. So yes, we know a thing or two about transactional and analytical systems, definitely. >> John: And you're a partner with Hortonworks >> With Hortonworks and other Cloud providers, Hadoop providers, certainly, absolutely, but in this case, Hortonworks. We have, specifically, a solution that runs on Hadoop and Spark and that allows, actually, our customers to unify much, much larger data sets with the systems of record that we run for so many customers around the world, for new and exciting use cases.
>> And you were born in Munich. This is your hometown. >> This is actually a home gig for me, exactly. So, yes, unfortunately I'll also be presenting in English but yeah, I want to talk German, Bavarian, all the time. (laughs) >> I see my parents tonight. >> I wish we could help you >> but we don't speak Bavarian. But we do like to drink the beer though. It's the fifth season but a lot of great stuff here in Germany. Dave, you guys, I want to get your thoughts on something. I wanted to get you, just 'cause you're both, you're like an analyst, Christoph as well. I know you're over at SAP but, you know, you have such great industry expertise and Dave obviously covers the stuff everyday. I just think that the data world is so undervalued, in my mind. I think the ecosystem of startups that are coming out in the, out of the open source ecosystems, which are well-defined, by the way, and getting better. But now you have startups doing things like VIMTEC, we just had a bank on. Startups creating value and things like block chain on the horizon. Other new paradigms are coming on, is going to change the landscape of how wealth is created and value is created and charged. So, you've got a whole new tsunami of change. What's your thoughts on how this expands and obviously, certainly, Hortonworks as a public company and Cloudera is going public, so you expect to see that level up in valuation. >> They're in the process, yes. >> But I still think they're both undervalued. Your thoughts. >> Well it's not just the platform, right? and that what, I think, where Hadoop also came from. The legacy of Hadoop is that you don't have to really think about how you want to use your data. You have to, don't think ahead what kind of schema you want to apply and how you want to correlate your data. You can create a large data lake, right? 
That's the term that was created a long time ago, that allows customers to just collect all that data and think in the second stage about what to do with it and how to correlate it. And that's exactly, now, we're also seeing in the third stage, to not just create analytics but also creating applications instead of analytics or on top of analytics, correlating with data that also drives the business, the core business, from an OLTP perspective or also from an OLAP perspective. >> I mean, Dave, you were the one who said Amazon's a trillion dollar TAM, will be the first trillion dollar company and you were kind of, but you looked at the thousand points of light that Cloud enables, all these aggregated together, what's your thoughts on valuation of this industry? Because if Hortonworks continues on this pure play and they've got Cloudera coming in and they're doing well, you could argue that they're both undervalued companies if you count the ecosystem. >> Well, we always knew that big data was going to be a heavy lift, right? And I would agree with what Christoph was saying, was that Hadoop is profound in that it was no schema on write and ship five megabytes of code to a petabyte of data. But it was hard to get that right. And I remember something you said, John, at one of our early SAP Sapphires, when the big data meme was just coming through. You said, "You know, SAP is not just big data, it's fast data". And you were talking about bringing transaction and analytic data together. >> John: Right. >> Again, something that has only recently been enabled. And you think about, you know, continuous streaming. I think that, now, big data has sort of entered the young-adulthood phase, we're going to start seeing the steep part of that S-curve, the returns, and I think the hype will be realized. I think it is undervalued, much like the internet was. It was overvalued, then nobody wanted to touch it, and then it became.
Actually, if you think back to 1999, the internet was undervalued in terms of what it actually achieved. >> John: Yeah. >> I think the same or similar thing is going to happen with big data. And since we have an SAP guest on, I'll say as well, we all remember the early days of ERP. >> Mhm, oh yeah. >> It wasn't clear >> Nope. >> Who was going to emerge as the king. >> Right. >> There were a few solutions. You're right. >> That's right. And, as well, something else we said about big data, it was the practitioners of ERP that made the most money, that created the most value, and the same thing is happening here. >> Yeah. In fact, on that topic, I believe that 2017 and 2018 will be the big years for big data, so to speak. >> John: Uh huh. >> In fact, because of some statistics. >> John: In what way? >> Well, we just did >> Adoption, S-curve? >> Right, exactly. Utilizing the value of big data. You're talking about valuation here, right? 75% of CEOs of the top 1000 believe that the next three years are more important to their business than the last 50. And so that tells me that they're willing to invest. Not just the financial market, which I believe really runs the most sophisticated big data analytics and models today. They had real use cases with real results very quickly. And so, they showed many how it's done. They created sort of the new role of a data scientist. They have roles like an AML officer. It's a real job, they do nothing else but anti-money laundering, right? So, in that industry they've shown us how to do that and I think others will follow. >> Yeah, and I think that when you look at this whole thing about digital transformation, it's all about data. >> John: Yeah. >> I mean, if you're serious about digital transformation, you must become a data-driven company and you have to hop on that curve. Even if you're talking to the, you know, bank today who got on in 2014, which was relatively late, but the pace at which they're advancing is astronomical.
>> John: Yeah. >> I don't remember his name, a British mathematician coined, about 11 years ago, the phrase "Data is the new oil". >> John: Mhm. >> And I think it's very true because crude oil, in its original form, you also can't use it. >> John: It has to be refined. >> Right, exactly. It has to be refined to actually use it and use the value of it. Same thing with data. You have to distill it, you have to correlate it, you have to align it, you have to relate it to business transactions so the business really can take advantage of it. >> And then we're seeing, you know, to your point, you've got, I don't know, a list of big data companies that are now public is growing. It's still small, not much profit. >> I mean, I just think, and this is while I'm getting your reaction, I mean, I'm just reading right now some news popping on my dashboard. Google just released some benchmarks on the TPU, the tensor processing unit, >> Dave: Right. >> Basically a chip dedicated to machine learning. >> Yep. >> You know, so, you're going to start to see some abstraction layers develop, whether it's a hardened-top processor hardware, you guys have certainly done innovation on the analytic side, we've seen that with some of the specialty apps. Just to make things go faster. I mean, so, more and more action is coming, so I would agree that this S-curve is coming. But the game might shift. I mean, this is not an easy, clear path. There's bets being made in big data and there's potential for huge money shift, of value. >> See, one of the things I see, and we talked to Hortonworks about this, the new president, you know, betting all on open source. I happen to think a hybrid model is going to win. I think the rich get richer here.
SAP, IBM, even Oracle, you know, they can play the open source game and say, "Hey, we're going to contribute to open source, we're going to participate, we're going to utilize open source, but we're also going to put the imprimatur of our install base, our business model, our trusted brands behind so-called big data." We don't really use that term as much anymore. It's the confluence of not only the technology but the companies who, what'd you say, 75% of the world's transactions run through SAP at some point? >> Christoph: Yeah. >> With companies like SAP behind it, and others, that's when this thing, I think, really takes off. >> What I think a lot of people don't realize, and I've been a customer, also, for a long time before I joined the vendor side, and what is under-realized is the aspect of risk management. Once you have a system and once you have business processes digitized and they run your business, you can't introduce radical changes overnight as quickly anymore as you'd like or your business would like. So, risk management is really very important to companies. That's why you see innovation within organizations not necessarily come from the core digitization organization within their enterprise, it often happens on the outside, within different business units that are closer to the product or to the customer or something. >> Something else that's happening, too, that I wanted to address is this notion of digitization, which is all about data, allows companies to jump industries. You're seeing it everywhere, you're seeing Amazon getting into content, Apple getting into financial services. You know, there's this premise out there that Uber isn't about taxicabs, it's about logistics. >> John: Yeah. >> And so you're seeing these born-digital, born in the cloud companies now being able to have massive impacts across different industries. Huge disruption creates, you know, great opportunities, in my view. >> Christoph: Yeah. >> David: What do you think?
I mean, I just think that the disruption is going to be brutal, and I want to, I'm trying to synthesize what's happening in this show, and you know, you're going to squint through all the announcements and the products, really an upgrade to 2.6, a new data platform. But here in Europe the IOT thing just, to me, is a catalyst point because it's really a proof point to where the value is today. >> David: Mhm. >> That people can actually look at and say, "This is going to have an impact on our business, to your digitization point" and I think IOT is pulling the big data industry and cloud together. And I think machine learning and things that come over the top on it are only going to make it go faster. And so that intersection point, where the AI, augmented intelligence, is going to come in, I think that's where you're going to start to see real proof points on value proposition of data. I mean, right now it's all kind of an inner circle game. "Oh yeah, got to get the insights, optimize this process here and there" and so there's some low hanging fruit, but the big shifting, mind blowing, CEO changing strategies will come from some bigger moves. >> To that point, actually, two things I want to mention that SAP does in that space, specifically, right? Startups, we have a program actually, SAP.io, that Bill McDermott also recently introduced again, where we invest in startups in this space to help foster innovation faster, right? And also connecting that with our customers. >> John: What is it called? >> SAP.io. Something to look out for. And on the topic of IOT, we made, also, an announcement at the beginning of the year, Project Leonardo. >> Yeah. >> It's a commitment, it's a solution set, and it's also an investment strategy, right? We're committed in this market to invest, to create solutions, we have solutions already in the cloud and also on premise. There are a few companies we also purchased in conjunction with Leonardo, RT specifically.
Some of our customers in the manufacturing space, very strong opportunity for IOT, sensor collection, creating SLAs for robotics on the manufacturing floor. For example, we have a complete solution set to make that possible and realize that for our customers, and that's exactly a perfect example where these sensor applications in IOT, edge, compute rich environments come together also with a core where, then, a system of reference like machine points, for example, matters, because if you manage the SLA for a machine, for example, you just not only monitor it, you want to also automatically trigger the replacement of a part, for example, and that's why you need an SAP component, as well. So, in that space, we're heavily investing, as well. >> The other thing I want to say about IOT is, I see it, I mean, cloud and big data have totally disrupted the IT business. You've seen Dell buying EMC, HP had to get out of the cloud business, Oracle pivoted to the cloud, SAP obviously, going hard after the cloud. Very, very disruptive, those two trends. I see IOT as not necessarily disruptive. I see those who have the install base as adopting IOT and doing very, very well. I think it's maybe disruptive to the economy at large, but I think existing companies like GE, like Siemens, like Daimler, are going to do very, very well as a result of IOT. I mean, to the extent they embrace digitization, which they would be crazy not to. >> Alright guys, final thoughts. What's your walkaway from this show? Dave, we'll start with you. >> I was going to say, you know, Hadoop has definitely not failed, in my mind, I think it's been wildly successful. It is entering this new phase that I call sort of young-adulthood and I think it's, we know it's gone mainstream into the enterprise, now it's about, okay, how do I really drive the value of data, as we've been discussing, and hit that steep part of the S-curve.
Which, I agree, it's going to be within the next two years, you're going to start to see massive returns. And I think, looking back, this industry is going to be seen as having been undervalued in 2017. >> Remember how long it took to align on TCP/IP? (laughter) >> Walk away, I mean interoperability was key with TCP/IP. >> Christoph: Yeah. One of the things that made things happen. >> I remember talking about it. (laughter) >> Yeah, two megabits per second. Yeah, but I mean, bringing back that, what's your walkaway? Because is it a unification opportunity? Is it more of an ecosystem? >> A good friend of mine, also at SAP on the West Coast, Andreas Walter, he shared an observation that he saw in another presentation years ago. It was suits versus hoodies. Different kind of way to run your IT shop, right? Top-down structure, waterfall projects, and suits; open source, hack it, quickly done, you know, get in, walk away, make money. >> Whoa, whoa, whoa, the suits were the waterfall, hoodies was the agile. >> Christoph: That's correct. >> Alright, alright, okay. >> Christoph: Correct. So, I think that it's not just the technology that's coming together, it's mindsets that are coming together. And I think organizationally for companies, that's the bigger challenge, actually. Because one is very prescribed, change control oriented, risk management aware. The other is very progressive, innovative, fast adopters. Can these two bring those together? I think that's the real challenge in organizations. >> John: Mhm, yeah. >> Not the technology. And on that topic, we have a lot of very intelligent questions, very good conversations, deep conversations here with the audience at this event here in Munich. >> Dave, my walkaway was interesting because I had some preconceived notions coming in. Obviously, we were prepared to talk about, and because we saw the S-1 filing by Cloudera, you're starting to see the level of transparency relative to the business model. 
One's worth one billion dollars in private value, and then Hortonworks pushing only 2700 million in a public market, which I would agree with you is undervalued, vis-à-vis what's going on. So obviously, you're going to see my observation coming in from here is that I think that's going to be a haircut for Cloudera. The question is how much value will be chopped off Cloudera, versus how much value of Hortonworks will go up. So the question is, does Cloudera plummet, or does Cloudera get a little bit of a haircut or stay, and Hortonworks rises? Either way, the equilibrium in the industry will be established. The other option would be >> Dave: I think the former, and the numbers are ugly, let's not sugarcoat it. And so that's got to change in order for this prediction that we're making. >> John: Former being the haircut? >> Yeah, the haircut's going to happen, I think. But the numbers are really ugly. >> But I think the question is how far does it drop and how much of that is venture. >> Sure. >> Venture, arbitrage, or just how they are capitalized, but Hortonworks could roll up. >> But my point is that those numbers have to change and get better in order for our prediction to come true. Okay, so, but in your second talk, sorry to interrupt you but >> No, I like a debate and I want to know where that line is. We'll be watching. >> Dave: Yeah. >> But the value in, I think you guys are pointing out, but I walk away, is IOT is bigger here, and I already said that, but I think the S-curve is, you're right on. I think you're going to start to see real, fast product development around incorporating data, whether that's a Hortonworks model, which seems to be the nice unifying, partner-oriented one. You're going to start seeing specialized hardware, people are going to start building chips for using flash or other things, and optimizing hardware complexities. You pointed that out in the intro yesterday. And putting real product value on the table. 
I think the cards are going to start hitting the table in the ecosystem, and what I'm seeing is that happening now. So, I think just an overall healthy ecosystem. >> Without a doubt. >> Okay. >> Great. >> Any final comments? >> Let's have a beer. >> Great to see you in Munich. (laughter) >> We'll have a beer, we had a pig knuckle last night, Dave. We had some sauerkraut. >> Christoph: (speaks foreign word) >> Yeah, we had the (speaks foreign word). Dave, we'll grab the beer, thanks. Good to be with you again. Thanks to the crew, thanks to everyone watching. >> Thanks, John. >> The CUBE, signing off from Munich, Germany for DataWorks 2017. Thanks for watching, see ya next time. (soft techno music)
Nadeem Gulzar | DataWorks Summit Europe 2017
>> Announcer: Live from Munich, Germany, it's the CUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hey welcome back everyone. We're here live in Munich, Germany for the DataWorks 2017 Summit, formerly known as Hadoop Summit, now called DataWorks. I'm John Furrier with the CUBE, my co-host Dave Vellante, here for two days of wall-to-wall coverage. Our next guest is Nadeem Gulzar, head of Advanced Analytics at Danske Bank. Welcome to the CUBE. >> Thank you. >> You're a customer but also talking here at the event, bringing all your folks here. Your observation, I mean, Hadoop is not going away, certainly we see that. But now, as John Kreisa, who was MC'ing, said earlier, open up the aperture to analytics, is really where the action is. >> Nadeem: Absolutely. >> Your reaction to that. >> I completely agree, because again, Hadoop is basically just the basic infrastructure, right. Components build on components, and things like that. But when you really utilize it is when you add the advanced analytics frameworks. There are many out there. I'm not going to favor one over another. But the main thing is, you need that to really leverage Hadoop. And, at the same time, I think it's very important to realize how much power there actually is in this. For us at Danske Bank, getting Hadoop, getting the advanced analytics framework, has really proven quite a lot. It allowed us actually to dig into our core data, transaction data for instance, which we haven't been able to for decades. >> So take me through, because you guys are an interesting use case because you're advanced. You're gettin' at the data, which is cutting edge. But you're going through this transformation, and you have to because you're on the front lines. Take us inside the company, without giving away any trade secrets, and describe the environment. 
What's the current situation, and how is it evolving from an IT standpoint, and also from the relationship with the stakeholders on the business side? >> So again, we are a bank with 20,000 employees, so of course in a large organization you have silos. People feeling okay, this is my domain, this is my kingdom, don't touch it. Don't approach me, or you can approach me, talk to me, you have to convince me, otherwise don't talk to me at all. So we get that quite a lot, and to be honest, from my point of view, if we do not lift as a bank, we're not going to succeed. If I have success, if my organization of almost 60 people has success, that's good in itself, but we are not going to succeed as a bank. So for me, it's quite important that I go down and break down these barriers, and allow us to come in, tell the business units, tell them what sort of capabilities we bring, and include them. That is actually the main key. I don't want to replace them or anything like that. >> So an organizational challenge is to get the mindset shifted. How about process gaps and product gaps? 'Cause I mean I almost see the sequence, kind of a group hug if you will, organizational mindset, kind of a reset or calibration. And then identify processes and then product gaps, seem to be the next transition. >> Absolutely, absolutely, and there are some gaps. Still, even though we have been on this journey for a considerable amount of time, there are still gaps, both in terms of processes and products. Because again, even though we have top management buy-in, it doesn't go through all the way down to the middle layer. So we still struggle with this from time to time. >> How do you break down those barriers? What do you do, what's your strategy? >> I'm humble, to be honest. I go in, I tell them, listen you guys, I have some capabilities that I can add to your capabilities. I want you to leverage me to make your life easier. I want to lift you as an organization. 
I don't care about myself, I want you to be better at what you're doing. >> So Nadeem, the money business and the technology business have always had a close relationship. It was like in 2010 after we came out of the downturn, it was like this other massive collision. You had begun experimenting with Cloud, the shift, CapEx to OpEx. The data thing hit in a big way, obviously mobile became real. So talk about the confluence of those technologies, specifically in the context of your big data journey. Where did you get started, and how did it evolve? >> So actually it fit in quite nicely because we were coming out of this down period, right, so there was extreme amount of focus on cost. So, of course at the time where we wanted to go into this journey, a lot of people were asking, okay how much does this cost, what's the big strategy, and so on. And how's the road map going to look like, and what's the cost of the road map? The thing is, if you buy some off the shelf commercial product, it's quite expensive. We can easily talk like half a billion, something like that, for a full end to end system. So with this, you were allowed, or we were allowed, to start up with relatively small funding, and I'm actually talking about just like a million dollars, roughly. And that actually allowed us a substantial boost in the capability department, in allowing us to show what kind of use cases we could build, and what kind of value we could bring to Danske Bank. >> So you started with understanding Hadoop? Is that right, was that the starting point? >> Yes, in a fairly small, very researched team set up. We did the initial research, we looked at, okay what could this bring? We did some initial, what we call, proof of value. So small, small, pilot projects, looking at, okay this is the data. We can leverage it in this way, this is the value we can bring. How much can we actually boost the business? So everything is directly linked to business value. 
So, for instance, one of the use cases was within customers, understanding customer behavior, directly linking it to marketing, doing more targeted marketing, and at the end getting more results in terms of increased sales. >> So you just started the journey 2009, 2010, is that right? Or was it later? >> No, we started somewhat later. The initial research was in '14. >> In '14? Okay, alright, so '14 you sort of became familiar with Hadoop, and then I imagine, like many customers, you said okay, wow this stuff is complicated, but you were takin' it in small chunks, low risk. Let's get some value. Marketing is an obvious use case. I would imagine fraud is another obvious use case. So then, how did that evolve? I mean it's only a few years now, but I imagine you've evolved very quickly. >> Extremely quickly. Actually, within two months of the research, we actually saw a huge benefit in this area, and directly we went with the material to the senior members of the different boards we wanted to affect, and actually, you could call it luck. But maybe we were just well prepared and convincing, so we actually directly got funding at that point in time. They said, listen, this is very promising. Here you go, start off with the initial, slightly larger projects, prove some value, and then come back to us. Initially they wanted us to do two things: look into the customer journey, or doing deeper customer behavior analytics, and the second was within risk. Doing things like text mining financial statements, getting deeper into that, doing some web crawling on financial data such as Bloomberg, etcetera, and then pulling it into the system. 
And again, to be honest, it's like every quarter something new is happening and we need to do some adjustments even to the core architecture. And with the introduction of HDP 3 coming later this year, I think we're going to see a massive change once again. Hortonworks already calls it a major change, or a major release. But actually, the things they are doing are extremely promising, so we want to take that step with them. But again, it's going to affect us. >> What's exciting about that to you? >> The thing that's very exciting is, we are now at like a balance point, where we have played quite a lot, we have released a couple of production grade solutions, but we have really not reached the full enterprise potential. So getting like into the real deep stuff, with living under heavy SLAs, regulation stuff. All these kinds of things are not in place yet, from my point of view. >> We talk a lot about, in the CUBE, and in our company, about these emergent workloads; you had batch, interactive, and the world went back to batch with Hadoop, and now you have this continuous workload, this streaming real-time workload. How is that affecting your organization, generally, and specifically, you're thinking about architecture. How real is that and where do you see that in the future? >> It's the core, to be honest. Again, one of the main things we are trying to do is look into, so, gone are the days with heavy, heavy batches of data coming in. Because if you look at web logs for instance, so when customers interact with our web, or our tablet solution, or mobile solution, the amount of data generated is humongous. So, no way on earth you can think about batches anymore. So it's more about streaming the data all the way in, doing real time analytics and then produce results. >> What would you say are your biggest, big data challenges, problems that you really want to attack and solve. >> So, what I really want to attack is, getting all sorts of data into the system. 
So, you can imagine, as a bank we have 2,000 plus systems. We have approximately 4,000 different points that deliver data. So getting all that mass into our data lake, it's a huge task. We actually underestimated it. But now, we have seen we have to attack it and get it in, because that is the gold. Data is the future gold. So we need to mine it in, we need to do analytics on top of it and produce value. >> And then once you get it in there, I'm sure you're anticipating that you want to make sure this doesn't go stale, doesn't become a swamp, doesn't get frozen. It's your job to talk about data oceans, which is really the long term vision I presume, right? >> And that is a key as well, because with the GDPR for instance, we need to have full mapping and full control of all the data coming in. We need to be able to generate metadata, we need to have full data lineage. We need to know, for all the data, where it came from, how it's interconnected, relations, all that. >> And that's what, two years away from implementation? Is that about right? >> It's going to take a while, of course. But again, the key thing is we make the framework so all the data coming in, step by step, has that. >> Yeah, but so GDPR though, it goes into effect in '19, is that correct? >> It's actually May '18. >> May '18, oh, so it's a much tighter time frame than I realized. >> John: You're under the gun. >> Nadeem: Yes. >> Okay, observation here at this event, obviously a lot of IOT, for you that's people. People and things are kind of the edge of the network. The intelligent edge is a big, big topic. Very dynamic. >> Nadeem: Extremely dynamic. >> A lot of things happening. Lot of opportunities for you to be this humble service provider to your constituents, but also your customers. How do you guys view that? What does the current landscape look like as you look outside the company and look at what's happening around you, the world? >> A lot of cool things are going on, to be honest. 
Especially in IOT, right? I mean, even though we are a core bank, still, there are a lot of sensors we can use. I talked a bit, in the keynote, about ATMs, right? So, we're also looking at how can we utilize this technology? How can we enable our customers? If you look at our apps, they also generate extreme amounts of data, right? The mobile solution that we have, it gives away GPS location and things like that. And we want to include all that data in. At the end of the day, it's not for our gain, we are not always looking at making the next buck, right? It's also about being there for the customer, providing the services they need, making their banking life easier. >> And your ecosystem is evolving and rapidly adding new constituents to your network because, then you have the consumer with the phone, the mobile app alone, never mind the point of sale opportunity at the ATM. Now a digital, augmented reality experience could be enabled where you now have fintech suppliers, and potentially other suppliers in this now digital network that could be relational with you. >> Yes, and our job is to make sure that we leverage that. Acquiring a banking license is extremely difficult. But we have it, and what we need to do is to engage these fintechs, partners, even other banks, and say listen guys, we invite you in. Utilize our services, utilize our framework, utilize our foundation, and let's build something upon that. >> If you had to explain, Nadeem, this fintech startup trend, because it is super hot, what is it? I mean, how would you describe it to someone who's not in the banking world? 'Cause most people would be scratching their head and say, isn't that banking? But now this ecosystem of new entrepreneurial activity is developing, and they're skyrocketing with success 'cause they have either a specialty focus, they do something extremely well. It may or may not be in a direct big space with a bank, but a white space. Use cases. So, is it good? Is it bad? 
Is it hype? What's the current state of the fintech situation? >> From my point of view, it's awesome. And the reason is, these guys are pushing us. Remember, we are a hundred fifty plus year old bank. And sometimes we do tend to just pat ourselves on the back and say, okay, this is going good, right? But these guys are coming in, giving some competition, and we love it. >> Give me an example of fintech capabilities. Randomly bring up some examples to highlight what fintech is. >> So what we've seen in, for instance, the German market, is the fintechs coming in, utilizing some of the customer data, and then producing awesome new applications. Whether it is a new net bank, where a customer can interact with it in a much, much smoother way. Some of the banks tend to over-clutter things, not make it simple. So things like, where you can look at your transactions in a Google Map, for instance. You can see how much you spend at this location. You can move around. >> You could literally follow the money, on a map. (laughing) >> So this is your home base, you go out here, you spend this amount of money, and maybe even add more on it. So, let's say you do your grocery shopping over here, but if I moved all my business from this company to this company, how much could I save? Imagine if you could just drag and drop it and see, okay, I could actually save a couple of thousand bucks, awesome. >> And machine learning is going to totally change the game with Augmented Intelligence. AI is called Artificial Intelligence, or Augmented Intelligence, depending upon your definition. This is a good thing for consumers. >> It is, it is. >> And thinking about disruption, what do you guys, what are your thoughts on blockchain? What is your research showing? You playing around with Hyperledger at all? >> Yes we are. And blockchain, it's also quite interesting. We're doing lots of research on that. 
What it's shown actually is that this is a technology that we can also use. And we can also really utilize even the security aspects of it. If you just take that, you could really implement that. >> The identity aspect, federating identity around fraud, another area you can innovate on. I'm bullish on blockchain, a lot of people are skeptical, but Dave knows I really, I love blockchain. Because it's not about Bitcoin per se, it's sort of the underlying opportunity. It just seems fascinating. Dave, you know, I got to get on my soapbox, blockchain soapbox. >> We've never really looked at Bitcoin as just a currency, it's more of a technology platform, and I have always been fascinated with the security angle. Virtually unhackable, put that in quotes. No need for a third party to intermediate. So many positive fundamentals, now it's guys like you figuring out, okay, the practitioner saying, here's how we're going to implement it and commercialize it. >> And actually it fits in quite well with things like GDPR. This is also about opening up, the same with PSD 2. Exposing the customer data, making it available for the general public. And ultimately the goal is, so you as a consumer, me as a consumer, we own our data. >> Nadeem, thank you so much for coming on the CUBE and sharing your practitioner situation, and your advice, as well as commentary. I'll give you the last word. As you and your team embark from DataWorks 2017 and head back to the ranch, so to speak, and bring back some stuff, what are you going to work on? What's the to-do item? What are you going to sharpen the saw on and cut when you get back? >> So for us in the very, very short term, it's about taking our platform and our capabilities and moving it into the real enterprise world. That is our first key milestone that we are going to go for. And, I'll tell you, we're going to go all in for that. 
Because, unless we do that, we're not able to really attack the core of banking, which requires this, right? Please remember that a consumer doing a transaction somewhere in the world, he cannot stand and wait for ages for something to be processed. It needs to be instantaneous. So, this is what we need to do. >> You think this event, you're armed up with product. >> Absolutely, absolutely. Lots of good insight we've gotten from this. Lots of potential, lots of networking guys and other companies that we can talk to about this. >> Also great recruiting, get some developers out there too, lot of great people. Congratulations on your success and thanks for sharing this great insight here on the CUBE, exposing the data to you live on the CUBE. Silicon Angle dot TV, I'm John Furrier, with Dave Vellante my co-host, more great coverage stay with us here live in Munich, Germany for DataWorks 2017 Summit. We'll be right back.
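An editorial aside on Nadeem's GDPR point: knowing, for every dataset in the lake, which system it landed from and which upstream datasets it was derived from is essentially a lineage graph. The sketch below is a minimal illustration of that idea in Python; the dataset and system names are invented for the example, not Danske Bank's actual feeds.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LineageRecord:
    # One catalog entry: the dataset, the system it landed from,
    # and the upstream datasets it was derived from.
    dataset: str
    source_system: str
    derived_from: List[str] = field(default_factory=list)

class LineageGraph:
    def __init__(self) -> None:
        self.records: Dict[str, LineageRecord] = {}

    def register(self, record: LineageRecord) -> None:
        self.records[record.dataset] = record

    def upstream(self, dataset: str) -> List[str]:
        # Walk derived_from links back to every ancestor dataset --
        # the "where did this data come from" question a GDPR audit asks.
        seen: List[str] = []
        stack = list(self.records[dataset].derived_from) if dataset in self.records else []
        while stack:
            name = stack.pop()
            if name not in seen:
                seen.append(name)
                if name in self.records:
                    stack.extend(self.records[name].derived_from)
        return seen

graph = LineageGraph()
graph.register(LineageRecord("raw_transactions", "core_banking"))
graph.register(LineageRecord("web_logs", "online_banking"))
graph.register(LineageRecord("customer_behavior", "data_lake",
                             derived_from=["raw_transactions", "web_logs"]))

print(sorted(graph.upstream("customer_behavior")))  # ['raw_transactions', 'web_logs']
```

With roughly 4,000 delivery points feeding a lake, the value of keeping records like this at ingestion time, rather than reconstructing lineage later, is exactly the "step by step" framework Nadeem describes.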
Gianthomas Volpe & Bertrand Cariou | DataWorks Summit Europe 2017
(upbeat music) >> Announcer: Live from Munich, Germany, it's the Cube covering DataWorks Summit Europe, 2017. Brought to you by Hortonworks. >> Hey, welcome back everyone. We're here live in Munich, Germany, at the DataWorks 2017 Summit. I'm John Furrier, my co-host Dave Vellante with the Cube, and our next two guests are Gianthomas Volpe, head of customer development EMEA for Alation. Welcome to the Cube. And we have Bertrand Cariou, who's the director of solution marketing at Trifacta, with partners. Guys, welcome to the Cube. >> Thank you. >> Thank you for having us. >> Big fans of both your startups, and growing. You guys are doing great. We had your CEO on our Big Data SV, Joe Hellerstein, he talked about wrangling, all the cool stuff that's going on, and Alation, we know Stephanie has been on many times, but you guys are startups that are doing very well and growing in this ecosystem, and, you know, everyone's going public. Cloudera has filed their S-1, great news for those guys, so the data world has changed beyond Hadoop. You're seeing it, obviously Hadoop is not dead, but it's still going to be a critical component of a larger ecosystem that's developing. You guys are part of that. So I want to get your thoughts on why you're here in Europe, okay? And how you guys are working together to take data to the next level, because, you know, we're hearing more and more data is a foundational conversation starter, because now there's other things happening, IOT, business analysts, you guys are in the heart of it. Your thoughts? >> You know, going to be you. >> All in, yeah, sure. So definitely at Alation what we're seeing is more and more people across the organization want to get access to the data, and we're kind of breaking out of the traditional roles around IT managing both metadata and data preparation, like Trifacta's focused on. So we're pretty squarely focused on how do we bring that access to a wider range of people? 
How do we enable that social and collaborative approach to working with that data, whether it's in a data lake, like here at DataWorks. So clearly that's one of the main topics. But also other data sources within the organization. >> So you're freeing the data up, and the whole collaboration thing is more of, okay, don't just look at IT as this black box of give me some data and now spit out some data at me. Maybe that's the old way. The new way is okay, all of the data's out there, they're doing their thing, but the collaboration is for the user to get into that data, you know, ingestion. Playing with the data, using the data, shaping the data. Developing with the data. Whatever they're doing, right? >> It's just bringing transparency to not only what IT is doing and making that accessible to users, but also helping users collaborate across different silos within an organization. So we look at things like query logs to understand who is doing what with the data, so if I'm working in one group, I can find out that somebody in a completely different group in the organization is working with similar data, bringing new techniques to their analysis, and can start leveraging that and have a conversation that others can learn from, too. >> So basically it's like a discovery platform for saying hey, you know, Mary in department X has got these models. I can leverage that. Is that kind of what you guys are all about? >> Yeah, definitely. And breaking through that, enabling communication across the different levels of the organization, and teaching other people at all different levels of maturity within the company how they can start interacting with data, and giving them the tools to upskill throughout that process. >> Bertrand, how about Trifacta?
'Cause one of the things that I find exciting about your value proposition, and talking to Joe, the founder, besides the fact that they all have GitHub on their about page, which is the coolest thing ever, 'cause they're all developers. But the reality is that a business person, or a person dealing with data in some part of a geography, could be, whether it's in Europe or in the US, might have a completely different view and interest in data than someone in another area. It could be sales data, could be retail data, it doesn't matter, but it's never going to be the same schema. So the issue is, you've got to take that complexity away from the user. That is a really fundamental change. >> Yeah. You're totally correct. So information is there, it is available. Alation helps identify what is the right information that can be used, so if I'm in marketing, I could reuse sales information, associating it maybe with web log information. Alation will give me the opportunity to know what information is available and if I can trust it. If someone in finance is using that information, I can trust that data. So now as a user, I want to take that data, maybe combine the data, and the data is always in a different format, structure, level of quality, and the work of data wrangling is really for the end user, who could be an analyst. Someone in the line of business most of the time; these could be, like some of the customers we have here in Germany, like Munich Re, actuaries. Building risk models, and claims forecasting, payment forecasting. So they are not technologists at all, but they need to combine these data sets by themselves, and at scale, and with the work they're doing they are producing new information, and this information is used directly in their own business, but as soon as they share this information back to the data lake, Alation will index this information, see how it is used, and give this visibility to the other users for reuse as well.
>> So you guys have a partnership, or is this more of a standard API kind of thing? >> So we do have a partnership; we have planned development on the road map. It's currently happening. So I think by the end of the quarter, we're going to be delivering a new integration where, whether I'm in Alation and looking for data and finding something that I want to work with that I know needs to be prepared, I can quickly jump into Trifacta to do that. Or the other way around: in Trifacta, if I'm looking for data to prepare, I can open the catalog, quickly find out what exists and how to work with it better. >> So basically the relationship, if I get this right, is you guys pass on your expertise of the data wrangling, all the back processes you guys have, and advertise that into Alation. They discover it, make it surfaceable for the social collaboration or the business collaboration. >> Exactly. And when the data is wrangled, it gets indexed, and so it's a virtuous circle where all the data that is created and combined is exposed to the user to be reused. >> So if I were Chief Data Officer, I'd say okay, there's three sequential things that I need to do, and you can maybe help me with a couple of them. So the first one is I need to understand how data contributes to the monetization of my company, if I'm a public company or a for-profit company. That's, I guess, my challenge. But then, there are other two things: I need to give people access to that data, and I need quality. So I presume Alation can help me understand what data's available. It kind of helps with number one as well, because like you said, okay, this is the type of data, this is how the business process works. Feed it. And then the access piece and quality. I guess the quality is really where Trifacta comes in. >> GianThomas: Yes. >> What about that sequential flow that I just described? Is that common? >> Yeah. >> In your business, your customer base. >> It's definitely very common.
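[Editor's note: the query-log mining GianThomas described earlier, using logs to see who is working with what data and which tables are used most, can be sketched in a few lines of Python. This is a hypothetical illustration, not Alation's implementation; the log format, the `rank_tables_by_usage` helper, and the table names are all assumptions.]

```python
import re
from collections import Counter

def rank_tables_by_usage(query_log):
    """Count how often each table appears in FROM/JOIN clauses
    across a log of SQL queries, most-used first."""
    pattern = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)
    counts = Counter()
    for query in query_log:
        counts.update(t.lower() for t in pattern.findall(query))
    return counts.most_common()

# Hypothetical query log from a data lake
log = [
    "SELECT region FROM claims.paid_claims JOIN claims.new_claims ON region",
    "SELECT region FROM claims.paid_claims",
    "SELECT station FROM weather.observations",
]
print(rank_tables_by_usage(log))
```

A ranking like this is one simple way a catalog could surface "what's actually being used the most" to new users.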
So, kind of going back to the Munich Re example, since we're here in Munich, they're very focused on providing better services around risk reduction for their customers. Data that can impact that risk can be of all kinds, from all different places. You kind of have to think five, ten years ahead of where we are now to see where it might be coming from. So you're going to have a ton of data going in to the data lake. Just because you have a lot of data, that does not mean that people will know how to work with it, or even know that it exists. And especially since the volumes are so high, it doesn't mean that it's all coming in in a greatly usable format. So Alation comes into play in helping you find not only what exists, by automating that process of extraction, but also looking at what data people are actually using. So going back to your point of how do I know what data's driving value for the organization, we can tell you, in this schema, this is what's actually being used the most. That's a pretty good starting point to focus in on what is driving value, and when you do find something, then you can move over to Trifacta to prepare it and get it ready for analysis. >> So keying on that for a second, so in the example of Munich Re, the value there is my reduction in expected loss. I'm going to reduce my risk, that puts money in my bottom line. Okay, so you can help me with number one, and then take that Munich Re example into Trifacta. >> Yes, so the user will be the same user using Alation and Trifacta. So it's an actuary. So as soon as the actuary identifies the data that is the most relevant for what they'll be planning, so the actuaries are working with things like development triangles over 20 years. And usually it's column by column. So they have to pivot the data row by row. They have to associate that with the paid claims, the new claims coming in, so all this information is in different formats.
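[Editor's note: the column-to-row pivot of a development triangle that Bertrand just described can be illustrated with a small sketch. This is plain Python for illustration, not Trifacta's wrangling engine; the `unpivot_triangle` helper, field names, and figures are invented, and the `None` cells stand for the empty lower-right corner of the triangle.]

```python
def unpivot_triangle(rows, id_field="accident_year"):
    """Turn one-row-per-accident-year, one-column-per-development-year
    data into long (accident_year, dev_year, paid) records, skipping
    the empty cells of the triangle."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field or value is None:
                continue
            long_rows.append({id_field: row[id_field],
                              "dev_year": int(key.split("_")[1]),
                              "paid": value})
    return long_rows

# Hypothetical paid-claims triangle, column per development year
triangle = [
    {"accident_year": 2014, "dev_1": 100, "dev_2": 150, "dev_3": 170},
    {"accident_year": 2015, "dev_1": 120, "dev_2": 160, "dev_3": None},
    {"accident_year": 2016, "dev_1": 130, "dev_2": None, "dev_3": None},
]
records = unpivot_triangle(triangle)
```

The long form is what downstream risk-modeling tools typically expect, which is why the pivot is such a routine wrangling step.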
Then they have to look at maybe weather information, or additional third-party information where the level of quality is not well known, so they are bringing data into the lake that is not yet known. And they're combining all this data. The outcome of that work helps in the risk modeling, so that could be used by, they could use SAS or other technology for the risk modeling. But when they've done that modeling and built these new data sets, they're, again, available to the community, because Alation will index that information and explain how it is used. The other thing that we've seen with our users is there's also a very strong, if you think about insurance, banks, pharma companies, there is a lot of regulation. So, as the user, as you are creating new data, where is the data coming from? Where is the data going, how is it used in the company? So we're capturing all that information. Trifacta has the rules to transform the data; Alation will see the overall high-level picture from the table to the source system where the data comes from. So it's super important as well for the team. >> And just one follow up. In that example, the actuary, I know hard-core data scientists hate this term, but the actuaries are the citizen data scientists. Is that right? >> The actuaries would know, I would say, statistics, usually. But you get multiple levels of actuaries. You get many actuaries who are Excel users. They have to prepare data. They have to clean up, structure the data to give it to the next actuary that will be doing the pricing model, or the next actuary that will do the risk modeling. >> You guys are hitting on a great formula which is cutting edge, which is why you guys are on as the startups. But, Bertrand, I want to talk to you about your experience at Informatica. You were the founder of Informatica France. And you were also involved in some product development in the old, I'd say old days, but like.
Back in the days when structured data and enterprise data, which was once a hard problem, deal with metadata, deal with search, you had schemas, all kinds of stuff to deal with. It was very difficult. You have expertise. I want you to talk about what's different now in this environment. Because it's still challenging. But now the world has got so much fast data, we've got so much new IoT data, especially here in Europe. >> Oh yes. >> Where you have an industrialized focus, certainly Germany, like case in point, but there's pretty smart mobility going on in Europe. You've always had that mobile environment. You've got smart cities. A lot of focus on data. What's the new world like now? How are people dealing with this? What's your perspective? >> Yes, so, and we all know about big data, with all this volume, additional volume and new structures of data. And I would say legacy technology can deal, as you mentioned, with well-structured information. But you also want to give that information to the masses. Because the people who know the data best are the business people. They know what to do with the data, but the access to this data is pretty complicated. So where Trifacta is really differentiating, and has been thinking through that, is to say whatever the structure of the data, IoT, web logs, key-value, JSON, XML, that should be, for an end user, just metrics. So that's the way you understand the data. The next thing, when you play with data, usually you don't know what the schema will be at the end. Because you don't know what the outcome is. So, as an end user, you are exploring the data, combining data sets, and the structure is changing as you discover the data. So that is also something new compared to the old model, where an end user would go to the data engineer to say I need that information, can you give me that information? And the engineers would look at that and say okay. We can access here, what is the schema? There was all this back and forth.
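[Editor's note: presenting arbitrary nested structure (JSON, key-value, XML) to an end user as flat metrics, as Bertrand describes, can be approximated by recursively flattening records. A minimal sketch, assuming JSON input and a hypothetical `flatten` helper with invented field names; this is not Trifacta's actual approach.]

```python
import json

def flatten(record, prefix=""):
    """Recursively flatten a nested dict into dotted metric names,
    so end users see flat fields regardless of source structure."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

# Hypothetical sensor record arriving as raw JSON
raw = '{"rig": "north-sea-7", "sensors": {"pressure": 212.5, "temp": 88}}'
print(flatten(json.loads(raw)))
```

Schema-on-read tooling does far more than this, of course, but the principle is the same: the structure is discovered from the data as you explore it, rather than fixed up front.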
>> There was so much friction in the old way, because the creativity of the user is independent now of all that scaffolding and all the wrangling, pre-processing. So I get that piece of the citizen journalist, citizen analyst. But the key thing here is you were wrestling with the complexity to get the job done. So the question then comes in, because it's interesting, the whole theme here at DataWorks Summit in Europe and in the US is that all the big transformative conversations are starting with business people. So this is a business unit, so the front lines if you will, not IT. Although IT now's got to support that. If that's the case, the world's shifting to the business owners. Hence your startup. Is that kind of getting that right? >> I think so. And I think that's also where we're positioning ourselves: you have a data lake, you can put tons of data in it, but if you don't find an easy way to make that accessible to a business user, you're not going to get value out of it. It's just going to become a storage place. So really, what we've focused on is how do you make that layer easily accessible? How do you share around and bring some of the common business practices to that? And make sure that you're communicating with IT. So IT shouldn't be cast aside, but they should have an ongoing relationship with the business user. >> By the way, I'll point out that Dave knows I'm not really a big fan of the data lake concept, mainly because they've turned into data swamps, because IT deploys it, we're done! You know, check the box. But data's getting stale because it's not being leveraged. You're not impacting the data or making it addressable, or discoverable, or even wrangleable. If that's a word. But my point is, that's all complexity. >> Yes, so we call it sort of a frozen data lake. You build a lake, and then it's frozen and nobody can go fishing. >> You play hockey on it. (laughs) >> You dig and you're fishing.
>> And you need to have this collaboration ongoing with the IT people, because they own the infrastructure. They can feed the lake with data, with the business. If there is no collaboration, and we've seen that multiple times: data lake initiatives, and then we come back one year after, and there is no one using the lake, like one or two percent of the processing power or of the data is used. Nobody is going to the lake. So you need to index the data, catalog the data, to know what is available. >> And the psychology for IT is important here, and I was talking yesterday with IBM folks, Nevacarti here, but this is important because IT is not necessarily in a position of doing it, because doing the frozen lake or data swamp is not because they want to screw over the business people, they just do their job, but here you're empowering them, because you guys have got some tech that's enabling IT to do a data lake or data environment that allows them to free up the hassles, but more importantly, satisfy the business customer. >> GianThomas: Exactly. >> There's a lot of tech involved. And certainly we've talked to you guys about that. Talk about that dynamic of the psychology, because that's what IT wants. So what's that dev ops mindset for data, data ops if you will, or you know, data as code if you will, which is what we've been calling it, but that's now the cloud ethos hitting the data ethos. Kind of coming together. >> Yes, I think data catalogs are subtly different in that traditionally they were more of an IT function, at least on the metadata side, whereas on the business side, they tended to be a siloed organization of information that the business itself kept to maintain very manually. So we've tried to bring that together. All the different parties within this process, from the IT side to governance and stewardship all the way down to the analysts and data scientists, can get value out of a data catalog, and they can help each other out throughout that process.
So if it's communicating to end users what kind of impact any change IT will make, that makes their life easier, and they have one way to communicate that out and see what's going to happen. But it's also understanding what the business is doing for governance or stewardship. You can't really govern or curate if you don't know what exists and what matters to the business itself. So bringing those different stages together, helping them help each other, is really what Alation does. >> Tell us about the prospects that you guys are engaging with from a customer standpoint. What are some of the conversations with those customers, and the prospects you haven't gotten yet? And also give an example of a customer that you guys have, and use cases where they've been successful. >> Absolutely. So typically what we see is that an organization is starting up a data lake, or they already have legacy data warehouses. Often it's both, together. And they just need a unified way of making information about those environments available to end users. And they want to have that better relationship. So we're often seeing IT engaged in trying to develop that relationship along with the business. So that's typically how we start, and we, in the process of deploying, work into that conversation of, now that you know what exists and what you might want to work with, you're often going to have to do some level of preparation or transformation. And that's what makes Trifacta a great fit for us as a partner, coming in at that next step. >> Yeah, on mobile, MarketShare is one of our common customers; we have DNSS, also a common customer; eBay, a common customer. So we've got already multiple customers, and so some information about the MarketShare use case: they have to deal with their customers' information. So the first thing they receive is data, digital information about ads, and so it's really marketing type of data. They have to assess the quality of the data.
They have to understand what the values are and combine them with their existing data to provide analytics back to their customers. And in that use case, we were talking to the business users, the people selling MarketShare to their customers, because the faster they can onboard their data and qualify the quality of the data, the easier it is to deliver the right level of quality analytics. And also to engage more customers. So it really was about fast onboarding of customer data and delivering analytics. And where Alation comes in is that they can then analyze all the SQL statements that the customers, maybe I'll let you talk about the use case, but it was also the same users looking at the same information, so we engage with the business users. >> I wonder if we can talk about the different roles. You hear about the data scientists obviously, the data engineer, there might be a data quality professional involved, there's certainly the application developer. These guys may or may not even be in IT. And then you've got a DBA. Then you may have somebody who's a statistician. They might sit in the line of business. Am I overcomplicating it? Do larger organizations have these different roles? And how do you help bring them together? >> I'd say that those roles are still in flux in the industry. Sometimes they sit on the IT side, sometimes they sit in the business. I think there's a lot of movement happening; there's not a consistent definition of those different roles. So I think it comes down to different functions. Sometimes you find those functions happening in different places in the company. So stewardship and governance may happen on the IT side, it might happen on the business side, and it's almost a maturity scale of how involved the two sides are within that. So we play with all of those different groups, so it's sometimes hard to narrow down exactly who it is.
But generally it's on the consumption side, whether it's the analyst or data scientists, and there's definitely a crossover between the two groups, moving up towards the governance and stewardship that wants to enable those users, documenting and curating the data for them, all the way to the IT data engineers that operationalize a lot of the work that the data scientists and analysts might be hypothesizing and working with in their research. >> And you sell to all of those roles? Who's your primary user constituency, or advocate? >> We sell both to the analytics groups as well as governance, and they often merge together. But we tend to talk to all of those constituencies throughout a sales cycle. >> And how prominent in your customer base do you see the role of the Chief Data Officer? Is it only confined within regulated industries? Does it seep into non-regulated industries? >> I'd say for us, it seeps into non-regulated industries. >> What percent of the customers, for instance, have, just anecdotally, not even customers, just people that you talk to, have a Chief Data Officer? A formal Chief Data Officer? >> I'd say probably about 60 to 70 percent. >> That high? >> Yeah, same for us. In regulated industries (mumbles). I think they play a role. The real advantage of a Chief Data and Analytics Officer, it's data and analytics, and they have to look at governance. Governance could be for regulation, because you have to, you've got governance policy, which data can be combined with which data, there is a lot. And you need to add that. But then, even if you are less regulated, you need to know what data is available and what data is (mumbles). So you have this requirement as well. We see them a lot. They are more and more powerful, I would say, in the enterprise, where they are able to collaborate with the business, to enable the business. >> Thanks so much for coming on the Cube, I really appreciate it. Congratulations on your partnership.
Final word I'll give you guys before we end the segment. Share a story. Obviously you guys have a unique partnership; you've been in the business for a while, breaking into the business with Alation. Hot startups. What observations out there should people know about that might not be known in this data world? Obviously there are a lot of false premises out there on what the industry may or may not be, but there's certainly a sea change happening. You see AI, it gives a mental model for people, machine learning, autonomous vehicles, smart cities, some amazing, kind of magical things going on. But for the basic business out there, they're struggling. And there are a lot of opportunities if they get it right. What thing, observation, data, pattern are you seeing that people should know about that may not be known? It could be something anecdotal or something specific. >> You go first. (laughs) >> So maybe this will be surprising, but Kaiser is a big customer of ours. And you know Kaiser in California, in the US. They have hundreds or thousands of hospitals. And surprisingly, some of the supply chain people there have been working for years trying to analyze and optimize the relationship with their suppliers. Typically they would buy a staple gun without staples. Stupid. But they see that happening over and over with many products. They were never able to solve this, because why? There would be one product, they'd have to go to IT, they'd have to work, it would take two months, and then there's another supplier, new products. So how to know-- >> John: They're chasing their tail! >> Yeah. It's not super exciting, but they are now able to do that in a couple of hours. So for them, they are able, by going to the data lake, to see what data there is, see how this hospital is buying, which they were not able to do before. So there is nothing magical here, it's just giving access to the data to the people who know the data best, the analysts.
So your point is don't underestimate the innovation; as small as it may seem, or inconsequential, it could have huge impacts. >> The innovation goes with the process, to be more efficient with the data, not so much building new products, just basically being good at what you do, so then you can focus on the value you bring to the company. >> GianThomas, what's your thoughts? >> So it's sort of related. I would actually say something we've seen pretty often is that companies of all sizes are all struggling with very similar problems in the data space specifically. So it's not that big companies have it all figured out and small companies are behind trying to catch up, and small companies aren't necessarily super agile and able to change at the drop of a hat. So it's a journey. It's a journey, and it's understanding what your problems are with the data in the company, and it's about figuring out what works best for your solution, or for your problems. And understanding how that impacts everyone in the business. So it's really a learning process to understand what's going-- >> What do your friends who aren't in the tech business say to you? Hey, what's this data thing? How do you explain it? The fundamental shift, how do you explain it? What do you say to them? >> I'm more and more getting people that already have an idea of what this data thing is. Which five years ago was not the case. Five years ago, it was oh, what's data? Tell me more about that? Why do you need to know about what's in these databases? Now, they actually get why that's important. So it's becoming a concept that everyone understands. Now it's just a matter of moving it into practice and how that actually works. >> Operationalizing it, all the things you're talking about. Guys, thanks so much for bringing the insights. We wrangled it here on the Cube. Live. Congratulations to Trifacta and Alation. Great startups, you guys are doing great.
Good to see you guys successful again, and a rising tide floats all boats in this open source world we're living in. We're bringing you more coverage here at DataWorks 2017, I'm John Furrier with Dave Vellante. Stay with us, more great content coming after this short break. (upbeat music)
Carlo Vaiti | DataWorks Summit Europe 2017
>> Announcer: You are CUBE Alumni. Live from Munich, Germany, it's theCUBE. Covering, DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello, everyone, welcome back to live coverage at DataWorks 2017, I'm John Furrier with my cohost, Dave Vellante. Two days of coverage here in Munich, Germany, covering Hortonworks and Yahoo, presenting Hadoop Summit, now called DataWorks 2017. Our next guest is Carlo Vaiti, who's the HPE chief technology strategist, EMEA Digital Solutions, Europe, Middle East, and Africa. Welcome to theCUBE. >> Thank you, John. >> So we were just chatting before we came on, of your historic background at IBM, Oracle, and now HPE, and now back into the saddle there. >> Don't forget Sun Microsystems. >> Sun Microsystems, sorry, Sun, yeah. I mean, great, great run. >> It was a long run. >> You've seen the computer revolution happen. I worked at HP for nine years, from '88 to '97. Again, Dave was a premier analyst during that run of client-server. We've seen the computer revolution happen. Now we're seeing the digital revolution where the iPhone is now 10 years old, Cloud is booming, data's at the center of the value proposition, so a completely new disruptive capability. >> Carlo: Sure, yes. >> So what are you doing as the CTO, chief technologist for HPE, how are you guys bringing this story together? 'Cause there's so much going on at HPE. You got the services spit, you got the software split, and HP's focusing on the new style of IT, as Meg Whitman calls it. >> So, yeah. My role in EMEA is actually all about having basically a visionary kind of strategy role for what's going to be HP in the future, in terms of IT. And one of the things that we are looking at is, is specifically to have, we split our strategy in three different aspects, so three transformation areas. The first one which we usually talk is what I call hybrid IT, right, which is basically making services around either On-Premise or on Cloud for our customer base. 
The second one is actually powering the Intelligent Edge, so it's actually looking after our collaboration, and where we acquired the Aruba components. And the third one, which is in the middle, and that's why I'm here at the DataWorks Summit, is actually the data analytics aspect. And we have a couple of solutions in there. One is Enterprise-grade Hadoop, which is part of this. This is actually how we generalize the figures and the strategy for HP. >> It's interesting, Dave and I were talking yesterday. Being in Europe, it's obviously a different show, it's smaller than the DataWorks or Hadoop Summit in North America in San Jose, but there's a ton of Internet of Things, IoT or IIoT, 'cause here in Germany, obviously, there are a lot of industrial companies, but in Europe in general, a lot of smart cities initiatives, a lot of mobility, a ton of Internet of Things opportunity, more than in the US. >> Absolutely. >> Can you comment on how you guys are tackling IoT? Because it's an Intelligent Edge, certainly, but it's also data, it's in your wheelhouse. >> Yes, sure. So I'm actually working, it's a good question, because I'm actually working on a couple of projects in Eastern Europe, where it's all about Industrial IoT Analytics, IIoTA. That's the new terminology we use. So what we do is actually, we analyze from a business perspective what the business pain points are, in an oil and gas company for example. And we understand, for example, what kind of things they need and must have. And what I'm saying here is, one of the aspects, for example, is the drilling opportunity. So how much oil you can extract from a specific rig in the middle of the North Sea, for example. This is one of the key questions, because the customer wants to understand, in the future, how much oil they can extract. The other one is, for example, the upstream business.
So on the retail side, say, when my customer is stopping at a gas station, I want to go in the shop, immediately giving, I dunno, my daughter a kind of campaign for the Barbie, because she likes the Barbie. So IoT, Industrial IoT, helps us make a much better customer experience, and that's the case of the downstream business, but it's also helping us get to much faster business outcomes. And that's what the customer wants, right? 'Cause, and I was talking with your colleague before, I'm talking to the business guy. I'm not talking to the IT anymore in these kinds of places, and that's how IoT allows us to change the conversation at the industry level. >> These are first-time conversations too. You're getting at the kinds of business conversations that weren't possible five years ago. >> Carlo: Yes, sure. >> I mean and 10 years ago, they would have seemed fantasy. Now they're reality. >> The role of analytics in my opinion is becoming extremely key, and as I said this morning, for me the best sentence is that data is the stone foundation of the digital economy. I continue to repeat this terminology, because it's actually where everything is starting from. So what I mean is, let's take a look at the analytics aspect. So if I'm able to analyze the data close to the shop floor, okay, close to the shop manufacturing floor, if I'm able to analyze my data on the rig, in the oil and gas industry, if I'm able to do preprocessing analytics, with Kafka, Druid, these kinds of open-source software, close to the Intelligent Edge, then my customers are going to be happy, because I give them very fast response, and the decision-maker can get to a decision in a faster time. Today, it takes a long time to make these types of decisions. So that's why we want to move into the Power the Intelligent Edge. >> So you're saying, data's foundational, but if you get to the Intelligent Edge, it's dynamic. 
So you have dynamic, reactive, realtime time-series data, but you need the foundational data underneath. >> Perfect. >> Is that kind of what you're getting at? >> Yes, that's the first step. Preprocessing analytics is what we do. The next generation, we think, is going to be Industrial IoT Analytics, where we're going to actually put massive amounts of compute close to the shop manufacturing floor. We call it, internally and actually externally, converged plant infrastructure. And that's the key point, right? >> John: Converged plant? >> Converged plant infrastructure, CPI. If you look it up on Google, you will find it. It's a solution we brought to market a few months ago. We announced it in December last year. >> Yeah, Antonio's smart. He also had converged systems as well. One of the first ones. >> Yeah, so that's converged compute at the edge basically. >> Correct, converged compute-- >> Very powerful. >> Very powerful, and we run analytics on the edge. That's the key point. >> Which we love, because that means you don't have to send everything back to the Cloud because it's too expensive, it's going to take too long, it's not going to work. >> Carlo: The bandwidth on the network is much less. >> There's no way that's going to be successful, unless you go to the edge and-- >> It takes time. >> With a cost. >> Now the other thing is, of course, you've got the Aruba asset, to be able to, I always say, joke, connect the windmill. But, Carlo, can we go back to the IoTA example? >> Carlo: Correct, yeah. >> I want to help our audience understand, sort of, the new HP, post these spin merges. So previously you would say, okay, we have Vertica. You still have a partnership, or you still own Vertica, but after September 1st-- >> Absolutely, absolutely. It's part of the columnar side-- >> Right, yes, absolutely, but, so. But the new strategy is to be more of a platform for a variety of technology. 
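[Editor's aside: the edge-preprocessing point above, aggregating data close to where it's produced so less raw traffic goes back over the network, can be sketched with a toy example. This is not HPE's or anyone's actual implementation, just a minimal illustration of windowed summarization; the sensor values and window size are hypothetical.]

```python
# Toy sketch of "preprocessing analytics at the edge": instead of shipping
# every raw sensor reading upstream, an edge node collapses each fixed-size
# window of readings into one summary record (min, max, mean).

def summarize_windows(readings, window_size):
    """Collapse raw readings into one (min, max, mean) record per window."""
    summaries = []
    for start in range(0, len(readings), window_size):
        window = readings[start:start + window_size]
        summaries.append({
            "min": min(window),
            "max": max(window),
            "mean": sum(window) / len(window),
        })
    return summaries

# 1,000 simulated pressure readings from a rig sensor (hypothetical numbers)...
raw = [100.0 + (i % 7) for i in range(1000)]
# ...become 10 summary records: a 100x reduction in what crosses the network.
summaries = summarize_windows(raw, window_size=100)
print(len(raw), "->", len(summaries), "records")
```

In a real deployment this role is played by streaming systems such as Kafka and Druid, as Carlo mentions, but the bandwidth argument is the same: send summaries, keep the raw data (or discard it) at the edge.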
So how for instance would you solve, or did you solve, that problem that you described? What did you actually deliver? >> So again, as I said, especially in the Industrial IoT, we are an ecosystem, okay? So we're one element of the ecosystem solution. For oil and gas specifically, we're working with other system integrators. We're working with oil and gas industry expertise, like DXC, right, the company that we just split off a few days ago, and we're working with them. They're providing the industry expertise. We are the infrastructure provider around that, and the services around that for the infrastructure element. But for the industry expertise, we try to have a little bit of knowledge, to start the conversation with the customer. But again, my role in the strategy is actually to be an ecosystem digital integrator. That's the new terminology we like to bring to the market, because we really believe that's the way the HP role is going to be. And the relevance of HP totally depends on whether we are going to be successful in these types of things. >> Okay, now a couple other things you talked about in your keynote. I'm just going to list them, and then we can go wherever we want. There was Data Lake 3.0, Storage Disaggregation, which is kind of interesting, 'cause it's been a problem. Hadoop as a service, Realtime Everywhere, and then Analytics at the Edge, which we kind of just talked about. Let's pick one. Let's start with Data Lake 3.0. What is that? John doesn't like the term data lake. He likes data ocean. >> I like data ocean. >> Is Data Lake 3.0 becoming an ocean? >> It's becoming an ocean. So, Data Lake 3.0 for us is actually following what is going to be the future with HDFS 3.0. So we have three elements. The erasure coding feature, which is coming in HDFS. The second element is around having an HDFS data tier, a multi-data tier. So we're going to have faster SSD drives. We're going to have big memory nodes. We're going to have GPU nodes. 
And the reason why I say disaggregation is because some of the workloads will be only compute, and some of the workloads will be only storage, okay? So we're going to bring, and the customers require this, because they're getting more data, and they need to have, for example, YARN applications running on compute nodes, and at the same time, they want to have storage compute block, sorry, storage components, running on the storage nodes, like HBase for example, like HDFS 3.0 with the multi-tier option. So that's why the data disaggregation, or disaggregation between compute and storage, is the key point. We call this asymmetric, right? Hadoop is becoming asymmetric. That's what it means. >> And the problem you're solving there, is when I add a node to a cluster, I don't have to add compute and storage together, I can disaggregate and choose whatever I need, >> Every one that we did. >> based on the workload. >> They are all multitenancy kinds of workloads, and they are independent and they scale out. Of course, it's much more complex, but we have actually proved that this is the way to go, because that's what the customer is demanding. >> So, 3.0 is actually functional. It's erasure coding, you said. There's a data tier. You've got different memory levels. >> And I forgot to mention the containerization of the applications. Having Dockerized applications, for example. Using Mesosphere, for example, right? So having the containerization of the applications is what all of that means, because what we do in Hadoop, we actually build the different clusters, and they need to talk to each other, and exchange data in a faster way. And a solution like, a product like SQL Manager, from Hortonworks, is actually helping us to get this connection between the clusters faster and faster. And that's what the customer wants. >> And then Hadoop as a service, is that an on-premise solution, is that a hybrid solution, is it a Cloud solution, all three? >> I can offer all of them. 
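[Editor's aside on the erasure-coding point above: the economics are easy to see with a back-of-the-envelope calculation. HDFS has historically stored three full copies of each block; HDFS 3.0's erasure coding (the widely cited Reed-Solomon 6,3 policy: 6 data blocks plus 3 parity blocks) cuts raw storage to 1.5x while tolerating the loss of any 3 blocks in a stripe. The sketch below just does the arithmetic; it is an illustration, not HDFS code.]

```python
# Back-of-the-envelope: raw storage needed for 1 PB of logical data under
# 3x replication vs. Reed-Solomon(6,3) erasure coding as in HDFS 3.0.

def raw_storage_pb(logical_pb, data_units, parity_units):
    """Raw bytes written per logical byte = (data + parity) / data."""
    return logical_pb * (data_units + parity_units) / data_units

# 3x replication is 1 data unit plus 2 extra full copies.
replication = raw_storage_pb(1.0, 1, 2)
# RS(6,3): every 6 data blocks get 3 parity blocks.
erasure = raw_storage_pb(1.0, 6, 3)

print(f"3x replication: {replication:.1f} PB raw")
print(f"RS(6,3) coding: {erasure:.1f} PB raw")
```

That halving of raw capacity is a big part of why erasure-coded "cold" tiers pair naturally with the SSD/memory "hot" tiers Carlo describes.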
Hadoop as a service could be run on-premise, could be run on a public Cloud, could be run on Azure, or could be a mix of them, partially on-premise and partially on public. >> And what are you seeing with regard to customer adoption of Cloud, and specifically around Hadoop and big data? >> I think the way I see the adoption is, all the customers want to start very small. The maturity is actually better from a technology standpoint. If you were asking me the same question maybe a year ago, I would say, it's difficult. Now I think they've got the point. Every large customer wants to build this big data ocean, not the lake, the ocean, whatever you want to call it. >> John: Love that. (laughs) >> All right. They want to build this data ocean, and the point I want to make is, they want to start small, but they want to think very high. Very big, right, from their perspective. And the way they approach us is, we have a kind of methodology. We establish the maturity assessment. We do a kind of capability maturity assessment, where we find out if the customer is actually a pioneer, or is actually a very traditional one, so it's very slow-going. Once we determine what stage the customer is at, we propose some specific proof of concept. And in three months usually, we're putting this in place. >> You also talked about realtime everywhere. In our research, we talk about how, historically, you had batch and interactive, and now you have what we call continuous, or realtime streaming workloads. How prevalent is that? Where do you see it going in the future? >> So I think it's another trend for the future, as I mentioned this morning in my presentation. And Spark, the open-source in-memory engine, is actually at the core of this stuff. We see 60 to 70 times faster analytics compared to not using Spark. So many customers implemented Spark because of this. 
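[Editor's aside: the 60-70x figure is the interviewee's claim, but the mechanism behind Spark-style speedups, keeping a working dataset in memory across iterative passes instead of re-scanning storage on every pass, can be shown with a toy counter. This is plain Python, not Spark; the dataset and pass count are made up for illustration.]

```python
# Toy illustration of the in-memory idea behind Spark-style analytics:
# iterative jobs reuse a cached dataset rather than re-reading storage
# on every pass. We count simulated "disk scans" to show the difference.

disk_reads = 0

def read_from_disk():
    """Stand-in for scanning a dataset off HDFS; counts each scan."""
    global disk_reads
    disk_reads += 1
    return list(range(1_000))

def iterate_uncached(passes):
    total = 0
    for _ in range(passes):
        total += sum(read_from_disk())   # re-scan storage every pass
    return total

def iterate_cached(passes):
    data = read_from_disk()              # scan once, keep in memory
    return sum(sum(data) for _ in range(passes))

uncached_total = iterate_uncached(10)
reads_uncached = disk_reads
disk_reads = 0
cached_total = iterate_cached(10)
reads_cached = disk_reads

print(f"uncached: {reads_uncached} scans; cached: {reads_cached} scan")
```

Same answer either way, but one storage scan instead of ten; with real disks and real datasets, that gap is where the order-of-magnitude speedups come from.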
The requirement is that the customer needs an immediate response time, okay, for a specific decision they have to make, in order to improve their business, in order to improve their life. But this requires a different architecture. >> I have a question, 'cause you've lived in the United States, you're obviously global, and you've spent a lot of time in Europe as well, and a lot of times people want to discuss the differences between, let's make it specific here, the European continent and North America, and from a sophistication standpoint, same, we can agree on that, but there are still differences. Maybe greater privacy concerns. The whole thing with the Cloud and the NSA in the United States created some concerns. What do you see as the differences today between North America and Europe? >> From my perspective, take for example IoT, Industrial IoT. I think in Europe we are much more advanced. I think in the manufacturing and the automotive space, the connected car kinds of things, autonomous driving, this is something that we know already how to manage, how to do. I mean, Tesla in the US is a good example that what I'm saying is not true, but if I look at, for example, the large German car manufacturers, they have already implemented these types of things today. >> Dave: For years, yeah. >> That's the difference, right? I think the second step is about the faster analytics approach. So what I mentioned before. The Power the Intelligent Edge, in my opinion, at the moment is much more advanced in the US compared to Europe. But I think Europe is starting to come back, and going down the same route. Because we believe that putting compute capacity on the edge is what the customer actually wants. But those are the two big differences I see. >> The other two big external factors that we like to look at are Brexit and Trump. So (laughs) how 'about Brexit? 
Now that it's starting to actually become, begin the process, how should we think about it? Is it overblown? Is it critical? What's your take? >> Well, I think it's too early to say. The UK just started the split a few days ago, right, officially. It's going to take another 18 months before it's going to be completed. From a commercial standpoint, we don't see any difference so far. We're actually working the same way. For me it's too early to say if there's going to be any implication from that. >> And we don't know about Trump. We don't have to talk about it, but I saw some data recently that European sentiment, business sentiment, is trending stronger than the US, which is different than it's been for the last many years. What do you see in terms of just sentiment, business conditions in Europe? Do you see a pickup? >> It's getting better, it is getting better. I mean, if I look at the major countries, the GDP is going positive, 1.5%. So I think from that perspective, we are getting better. Of course we are still suffering from the Chinese and Japanese markets sometimes. Especially in some of the big large deals. The influence of the Japanese market, I feel it, and the Chinese market, I feel that. But I think the economy is going to be okay, so it's going to be good. >> Carlo, I want to thank you for coming on and sharing your insight, final question for you. You're new to HPE, okay. We have a lot of history, obviously I spent a long part of my career there, early on. Dave and I have covered the transformation of HP for many, many years, with theCUBE certainly. What attracted you to HP and what would you say is going on at HP from your standpoint, that people should know about? >> So I think the number one thing is that for us the word is going to be hybrid. It means that some of the services that you can implement, either on-premise or on Cloud, could be done very well by the new Pointnext organization. I'm not part of Pointnext. 
I'm in EG, the Enterprise Group division. But I am a fan of Pointnext because I believe this is the future of our company, it's on the services side, that's where it's going. >> I would just point out, Dave and I, our commentary on the spin merge has been, create these highly cohesive entities, very focused. Antonio now running EG, big fans, of where it's actually an efficient business model. >> Carlo: Absolutely. >> And Chris Hsu is running the Micro Focus, CUBE Alumni. >> Carlo: It's a very efficient model, yes. >> Well, congratulations and thanks for coming on and sharing your insights here in Europe. And certainly it is an IoT world, IIoT. I love the analytics story, foundational services. It's going to be great, open source powering it, and this is theCUBE, opening up our content, and sharing that with you. I'm John Furrier, Dave Vellante. Stay with us for more great coverage, here from Munich after the short break.
John Kreisa, Hortonworks – DataWorks Summit Europe 2017 #DWS17 #theCUBE
>> Announcer: Live from Munich, Germany, it's theCUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. (electronic music) (crowd) >> Okay, welcome back everyone, we are here live in Munich, Germany, for DataWorks 2017, formerly Hadoop Summit, the European version. Again, a different kind of show than the main show in North America, in San Jose, but it's a great show, a lot of great topics. I'm John Furrier, my co-host, Dave Vellante. Our next guest is John Kreisa, Vice President of International Marketing. Great to see you emceeing the event. Great job, great event! >> John Kreisa: Great. >> Classic European event, it's got the European vibe. >> Yep. >> In Germany everything's tightly buttoned down, very professional. (laughing) But big IoT message-- >> Yes. >> Because in Germany a lot of industrial action-- >> That's right. >> And then Europe, in general, a lot of smart cities, a lot of mobility, and issues. >> Umm-hmm. >> So a lot of IoT, a lot of meat on the bone here. >> Yep. >> So congratulations! >> John Kreisa: Thank you. >> How's your thoughts? Are you happy with the event? Give us by the numbers, how many people, what's the focus? >> Sure, yeah, no, thanks, John, Dave. Long-time CUBE attendee, I'm really excited to be here. Always great to have you guys here-- >> Thanks. >> Thanks. >> And be participating. This is a great event this year. We did change the name as you mentioned from Hadoop Summit to DataWorks Summit. Perhaps I'll just riff on that a little bit. I think that really was in response to the change in the community, the breadth of technologies. You mentioned IoT, machine learning, and AI, which we had some of in the keynotes. So just a real expansion from data loading, data streaming, analytics, and machine learning and artificial intelligence, which all sit on top and use the core Hadoop platform. We felt like it was time to expand the conference itself. 
Open up the aperture to really bring in the other technologies that were involved, and really represent what was already starting to kind of feed into Hadoop Summit, so it's kind of a natural change, a natural evolution. >> And there was 2-year visibility. We talked about this two years ago. >> John Kreisa: Yeah, yeah. >> That you were starting to see this aperture open up a little bit. >> Yeah. >> But it's interesting. I want to get your thoughts on this because Dave and I were talking yesterday. It's like we've been to every single Hadoop Summit. Even theCUBE's been following it all as you know. It's interesting, the big data space was created by the Hadoop ecosystem. >> Umm-hmm. >> So, yeah, you rode in on the Hadoop horse. >> Yeah. >> I get that. A lot of people don't get them. They say, Oh, Hadoop's dead, but it's not. >> No. >> It's evolving to a much broader scope. >> That's right. >> And you guys saw that two years ago. Comment on your reaction to Hadoop is not dead. >> Yeah, wow (laughing). It's far from dead if you look at the momentum, largest conference ever here in Europe. I think strong interest from them. I think we had a very good customer panel, which talked about the usage, right. How they were really transforming. You had Walgreens Boots talking about how they're redoing their shelving, and how they're redesigning their stores. Danske Bank talking about how they're analyzing how they replenish their cash machines. Centrica talking about how they redo their... Or how they're driving down the cost of energy by being smarter around energy consumption. So, these are real transformative use cases, and so, it's far from dead. Really what might be confusing people is probably the fact that there are so many other technologies and markets that are being enabled by these open source technologies and the breadth of the platform. And I think that's maybe why people see it kind of move a little bit back as a platform play. 
And so, we talk more about streaming and analytics and machine learning, but all that's enabled by Hadoop. It's all riding on top of this platform. And I think people kind of just misconstrue the fact that there's one enabling-- >> It's a fundamental element, obviously. >> John Kreisa: Yeah. >> But what's the new expansion? IoT, as I mentioned, is big here. >> Umm-hmm. >> But there's a lot more connective tissue going on, as Shaun Connolly calls it. >> Yeah, yep. >> What are those other things? >> Yeah, so I think, as you said, smart cities, smart devices, the analytics, getting the value out of the technologies. The ability to load it and capture it in new ways with new open source technology, NiFi and some of those other things, Kafka we've heard of. And some of those technologies are enabling the broader use cases, so I don't think it's... I think that's really the fundamental change in shift that we see. It's why we renamed it to DataWorks Summit because it's all about the data, right. That's the thing-- >> But I think... Well, if you think about it from a customer perspective, to me anyway, what's happened is we went through the adolescent phase of getting this stuff to work and-- >> Yeah. >> And figuring out, Okay, what's the relationship with my enterprise data warehouse, and then they realize, Wow, the enterprise data warehouse is critical to my big data platform. >> Umm-hmm. >> So what customers have done as they've evolved, as Hadoop has evolved, their big data platforms internally-- >> Umm-hmm. And now they're turning to their business saying, Okay, we have this platform. Let's now really start to go up the steep part of the S-curve and get more value out of it. >> John Kreisa: Umm-hmm. >> Do you agree with that scenario? >> I would definitely agree with that. I think that as companies have, particularly here in Europe, it's interesting because they kind of waited for the technology to mature and it's reached that inflection point. 
To your point, Dave, such that they're really saying, Alright, let's really get this into production. Let's really drive value out of the data that they see and know they have. And there's sort of... We see a sense of urgency here in Europe, to get going and really start to get that value out. Yeah, and we call it a ratchet game. (laughing) The ratchet is, Okay, you get the technology to work. Okay, you still got to keep the lights on. Okay, and oh, by the way, we need some data governance. Let's ratchet up that side. Oh, we need a CDO! >> Umm-hmm. >> And so, because if you just try to ratchet up one side of the house (laughing) (cross-talk)-- >> Well, Carlo from HPE said it great on our last segment. >> Yeah. >> And I thought this was fundamental. And this was kind of like a CUBE moment where it's like, Wow, that's a really amazing insight. And he said something profound, The data is now foundational to all conversations. >> Right. >> And that's from a business standpoint. It hasn't always been the case. Now, it's like, Okay, you can look at data as a fundamental building block. >> Right. >> And then react from there. So if you get the data locked in, to Dave's point about compliance, you can then do clever things. You can have a conversation about a dynamic edge or-- >> Right. >> Something else. So the foundational data is really now fundamental, and I think that is... Changes, it's not a database issue. It's just all data. >> Right, now all data-- >> All databases. >> You're right, it's all data. It's driving the business in all different functions. It's operational efficiency. It's new applications. It's customer intimacy. All of those different ways that all these companies are going, We've got this data. We now have the systems, and we can go ahead and move forward with it. 
And I think that's the momentum that we're seeing here in Europe, as evidenced by the conference and those kinds of things, which I think really shows how maybe... We used to say... I'd say when I first moved over here, that Europe was maybe a year and a half behind the U.S., in terms of adoption. I'd say that's shrunk to where a lot of the conversations are the exact same conversations that we're having with big European companies, that we're having with U.S. companies. >> And, even in... >> Yeah. >> Like we were just talking to Carlo, he was like, Well, and Europe is ahead in things like certain IoT-- >> Yeah. >> And Industrial IoT. >> Yeah. >> Yeah. >> Even IoT analytics. Some of the... Tesla notwithstanding, some of the automated vehicles. >> John Kreisa: Correct. >> Autonomous vehicles activity that's going on. >> John Kreisa: That's right. >> Certainly with Daimler and others. So there's an advancement. It almost reminds me of the early days of mobile, so... (laughing) 
>> Because if you have the sensors, you need a place to store and analyze that data whether it's smart cars or smart cities, or energy, smart energy, all those different places. That's really where we are. >> What's different in the international theater that you're involved in because you've been on both sides. >> Yep. >> As you came from the U.S. then when we first met. What's different out here now? And I see the gaps closing? What other things that notable that you could share? >> Yeah, yeah, so I'd say, we still see customers in the U.S. that are still very much wanting to use the shiniest, new thing, like the very latest version of Spark or the very latest version of NyFy or some other technologies. They want to push and use that latest version. In Europe, now the conversations are slightly different, in terms of understanding the security and governance. I think there's a lot more consciousness, if you will, around data here. There's other rules and regulations that are coming into place. And I think they're a little bit more advanced in how they think of-- >> Yeah. >> Data, personal data, how to be treated, and so, consequently, those are where the conversations are about the platform. How do we secure it? How does it get governed? So that you need regulations-- >> John Furrier: It's not as fast, as loose as the U.S. >> Yeah, it's not as fast. And you look and see some of the regulations. (laughing) My wife asked me if we should set up a VPIN on our home WiFi because of this new rule about being able to sell the personal data. I've said, Well, we're not in the U.S., but perhaps, when we move to the U.S. >> In order to get the right to block chain (laughing). (cross-talk) >> Yeah, absolutely (cross-talk). >> John Furrier: Encrypt everything. >> (laughing) Yeah, exactly. >> Well, another topic is... Let's talk about the ecosystem a little bit. >> Umm-hmm. 
>> You've got now some additional public brethren, obviously Cloudera's, there's been a lot of talk here about-- >> Umm-hmm. Tow-len and Al-trex-is have gone public. >> Yeah. >> The ecosystem you've evolved that. IBM was up on stage with you guys. >> Yeah, yep. >> So that continues to be-- >> Gallium C. >> Can we talk about that a little bit? >> Gallium C >> Gallium C. >> We had a great... Partners are great. We've always been about the ecosystem. We were talking about before we came on-screen that for us it's not Marney Partnership. They're very much of substance, engineering to try to drive value for the customers. It's where we see that value in that joint value. So IBM is working with us across all of the DataWorks Summit, but, even in all of the engineering work that we're doing, participated in HDP 2.6 announcement that we just did. And I'm sure what you covered with Shawn and others, but those partnerships really help drive value for the customer. >> Umm-hmm. For us, it's all making sure the customer is successful. And to make a complete solution, it is a range of products, right. It is whether it's data warehousing, servers, networks, all of the different analytics, right. There's not one product that is the complete solution. It does take a stack, a multitude of technologies, to make somebody successful. >> Cloudera's S-1, was file, what's been part of the conversation, and we've been digging into, it's great to see the numbers. >> Umm-hmm. >> Anything surprise you in the S-1? And advice you'd give to open source companies looking to go public because, as Dave pointed out, there's a string now of comrades in arms, if you will, Mool-saw, that's doing very well. >> Yeah, yeah. >> And Al-trex-is just went public. >> Yeah. >> You guys have been public for a long time. You guys been operating the public open-- >> Yeah. >> Both open source, pure open source. But also on the public markets. You guys have experience. You got some scar tissue. 
>> John Kreisa: (laughing) Yeah, yeah. >> What's your advice to Cloudera or others that are... Because the risk certainly will be a rush for more public companies. >> Yeah. >> It's a fantastic trend. >> I think it is a fantastic trend. I completely agree. And I think that it shows the strength of the market. It shows both the big data market, in general, the analytics market, kind of all the different components that are represented in some of those IPOs or planned IPOs. I think that for us, we're always driving for success of the customer, and I think any of the open source companies, they have to look at their business plan and take it step-wise in approach, that keeps an eye on making the customer successful because that's ultimately what's going to drive the company success and drive revenue for it and continue to do it. But we welcome as many companies as possible to come into the public market because A: it just allows everybody to operate in an open and honest way, in terms of comparison and understanding how growth is. But B: it's shows that strength of how open source and related technologies can help-- >> Yeah. >> Drive things forward. >> And it's good for the customer, too, because now they can compare-- >> Yes! >> Apples to Apples-- >> Exactly. >> Visa V, Cloudera, and what's interesting is that they had such a head start on you guys, HORTONWORKS, but the numbers are almost identical. >> Umm-hmm, yeah. >> Really close. >> Yeah, I think it's indicative of the opportunity that they're now coming out and there's rumors of other companies coming out. And I think it's just gives that visibility. We welcome it, absolutely-- >> Yeah. >> To show because we're very proud of our performance and now are growth. And I think that's something that we stand behind and stand on top of. And we want to see others come out and show what they got. >> Let's talk about events, if we can? >> Yeah. >> We were there at the first Hadoop Summit in San Jose. 
Thrilled to be-- >> John Kreisa: In a few years. >> In Dublin last year. >> Yeah. >> So what's the event strategy? I love going into the local flavor. >> Umm-hmm. >> Last year we had the Irish singers. This year we had a great (laughing) local band. >> John Kreisa: (laughing) Yeah, yeah, yeah. >> So I don't know if you've announced where next year's going to be? Maybe you can share with us some of the roll-out strategies? >> Yeah, so first of all, DataWorks Summit is a great event as you guys know, and you guys are long participants, so it's a great partnership. We're moving it international, of course, we did a couple... We are already international, but we moved a couple to Asia last year so-- >> Right. >> Those were a tremendous success, we actually exceeded our targets, in terms of how many people we thought would go. >> Dave: Where did you do those? >> We were in Melbourne and in Tokyo. >> Dave: That's right, yeah. >> Yeah, so in both places a great community kind of rushed to the event, and kind of understanding, it really showed that there is truly a global kind of data community around Hadoop and other related technologies. So from here, as you guys know because you're going to be there, we're thinking about San Jose and really wanting to make sure that's a great event. It's already stacking up to be tremendous, call for papers is all done. And all that's announced, so even the sessions, we're really starting to build for that. Then later this year we'll be in Sydney, we're going to take DataWorks into Sydney, Australia, in September. So throughout the rest of this year, there's going to be continued building momentum and just really global participation in this community, which is great. >> Yeah. >> Yeah. >> Yeah, it's fantastic. >> Yeah, Sydney should be great. >> Yeah. >> Looking forward to it. We're going to expand theCUBE down under. Dave and I are excited-- >> Dave: Yeah, let's talk about that. >> We got a lot of interest (laughing). >> Alright.
>> John, great to have you-- >> Come on down. >> On theCUBE again. Great to see you. Congratulations, I'm going to see you up on stage. >> Thank you. >> Doing the emcee. Great show, a lot of great presenters and great customer testimonials. And as always the sessions are packed. And good learning, great community. >> Yeah. >> Congratulations on your ecosystem. This is theCUBE broadcasting live from Munich, Germany for DataWorks 2017, presented by HORTONWORKS and Yahoo. I'm John Furrier with Dave Vellante. Stay with us, great interviews on day two still up. Stay with us. (electronic music)
Raj Verma | DataWorks Summit Europe 2017
>> Narrator: Live from Munich, Germany it's the CUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Okay, welcome back everyone here at day two coverage of the CUBE here in Munich, Germany for DataWorks 2017. I'm John Furrier, my co-host Dave Vellante. Two days of wall to wall coverage, SiliconANGLE Media's the CUBE. Our next guest is Raj Verma, the president and COO of Hortonworks. First time on the CUBE, new to Hortonworks. Welcome to the CUBE. >> Thank you very much, John, appreciate it. >> Looking good with a three piece suit, we were commenting, when you were on stage. >> Raj: Thank you. >> Great scene here in Europe, again a different show vis-a-vis North America, in San Jose. You got the show coming up there, it's the big show. Here, it's a little bit different. A lot of IOT in Germany. You got a lot of car manufacturers, but an industrial nation here, smart city initiatives, a lot of big data. >> Uh-huh. >> What's your thoughts? >> Yeah no, firstly thanks for having me here. It's a pleasure, and good chit chatting right before the show as well. We are very, very excited about the entire data space. Europe is leading many initiatives about how to use data as a sustainable, competitive differentiator. I just moderated a panel and you guys heard me talk to a retail bank, a retailer. And really, Centrica, which was nothing but British Gas, which is rather an organization steeped in history so as to speak, and that institution now calls itself a technology company. And, it's a technology company or an IOT company based on them using data as the currency for innovation. So now, British Gas, or Centrica, calls itself a data company, when would you have ever thought that? I was at dinner with a very large automotive manufacturer, and the kind of stuff they are doing with data, right from driving habits, driver safety, real time insurance premium calculation, the autonomous drive. It's just fascinating no matter what industry you talk about.
It's just very, very interesting. And, we are very glad to be here. International business is a big priority for me. >> We've been following Hortonworks since its inception when it spun out of Yahoo years ago. I think we've been to every Hadoop World going back, except for the first one. We watched the transition. It's interesting, it's always been a learning environment at these shows. And certainly the customer testimonials speak to the ecosystem, but I have to ask you, you're new to Hortonworks. You have an interesting technology background. Why did you join Hortonworks? Because you've certainly seen the movies before and the cycles of innovation, but now we're living in a pretty epic era, machine learning, data, AI is on the horizon. What were the reasons why you joined Hortonworks? >> Yeah sure, I've had a really good run in technology, fortunately was associated with two great companies, Parametric Technology and TIBCO Software. I was 16 years at TIBCO, so I've been dealing with data for 16 years. But, over the course of the last couple of years, whenever I spoke to a C level executive, or a CIO, they were talking to us about the fact that structured data, which is really what we did for 16 years, was not good enough for innovation. Innovation and insights into unstructured data was the seminal challenge of most of the executives that I was talking to, senior level executives. And, when you're talking about unstructured data and making sense of it, there isn't a better technology than the one that we are dealing with right now, undoubtedly. So, that was one. Dealing with data, because data is really the currency of our times. Every company is a data company. Second was, I've been involved with proprietary software for 23 years. And, if there is a business model that's ready for disruption, it's the proprietary software business model, because I'm absolutely convinced that open source is what I call a green business model. It's good for planet Earth so as to speak.
It's community based, it's based on innovation, and it puts the customer and the technology provider on the same page. The customer success drives the vendor success. Yeah, so the open source community, data-- >> It's sustainable, pun intended, in the sense that it's had a continuing run. And, it's interesting, Tier One software is all open source now. >> 100%, and by the way not only that, if you see large companies like IBM and Microsoft, they have finally woken up to the fact that if they need to attract talent and if they want to be known as thought leaders, they have to have some very meaningful open source initiatives. Microsoft loves Linux, when did we ever think that was going to happen, right? And, by the way-- >> I think Steve Ballmer once said it was the cancer of the industry. Now, they're behind it. But, the Linux Foundation has also grown. We saw a project this past week. Intel donated a big project to the Linux Foundation, now it's taking over, so more projects. >> There's more action happening than ever before. >> You know absolutely, John. Five years ago when I would go and meet a CIO and I would ask them about open source, they would wink, they'd say "Of course, we do open source." But, it's less than 5%, right? Now, when I talk to a CIO, they first ask their teams to go evaluate open source as the first choice. And, if they can't, they come kicking and screaming towards proprietary software. Most organizations, and some organizations with a lot of historical gravity so as to speak, have a 50/50 even split between proprietary and open source. And, that's happened in the last three years. And, I can make a bold statement, and I know it'll be true, but in the next three years most organizations' ratio of proprietary to open source would be 20 proprietary, 80 open source. >> So, obviously you've made that bet on open source, joining Hortonworks, but open is a spectrum.
And, on one end of the spectrum you have Hortonworks which is, as I see it, the purest. Now, even Larry Ellison, when he gets onstage at Oracle Open World, will talk about how open Oracle is, I guess that's the other end of the spectrum. So, my question is won't the Microsofts and the Oracles and the IBMs, they're like recovering alcoholics, and they'll accommodate their platforms through open source, embracing open source. We'll see if AWS is the same, we know it's unidirectional there. How do you see that-- >> Well, not necessarily. >> Industry dynamic, we'll talk about that later. How do you see that industry dynamic shaking out? >> No, absolutely, I think I remember way back in, I think, the mid to late 90s, I still love that quote by Scott McNealy, who is a friend. Dell, not Dell, Digital came out with a marketing campaign saying open VMS. And, Scott said, "How can someone lie so much with one word?" (laughs) So, it's the fact that Oracle calling itself open, well I'll just leave it at, it's a good joke. I think the definition of open source, to me, is when you acquire a software you have three real costs. One is the cost of initially procuring that software and the hardware and all the rest of it. The second is implementation and maintenance. However, most people miss the third dimension of cost when acquiring software, which is the cost to exit the technology. Our software and open source has very low exit barriers to our technology. If you don't like our technology, switch it off. You own the software anyways. Switch off our services and the barriers of exit are very, very low. Having worked in proprietary software, as I said, for 23 years, I very often had conversations with my customers where I would say, "Look, you really don't have a choice, because if you want to exit our technology it's going to probably cost you ten times more than what you've spent till date." So, it's a lock in architecture and then you milk that customer through maintenance, correct?
>> Switching costs really are the metric-- >> Raj: Switching costs, exactly. >> You gave the example of Blockbuster Video, and the rental, the late charge fees. Okay, that's an example of lock in. So, as we look at the company you're most compared with, now that it's going public, Cloudera, in a way I see more similarities than differences. I mean, you guys are sort of both birds of a feather. But, you are going for what I call the long game with a volume subscription model. And, Cloudera has chosen to build proprietary components on top. So, you have to make big bets on open. You have to support those open technologies. How do you see that affecting the long term distance model? >> Yeah, I think we are committed to open source. There's absolutely no doubt about it. I do feel that our connected data platform, which is data at rest and data in motion, across on prem and cloud, is the business model that's going to win. We clearly have momentum on our side. You've seen the same filings that I have seen. You're talking about a company that had a three year head start on us, and a billion dollars of funding, all right, at very high valuations. And yet, they're only one year ahead in terms of revenue. And, they have burnt probably three times more cash than we have. So clearly, and it's not my opinion, if you look at the numbers purely, the numbers actually give us the credibility that our business model and what we are doing is more efficient and is working better. One of the arguments that I often hear from analysts and press is how are your margins on open source? According to the filings, again, their margins are 82% on proprietary software, my margins on open source are 84%. So, from a health of the business perspective we are better. Now, the other is they've claimed to have been making a pivot to more machine learning and deep learning and all the rest of it. And, they'd actually like us to believe that their competition is going to be Amazon, IBM, and Google.
Now, with a billion dollars of funding, with the Intel ecosystem behind them, they could effectively compete against Hortonworks. What do you think are their chances of competing against Google, Amazon, and IBM? I just leave that for you guys to decide, to be honest with you. And, we feel very good that they have virtually vacated the space and we've got the momentum. >> On the numbers, what jumps out at you on the filing? Since, obviously, I'm sure everyone at Hortonworks was digging through the S1, because for the first time now Cloudera exposes some of the numbers. I noticed some striking things different, obviously, besides their multiple on revenue valuation. Pretty obvious it's going to be a haircut coming after the public offering. But, on the sales side, which is your wheelhouse, there's a value proposition that you guys at Hortonworks, we've been watching, the cadence of getting new clients, servicing clients. With product evolution is challenging enough, but also expensive. It's not you guys, but it's getting better, as Shaun Connolly pointed out yesterday, you guys are looking at some profitability targets on the EBITDA coming up in Q4. Publicly stated on the earnings call. How's that different from Cloudera? Are they burning more cash because of their sales motions or sales costs, or is it the product mix? What's your thoughts on the filings around Cloudera versus the Hortonworks? >> Well, look, I just feel that, I can talk more about my business than theirs. Clearly, you've seen the same filings that I have and you've seen the same cash burn rates that we have seen. And, we clearly are more efficient, although we can still get better. But, because of being public for a little more than two years now, we've had a thousand watt bulb being shone at us and we have been forced to be more efficient because we were in the limelight. >> John: You're open. >> In the open, right? So, people knew what our figures are, what our efficiency ratios were.
So, we've been working diligently at improving them and we've gotten better, and there's still scope for improvement. However, being private did not put the same scrutiny on Cloudera. And, some would say that they were actually spending money like drunken sailors if you really read their S1 filing. So, they will come under a lot of scrutiny as well. I'm sure they'll get more efficient. But right now, clearly, you've seen the same numbers that I have, their numbers don't talk about efficiency either on the R and D side or the sales and marketing side. So, yeah we feel very good about where we are in that space. >> And, open source is this two edged sword. Like, take Yarn for example, at least from my perspective Hortonworks really led the charge to Yarn, and then, well before Docker and Kubernetes' ascendancy, all of a sudden that happens and of course you've got to embrace those open source trends. So, you have the unique challenge of having to support sort of all the open source platforms. And, so that's why I call it the long game. In order for you guys to thrive you've got to both put resources into those multiple projects and you've got to get the volume of your subscription model, which you pointed out the marginal economics are just as good as most, if not any, software business. So, how do you manage that resource allocation? >> Yes, so I think a lot of that is the fact that we've got plenty of contributors and committers to the open source community. We are seen as the angel child in open source because we are just pure, kosher open source. We just don't have a single line of proprietary code. So, we are committed to that community. We have over the last six or seven years developed models of our software development which help us manage the collective bargaining power, so as to speak, of the community to allocate resources and prioritize the allocation of resources.
It continues to be a challenge given the breadth of the open source community and what we have to handle, but fortunately I'm blessed that we've got a very, very capable engineering organization that keeps us very efficient and on the cutting edge. >> We're here with Raj Verma, the new president and COO of Hortonworks, Chief Operating Officer. I've got to ask you because it's interesting. You're coming in with a fresh set of eyes, coming in as you mentioned, from TIBCO, interesting, which was very successful in the generation of its time, and the history of TIBCO, where it came from and what it did, was pretty fantastic. I mean, everyone knows connecting data together was very hard in the enterprise world. TIBCO has some challenges today, as you're seeing, with being disrupted by open source, but I got to ask you. As a perspective, new executive you got, looking at the battlefield, an opportunity with open source, there's some significant things happening and what are you excited about, because Hortonworks has actually done some interesting things. Some, I would say, the world spun in their direction, their relationship with Microsoft, for instance, and their growth in cloud has been fantastic. I mean, Microsoft stock price when they first started working with Hortonworks I think was like 26, and obviously with Satya Nadella on board, Azure, more open source, from Open Compute to Kubernetes and microservices, Azure doing very, very well. You also have a partnership with Amazon Web Services, so you already are living in this cloud era, okay? And so, you have a cloud dynamic going on. Are you excited by that? You bring some partnership expertise in from TIBCO. How do you look at partners? Because, you guys don't really compete with anybody, but you're partners with everybody. So, you're kind of like Switzerland, but you're also doing a lot of partnerships. What are you excited about vis-a-vis the cloud and some of the other partnerships that are happening.
>> Yeah, absolutely, I think having a robust partner ecosystem is probably my number one priority, maybe number two after being profitable in a short span of time, which is, again, publicly stated. Now, our partnership with Microsoft is very, very special to us. Being available in Azure, we are seeing some fantastic growth rates coming in from Azure. We are also seeing a remarkable amount of traction from the market to be able to go and test out our platform with very, very low barriers of entry and, of course, almost zero barriers of exit. So, from a partnership platform, cloud providers like Amazon, Microsoft, are very, very important to us. We are also getting a lot of interest from carriers in Europe, for example. Some of the biggest carriers want to offer business services around big data, and almost 100%, actually not almost, 100% of the carriers that we have spoken to thus far want to partner with us and offer our platform as a cloud service. So, cloud for us is a big initiative. It gives us the entire capability to reach audiences that we might not be able to reach ringing one doorbell at a time. So, it's, as I said, we've got a very robust, integrated cloud strategy. Our customers find that very, very interesting. And, building that with a very robust partner channel, high priority for us. Second, using our platform as a development platform for applications on big data is, again, a priority. And that's, again, building a partner ecosystem. The third is relationships with global SIs, Accenture, Deloitte, KPMG. The Indian SIs, Infosys, and Wipro, and HCL and the rest. We have some work to do. We've done some good work there, but there's some work to be done there. And, not only that, I think some of the initiatives that we are launching in terms of training as a service, free certification, they are all things which are aimed at reaching out to the partners and building, as I said, a robust partner ecosystem.
>> There's a lot of talk at conferences like this about, especially in Hadoop, about complexity, complexity of the ecosystem, new projects, and the difficulties of understanding that. But, in reality it seems as though today, anyway, the technology's pretty well understood. We talked about Millennials off camera coming out today with social savvy and tooling and understanding gaming and things like that. Technology, getting it to work, seems to not be the challenge anymore. It's really understanding how to apply it, how to value data, we heard in your panel today. The business process, which used to be very well known, it's counting, it's payroll, simple. Now, it's kind of ever changing daily. What do you make of that? How do you think that will affect the future of work? >> Yeah, I think there's some very interesting questions that you've asked in that. The first, of course, is what does it take to have a very successful big data, or Hadoop, project. And, I think we always talk about the fact that if you have a very robust business case backing a Hadoop project, that is the number one key ingredient to delivering a Hadoop project. Otherwise, you can tend to boil the ocean, all right, or try and eat an elephant in one bite, as I like to say. So, that's one, and I think you're right. It's not the technology, it's not the complexity, it's not the availability of the resources. It is a leadership issue in organizations, where the leader demands certain outcomes, business outcomes, from the Hadoop project team, and we've seen whenever that happens the projects seem to be very, very successful. Now, the second part of the question about the future of work, which is a very, very interesting topic and a topic which is very, very close to my heart. There are going to be more people than jobs in the next 20, 25 years. I think that any job that can be automated will be automated, or has been automated, right? So, this is going to have a societal impact on how we live.
I've been lucky enough that I joined this industry 25 years ago and I've never had to change or switch industries. But, I can assure you that our kids, and we were talking about kids off camera as well, our kids will have to probably learn a new skill every five years. So, how does that impact education? We, in our generation, were testing champions. We were educated to score well on tests. But, the new form of education, which you and I were talking about, again in California where we live, and where my daughter goes to high school, and in her school the number one, the number one priority is to instill a sense of learning and joy of learning in students, because that is what is going to contribute to a robust future. >> That's a good point, I want to just interject here because I think that the trend we're seeing in the higher Ed side also points to the impact of data science, to curriculum and learning. It's not just putting catalogs online. There's now kind of an iterative, kind of non-linear discovery to proficiency. But, there's also the emotional quotient aspect. You mentioned the love of learning. The immersion of tech and digital is creating an interdisciplinary requirement. So, as all the folks say, what's the statistic, like half the jobs that are going to be available haven't even been figured out yet. There's a value creation around interdisciplinary skill sets and emotional quotient. >> Absolutely. >> Social, emotional, because of the human social community connectedness. This is also a big data challenge opportunity. >> Oh, 100%, and I think one of the things that we believe is in the future, jobs that require a greater amount of empathy are least susceptible to automation. So, things like caring for older people in the world, and nursing, and teaching, and artists, and all the rest will be professions which will be highly paid and numerous.
I also believe that the entire big data challenge about how you use data to impact communities is going to come into play. And also, I think John, you and I were again talking about it, the entire concept of corporations is only 200 years old, really, 200, 300 years old. Before that, our forefathers were individual contributors who contributed a certain part in a community, barbers, tailors, farmers, what have you. We are going to go back to the future, where all of us will go back to being individual contributors. And, I think, and again I'm bringing it back to open source, open source is the start of that community which will allow the community to go back to its roots of being individual contributors rather than being part of an organization or a corporation to be successful and to contribute. >> Yeah, Coase's Penguin has been a very famous, seminal piece of work. Obviously, Ronald Coase, who wrote The Nature of the Firm, is interesting, but that's been a kind of historical document. You look at blockchain for instance. Blockchain actually has the opportunity to disrupt what The Nature of the Firm is about because of smart contracts, supply chain, and what not. And, we have this debate on the CUBE all the time, there's some naysayers, Tim Connors, a VC, and I were talking on our Friday show, Silicon Valley Friday show. He's actually a naysayer on blockchain. I'm actually pro blockchain because I think there's some skeptics that say blockchain is really hard to do because it requires an ecosystem. However, we're living in an ecosystem, a world of community. So, I think The Nature of the Firm will be disrupted by people organizing in a new way vis-a-vis blockchain, 'cause that's an open source paradigm. >> Yeah, no I concur. So, I'm a believer in that entire concept. I 100%-- >> I want to come back to something you talked about, about individual contributors and the relationship and link to open source and collaboration. I personally, I think we have to have a frank conversation about, I mean machines have always replaced humans, but for the first time in our history it's replacing cognitive functions. To your point about empathy, what are the things that humans can do that machines can't? And, they become fewer and fewer every year. And, at a lot of these conferences people don't like to talk about that, but it's a reality that we have to talk about. And, your point is right on, we're going back to individual contribution, open source collaboration. The other point is data, is it going to be at the center of that innovation? Because it seems like value creation, and maybe job creation, in the future, is going to be a result of the combinatorial effects of data, open source, collaboration, other. It's not going to be because of Moore's Law, all right. >> 100%, and I think one of the aspects that we didn't touch upon is that the new societal model that automation is going to create would need data driven governance. So, a data driven government is going to be a necessity because, remember, in those times, and I think in 25, 30 years, countries will have to explore the impact of negative taxation, right? Because of all the automation that actually happens around citizen security, about citizen welfare, about cost of healthcare, cost of providing healthcare. All of that is going to be fueled by data, right? So, it's just, as the Chinese proverb says, "May you live in interesting times." We definitely are living in very interesting times. >> And, the public policy implications are, your friend and one of my business heroes, Scott McNealy, says, "There's no privacy on the internet, get over it." We interviewed Don Tapscott last week and he said, "That's unacceptable, we have to solve that problem." So, it brings up a lot of public policy issues.
>> Well, the socio-economic impact. Right now there's a trend we're seeing where the younger generation, we're talking about the post 9/11 generation that's entering the workforce, they have a social conscience, right? So, there's an emphasis you're seeing on social good. AI for social good is one of the hottest trends out there. But, the changing landscape around data is interesting. So, the word democratization has been used, whether you're looking at the early days of blogging and podcasting, which we were involved in, and research, to now in media this notion of data and transparency, and open source is probably at a tipping point, an all time high in terms of value creation. So, I want to hear your thoughts on this, because as someone who's been in the proprietary world, the mode of operation was get something proprietary, lock it down, build a fence and a wall, protect it with folks with machine guns and fight for the competitive advantage, right? Now, the competitive advantage is open. Okay, so you're looking at a pure open source model with Hortonworks. It changes how companies are competing. What is the competitive advantage of Hortonworks? Actually, to be more open. >> 100%. >> How do you manage that? >> No absolutely, I just think the proprietary nature of software, like software has disrupted a lot of businesses, all right? And, it's not resistant to disruption itself. I mean, there has never been a business model in the history of time where you charge a lot of money to build a software, or sell a software that you built, and then whatever are the defects in that software, you get paid more money to fix them, all right? That's the entire perpetual license and maintenance model. That model is going to get disrupted. Now, there are hundreds of billions of dollars involved in it, so people are going to come kicking and screaming to the open source world, but they will have to come to the open source world.
Our advantage that we're seeing is that innovation in a closed loop environment, no matter what size of a company you are, cannot keep up with the changing landscape around you from a data perspective. So, without the collective innovation of the community I don't really think a technology can stay at par with the changes around it. >> This is what I think is such an important point that you're getting at, because we started SiliconANGLE actually in the Cloudera office, so we have a lot of friends that work there. We have a great admiration for them, but one of the things that Cloudera has done through their execution is they have been very profit oriented, a go public at all costs kind of thing that they're doing now. You've seen that happen. The competitive advantage that you're pointing out is similar to what we're seeing Andy Jassy doing at AWS, which is that it's not so much to build something proprietary per se, it's just to ship something faster. So, if you look at Amazon, their competitive advantage is that they just continue to ship product faster and faster and faster than companies can build it themselves. And also, the scale that they're getting with these economies is increasing the quality. So, open source has also hit the naysayers on security, right? Everyone said, "Oh, open source is not secure." As it turns out, it's more secure. Amazon at scale is actually becoming more secure. So, you're starting to see the new competitive advantage be ship more, be more open as the way to do business. What do you think the impact will be to traditional companies, whether it's a startup competing or an existing bank? This is a paradigm shift, what's the impact going to be for a CIO or CEO of a big company? How do they incorporate that competitive advantage? >> Yeah, I think the proprietary software world is not going to go away tomorrow, John, you know that.
There's so much installed software, and there's a saying from where I come from that "Even a dead elephant is worth a million dollars," right? So, even though that business model is sort of dying, it'll still be a good investment for the next ten years because of the locked in business model where customers cannot get out. Now, from a perspective of openness and what that brings as a competitive differentiator to our customers, it's the very pace at which, as I've said, I've lived in a proprietary world, you would be lucky if you were getting the next version of your software every 18 months, you'd be lucky. In the open source community you get a few versions in 18 months. So, the cadence at which releases come out has just completely disrupted the proprietary model. It is just the collective, as I said, innovation ability of the community that has allowed us to increase the release cadence to a few months now, all right? And, if our engineering team had its way it would be cut even shorter, right? So, the ability of customers, and what does that allow the customer to do? Ten years ago if you looked for a capability from your proprietary vendor they would say you have to wait 18 months. So, what do you do? You build it yourself, all right? So, that is what the spaghetti architecture was all about. In the new open source model you ask the community, and if enough people in the community think that that's important, the community builds it for you and gives it to you. >> And, the good news is the business model of open source is working. So, you guys have been public, you've got Cloudera going public, you have MuleSoft out there, a lot of companies out there now that are public companies are open source companies, a phenomenal changeover. But, the other thing that's interesting is the hiring factor for the large enterprise. To your point about proprietary not updating, the same is true for the enterprise.
So, just hiring candidates out of open source has now increased the talent pool for a large enterprise. >> 100%, 100%. >> Well, I wonder if I could challenge this love fest for a minute. (laughs) So, there's another saying, I didn't grow up there, but a dying snake can still bite you. So, I bring that up because there is this hybrid model that's emerging, because these elephants eventually figure it out. And so, an example would be, we talked about Cloudera and so forth, but the better example, I think, is IBM. What IBM has done to embrace open source, investing a billion dollars into Linux years ago, what it's doing with Spark, essentially trying to elbow its way in and say, "Okay, now we're going to co-opt the ecosystem. And then, build our proprietary pieces on top of it." That, to me, is a viable business model, is it not? >> Yes, I'm sure it is, and to John's point, with MuleSoft going IPO and with Cloudera having successfully built a $250 million, $261 million business, it's a testimony to the fact that companies can be built. Now, can they be more efficient? Sure, they can be more efficient. However, my entire comment on this is, why are you doing open source? What is your intent in doing open source, to be seen as open, or to be truly open? Because, in our philosophy, if you add a slim layer of proprietariness, why are you doing that? And, as a businessman I'll tell you why: you increase the stickiness factor by locking in your customer, right? So, let's not, again, we're having a frank conversation: proprietary code equals customer lock in, period. >> Agreed. And, as a business model-- >> I'm not sure I agree with that. >> As a business model. >> Please. (laughs) We'll come back to that. >> So, it's customer lock in.
Now, as a business model it is. If you were to go with the business models of the past, yes, I believe most of the analysts will say it's a stickier, better business model, but then we would like to prove them wrong. And, that's our mission as purely open source. >> I would caution though, Amazon's the mother of all lock ins. You kind of bristled at that before. >> They're not, I mean they use a lot of open source. I mean, did they open source it? Getting back to the lock in, the lock in is a function of stickiness, right? So, stickiness can be open source. Now, you could argue that Hortonworks, through their relationship with partnering, has a lock in aspect with their stickiness of being open. Right, so I come back down to the proprietary-- >> Dave: My search engine, I like Google. >> I mean Google's certainly got-- >> It's got to be locked in 'cause I like it? >> Well, there's a lot of, do you care, with the proprietary technology that Google's built. >> Switching costs, as we talked about before. >> But, you're not paying for search. >> If the value exceeds the price of the lock in then it's an opportunity. So, Palma Richie's talking about the hardened top, the hardened top. Do you care what's in an Intel processor? Well, Intel is a proprietary platform that provides processing power, but it enables a lot of other value. So, I think the stickiness factor of, say, IBM is interesting, and they've done a lot of open source stuff to defend them, on Linux for example, and they do a (mumbles) blockchain. But, they're priming the pump for their own business, that's clear, for their lock in. >> Raj wasn't saying there's not value there. He's saying it's lock in, and it is. >> Well, some customers will pay for convenience. >> Your point is if the value exceeds the lock in risk then it's worth it. >> Yeah, that's my point, yeah. >> 100%, 100%. >> And, that's where the opportunity is. So, you can use open source to get to a value trajectory.
That's the barriers to entry; we've seen 'em on the entrepreneurship side, right? It's easier to start a company now than ever before. Why? Because of open source and cloud, right? So, does that mean that every startup's going to be super successful and beat IBM? No, not really. >> Do you think there will be a Red Hat of big data, and will you be it? >> We hope so. (laughs) If I had my way, that's definitely it. That's really why I am here. >> Just an example, right? >> And, the one thing that excites us about this year is, as my former boss used to say, you could be as good as you think you are, or the best in the world, but if you're in the landline business right now you're not going to have a very bright future. However, in the business that we are in, the pull that we get from the market, you're seeing it here, right? And, these are days that we have very often, where customer pull is remarkable. I mean, this industry is growing at, depending on which analyst you're talking to, somewhere between 50 to 80% year on year. All right, every customer is a prospect for us. There isn't a single conversation that we have with almost any organization of any size where they don't think that they can use their data better, or they can enhance and improve their data strategy. So, all of that is in place, and I am confident about our execution, very, very happy with the technology platform, the support that we get from our customers. So, all things seem to be lining up. >> Raj, thanks so much for coming on, we appreciate your time. We went a little bit over, I think, the allotted time, but wanted to get your insight as the new President and Chief Operating Officer of Hortonworks. Congratulations on the new role, and we're looking forward to seeing the results. Since you're a public company we'll actually be able to see the scoreboard. >> Raj: Yes. >> Congratulations, and thanks for coming on the CUBE. There's more coverage here live at Dataworks 2017.
I'm John Furrier. Stay with us for more great interviews and day two coverage. We'll be right back. (jaunty music)
Steve Roberts, IBM – DataWorks Summit Europe 2017 #DW17 #theCUBE
>> Narrator: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. >> Welcome back to Munich everybody. This is The Cube. We're here live at DataWorks Summit, and we are the live leader in tech coverage. Steve Roberts is here as the offering manager for big data on power systems for IBM. Steve, good to see you again. >> Yeah, good to see you Dave. >> So we're here in Munich, a lot of action, good European flavor. It's my second European one, formerly Hadoop Summit, now DataWorks. What's your take on the show? >> I like it. I like the size of the venue. It's the ability to interact and talk to a lot of the different sponsors and clients and partners, the ability to network with a lot of people from a lot of different parts of the world in a short period of time. So it's been great so far, and I'm looking forward to building upon this and towards the next DataWorks Summit in San Jose. >> Terri Virnig, a VP in your organization, was up this morning and had a keynote presentation, so IBM got a lot of love in front of a fairly decent sized audience, talking a lot about the ecosystem that's evolving, and the openness. Talk a little bit about open generally at IBM, but specifically what it means to your organization in the context of big data. >> Well, I am from the power systems team. So we have an initiative that we launched a couple years ago called OpenPOWER. And OpenPOWER is a foundation of participants innovating from the power processor through all aspects: accelerators, IO, GPUs, advanced analytics packages, system integration, all to the point of being able to drive OpenPOWER capability into the market and have power servers delivered not just through IBM, but through a whole ecosystem of partners. This complements quite well with the Apache Hadoop and Spark philosophy of openness as it relates to the software stack.
So our story's really about being able to marry the benefits of an open ecosystem for OpenPOWER, as it relates to the system infrastructure technology, which drives the same time to innovation, community value, and choice for customers as it relates to a multi-vendor ecosystem, coupled with the same premise as it relates to Hadoop and Spark. And of course, IBM is making significant contributions to Spark as part of the Apache Spark community, and we're a key active member, as is Hortonworks, with the ODPi organization forwarding the standards around Hadoop. So this is a one-two combo of open Hadoop, open Spark, either from Hortonworks or from IBM, sitting on the open power platform built for big data. No other story really exists like that in the market today, open on open. >> So Terri mentioned cognitive systems. Bob Picciano has recently taken over and obviously has some cognitive chops, and some systems chops. Is this a rebranding of power? Is it sort of a layer on top? How should we interpret this? >> No, think of it more as a layer on top. So power will now be one of the assets, one of the members of the family in the cognitive systems portion of IBM. System z can also be used as another great engine for cognitive for certain clients, certain use cases where they want to run cognitive close to the data and they have a lot of data sitting on System z. So power systems is really a server built for big data and machine learning, in particular our S822LC for high performance computing. This is a server which is landing very well in the deep learning, machine learning space. It offers the Tesla P100 GPU, and with the NVIDIA NVLink technology it can offer up to 2.8x bandwidth benefits CPU to GPU over what would be available through a PCIe Intel combination today. So this drives immediate value when you need to ensure that you're not just exploiting GPUs, but also moving your data quickly from the processor to the GPU.
>> So I was going to ask you actually, what makes power so well suited for big data and cognitive applications, particularly relative to Intel alternatives. You touched on that. IBM talks a lot about Moore's Law starting to hit its peak, that innovation is going to come from other places. I love that narrative 'cause it's really combinatorial innovation that's going to lead us in the next 50 years, but can we stay on that thread for a bit? What makes power so substantially unique, uniquely suited and qualified to run cognitive systems and big data? >> Yeah, it actually starts with even more of the fundamentals of the power processor. The power processor has eight threads per core, in contrast to Intel's two threads per core. So this just means for being able to parallelize your workloads, and the workloads that come up in the cognitive space, whether you're running complex queries and need to drive SQL over a lot of parallel pipes, or you're running iterative computation over the same data set as when you're doing model training, these can all benefit from highly parallelized workloads, which can benefit from this 4x thread advantage. But of course to do this, you also need large, fast memory, and we have six times more cache per core versus Broadwell, so this just means you have a lot of memory close to the processor, driving that throughput that you require. And then on top of that, now we get to the ability to add accelerators, and unique accelerators such as, as I mentioned, NVIDIA NVLink for GPU, or using OpenCAPI as an approach to attach FPGA or flash to get processor memory access speeds, but with an attached acceleration device. And so this is economies of scale in terms of being able to offload specialized compute processing to the right accelerator at the right time, so you can drive way more throughput.
The upper bounds are driving workload through individual nodes, and being able to balance your IO and compute on an individual node is far superior with the power system server. >> Okay, so multi-threaded, giant memories, and this OpenCAPI gives you primitive level access, I guess, to a memory extension, instead of having to-- >> Yeah, pluggable accelerators through this high speed memory extension. >> Instead of going through what I often call the horrible storage stack, aka SCSI. And so that's cool, some good technology discussion there. What's the business impact of all that? What are you seeing with clients? >> Well, the business impact is, not everyone is going to start with souped up accelerated workloads, but they're going to get there. So part of the vision that clients need to understand, as they begin to get more insights from their data, is that it's hard to predict where your workloads are going to go. So you want to start with a server that provides you some of that upper room for growth. You don't want to keep scaling out horizontally by having to add nodes every time you need to add storage or add more compute capacity. So firstly, it's the flexibility, being able to bring versatile workloads onto a node or a small number of nodes and be able to exploit some of these memory advantages, acceleration advantages, without necessarily having to build large scale out clusters. Ultimately, it's about improving time to insights. So with accelerators and with large memory, running workloads on similarly configured clusters, you're simply going to get your results faster. For example, in a recent benchmark we did with a representative set of TPC-DS queries on Hortonworks running on Linux and power servers, we were able to drive 70% more queries per hour over a comparable Intel configuration. So this is just getting more work done on what is now similarly priced infrastructure.
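The hardware-thread arithmetic behind the parallelism argument above (eight threads per core versus two) can be sketched in a few lines. The core counts used here are illustrative assumptions for the comparison, not figures from the interview.

```python
# Rough sketch of the hardware-thread comparison discussed above:
# SMT8 on the power processor (8 threads per core) versus 2 threads
# per core on a comparable x86 part. Core counts are assumed examples.

def total_hardware_threads(cores: int, threads_per_core: int) -> int:
    """Concurrent hardware threads available on one node."""
    return cores * threads_per_core

power_node = total_hardware_threads(cores=20, threads_per_core=8)  # 160
x86_node = total_hardware_threads(cores=20, threads_per_core=2)    # 40

print(f"power node: {power_node} threads, x86 node: {x86_node} threads")
print(f"Thread advantage at equal core count: {power_node // x86_node}x")
```

At equal core counts this reproduces the 4x thread advantage mentioned in the answer; real query throughput depends on memory and IO as well, which is why the cache and accelerator points follow.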
'Cause the power family is a broad family that now includes 1U and 2U scale out servers, along with our 192 core enterprise grade servers. So we can directly price compete on a scale out box, but we offer a lot more flexible choice as clients want to move up in the workload stack or to bring accelerators to the table as they start to experiment with machine learning. >> So if I understand that right, I can turn two knobs. I can do the same amount of work for less money, a TCO play. Or, for the same amount of money, I can do more work. >> Absolutely. >> Is that fair? >> Absolutely. Now in some cases, especially in the Hadoop space, the size of your cluster is somewhat gated by how much storage you require. And if you're using the classic scale up storage model, you're going to have so many nodes no matter what, 'cause you can only put so much storage on the node. So in that case, >> You're scaling storage. >> Your clusters can look the same, but you can put a lot more workload on that cluster, or you can bring in IBM, a solution like IBM Spectrum Scale, our Elastic Storage Server, which allows you to essentially pull that storage off the nodes, put it in a storage appliance, and at that point, you now have high speed access to storage, 'cause of course the network bandwidth has increased to the point that the performance benefit of local storage is no longer really a driving factor in a classic Hadoop deployment. You can get that high speed access in a storage appliance model with the resiliency at far less cost, 'cause you don't need 3x replication, you just have about a 30% overhead for the software erasure coding. And now with your compute nodes, you can really choose and scale those nodes just for your workload purposes. So you're not bound by number of nodes equals total storage required divided by storage per node, which is the classic how big is my cluster calculation.
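The "how big is my cluster" calculation described above can be put in back-of-the-envelope form: with storage-local Hadoop, node count is gated by capacity, so compute ends up over or under provisioned. All input figures below are illustrative assumptions, not numbers from the interview.

```python
import math

# Back-of-the-envelope version of the classic "how big is my cluster"
# calculation described above. All figures are illustrative assumptions.

def nodes_for_storage(total_tb: float, usable_tb_per_node: float) -> int:
    """Nodes needed when node count is gated by storage capacity."""
    return math.ceil(total_tb / usable_tb_per_node)

def nodes_for_compute(cores_needed: int, cores_per_node: int) -> int:
    """Nodes needed when node count is gated by compute."""
    return math.ceil(cores_needed / cores_per_node)

storage_nodes = nodes_for_storage(total_tb=800, usable_tb_per_node=32)  # 25
compute_nodes = nodes_for_compute(cores_needed=240, cores_per_node=20)  # 12

# With storage-local Hadoop you must buy the larger of the two.
coupled = max(storage_nodes, compute_nodes)
print(f"Storage gates the cluster at {coupled} nodes; "
      f"{coupled - compute_nodes} nodes of compute sit over provisioned")
```

Decoupling the storage tier, as described above, lets each dimension be sized independently instead of taking the maximum.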
That just doesn't work if you get over 10 nodes, 'cause now you're just starting to get to the point where you're wasting something, right? You're either wasting storage capacity or, typically, you're wasting compute capacity, 'cause you're over provisioned on one side or the other. >> So you're able to scale compute and storage independently, and tune that for the workload and grow that resource more efficiently? >> You can right size the compute and storage for your cluster, but also, importantly, you gain flexibility with that storage tier: that data plane can be used for other non-HDFS workloads. You can still have classic POSIX applications, or you may have new object based applications, and you can, with a single copy of the data, one virtual file system, which could also be geographically distributed, serve both Hadoop and non-Hadoop workloads. So you're then saving additional replicas of the data from being required, by being able to onboard that onto a common data layer. >> So that's a return on asset play. You've got an asset that's more fungible across the application portfolio. You can get more value out of it. You don't have to dedicate it to this one workload and then over provision for another one when you've got extra capacity sitting here. >> It's a TCO play, but it's also a time saver. It's going to get you time to insight faster 'cause you don't have to keep moving that data around. The time you spend copying data is time you should be spending getting insights from the data, so having a common data layer removes that delay. >> Okay, 'cause it's HDFS ready, I don't have to essentially move data from my existing systems into this new stovepipe. >> Yeah, we just present it through the HDFS API as it lands in the file system from the original application. >> So now, all this talk about rings of flexibility, agility, etc. What about cloud? How does cloud fit into this strategy?
What are you guys doing with your colleagues and cohorts at Bluemix, aka SoftLayer? You don't use that term anymore, but we do. When we get our bill it says SoftLayer still, but at any rate, you know what I'm talking about. The cloud with IBM, how does it relate to what you guys are doing in power systems? >> Well, the born on the cloud philosophy of the IBM software analytics team is still very much the motto. So as you see in the Data Science Experience, which was launched last year, born in the cloud, all our analytics packages, whether it be our BigInsights software or our business intelligence software like Cognos, our future generations are landing first in the cloud. And of course we have our whole arsenal of Watson based analytics and APIs available through the cloud. So what we're now seeing as well is we're taking those born in the cloud offerings and now also offering a lot of those in an on premise model. So they can also participate in the hybrid model; the Data Science Experience is now coming on premise, we're showing it at the booth here today. Bluemix has an on premise version as well, and the same software library, BigInsights, Cognos, SPSS, are all available for on prem deployment. So power is still the ideal place for hosting your on prem data and to run your analytics close to the data, and now we can federate that through hybrid access to these elements running in the cloud. So the focus is really on the cloud applications being able to leverage the power and System z based data through high speed connectors, and being able to build hybrid configurations where you're running your analytics where they most make sense, based upon your performance requirements, data security, and compliance requirements. And a lot of companies, of course, are still not comfortable putting all their jewels in the cloud, so typically there's going to be a mix and match.
We are expanding the footprint for cloud based offerings, both in terms of power servers offered through SoftLayer, but also through other cloud providers; Nimbix is a partner we're working with right now who is actually offering our PowerAI package. PowerAI is a package of open source deep learning frameworks, packaged by IBM, optimized for Power in an easily deployed package with IBM support available. And that could be deployed on premise on a power server, but it's also available on a pay per drink basis through the Nimbix cloud. >> All right, we covered a lot of ground here. We talked strategy, we talked strategic fit, which I guess is sort of an adjunct to strategy, we talked a little bit about the competition and where you differentiate, some of the deployment models, like cloud, other bits and pieces of your portfolio. Can we talk specifically about the announcements that you have here at this event, just maybe summarize them for us? >> Yeah, no, absolutely. As it relates to IBM, and Hadoop, and Spark, we really have the full stack support, the rich analytics capabilities that I was mentioning: deep insight, prescriptive insights, streaming analytics with IBM Streams, Cognos Business Intelligence. So this set of technologies is available for both IBM's Hadoop stack and Hortonworks' Hadoop stack today. Our BigInsights and IOP offering is now out for tech preview; the next release, the 4.3 release, is available for technical preview and will be available for both Linux on Intel and Linux on power towards the end of this month, so that's one piece of new Hadoop news at the analytics layer. As it relates to power systems, as Hortonworks announced this morning, HDP 2.6 is now available for Linux on power, so we've been partnering closely with Hortonworks to ensure that we have an optimized story for HDP running on power system servers, as the data point I shared earlier with the 70% improved queries per hour shows.
At the storage layer, we have a work in progress with Hortonworks to certify the Spectrum Scale file system, which really unlocks the ability to offer this converged storage alternative to the classic Hadoop model. Spectrum Scale actually supports and provides advantages in a classic Hadoop model with local storage, or it can provide the flexibility of offering the same sort of multi-application support but in a scale out model for storage. It also has the ability to form part of a storage appliance that we call Elastic Storage Server, which is a combination of power servers and high density storage enclosures, SSD, spinning disk, or flash, depending on the configuration, and that certification will now have that as an available storage appliance which could underpin either IBM Open Platform or HDP as a Hadoop data lake. But as I mentioned, not just for Hadoop, really for building a common data plane behind mixed analytics workloads that reduces your TCO through a converged storage footprint, but more importantly, provides you that flexibility of not having to create data copies to support multiple applications. >> Excellent. IBM opening up its portfolio to the open source ecosystem. You guys have always had, well not always, but in the last 20 years, major, major investments in open source. They continue on, and we're seeing it here. Steve, people are filing in. The evening festivities are about to begin. >> Steve: Yeah, yeah, the party will begin shortly. >> Really appreciate you coming on The Cube, thanks very much. >> Thanks a lot Dave. >> You're welcome. >> Great to talk to you. >> All right, keep it right there everybody. John and I will be back with a wrap up right after this short break, right back.
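The storage arithmetic behind the interview's converged-storage point, 3x replication versus roughly 30% erasure coding overhead, plus the savings from serving multiple applications off a single copy, can be sketched as follows. The data set size and copy count are illustrative assumptions, not figures from the interview.

```python
# Rough storage footprint comparison for the numbers mentioned in the
# interview above: 3x HDFS-style replication versus ~30% erasure coding
# overhead, and the extra savings from a shared common data layer.
# The data set size and copy counts are illustrative assumptions.

def raw_footprint_tb(usable_tb: float, overhead_factor: float, copies: int = 1) -> float:
    """Raw capacity needed for `copies` independent copies of the data."""
    return usable_tb * overhead_factor * copies

usable = 10.0  # TB of actual data (assumed example)

# Classic model: 3x replication, plus a second full copy for non-HDFS apps.
replicated = raw_footprint_tb(usable, overhead_factor=3.0, copies=2)     # 60.0

# Common data layer: one erasure coded copy (~30% overhead), shared by all apps.
erasure_coded = raw_footprint_tb(usable, overhead_factor=1.3, copies=1)  # 13.0

print(f"Replicated, per-app copies: {replicated:.1f} TB raw")
print(f"Shared erasure coded layer: {erasure_coded:.1f} TB raw")
```

Under these assumed inputs the shared erasure coded layer needs well under a quarter of the raw disk, which is the TCO argument made in the interview.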
Day 1 Wrap - DataWorks Summit Europe 2017 - #DWS17 - #theCUBE
(Rhythm music) >> Narrator: Live, from Munich, Germany, it's The Cube. Covering DataWorks Summit Europe, 2017. Brought to you by Hortonworks. >> Okay, welcome back everyone. We are live in Munich, Germany for DataWorks 2017, formerly known as Hadoop Summit. This is The Cube special coverage of the Big Data world. I'm John Furrier with my co-host Dave Vellante. Two days of live coverage, day one wrapping up. Now, Dave, we're just kind of reviewing the scene here. First of all, Europe is a different vibe. But the game is still the same. It's about Big Data evolving from Hadoop to full open source penetration. Hortonworks is now public in the markets, Cloudera is now filing an S-1, Neosoft, Talend, a variety of the other public companies, Alteryx. Hadoop is not dead, it's not dying. It certainly is going to have a position in the industry, but the Big Data conversation is front and center. And one thing that's striking to me is that in Europe, more than in North America, IOT is more centrally themed in this event. Europe is all over the Internet of Things because of the manufacturing, the smart cities. So there's a lot of IOT happening here, and I think this is a big discovery: certainly the Hortonworks event is much more of a community event than Strata Hadoop, which is much more about making money and monetization. This show's got a lot more engagement, with real conversations and developer sessions. Very engaging audience. Well, yeah, it's Europe. So you've got a little bit different, smaller show than North America, but to me IOT, the Internet of Things, is bringing together the cloud world with Big Data. That's the forcing function. And real-time data is the center of the action. I think it's going to be a continuing theme as we move forward. >> So, in 2010 John, it was all about 'What is Hadoop?' The middle part of that decade was all about Hadoop's got to go into the enterprise. It's gone mainstream into the enterprise, and now it's sort of 'what's next?' Same wine, new bottle.
But I will say this, Hadoop, as you pointed out, is not dead. And I liken it to the early web. Web 1.0 was profound. It was a new paradigm. The profundity of Hadoop was that you could ship five megabytes of code to a petabyte of data. That was the new model, and that spawned, that catalyzed, the Big Data movement. That is with us now and it's entrenched, and now you're seeing layers of innovation on top of that. >> Yeah, and I would just reiterate and reinforce that point by saying that Cloudera, the founders of this industry if you will, with Hadoop the first company to be commercially funded to do it, and Hortonworks, which came in after the fact out of Yahoo, came out of a web-scale world. So you have the cloud-native DevOps culture: Amr Awadallah at Yahoo, Mike Olson, Jeff Hammerbacher, Christophe Bisciglia. These guys were hardcore, large-scale data guys. Again, this is the continuation of the evolution, and I think nothing has changed in that regard, because those pioneers have set the stage for the commercialization, and now the conversation around operationalizing this cloud is big. And having Alan Nance, a practitioner, a rock star, talking about radical deployments that can drop a billion dollars of cost savings to the bottom line. This is the kind of conversation we're going to see more of; this is going to change the game from, you know, "Hey, I'm the CFO buyer" or "CIO doing IT", to an operational, CEO, chief operating officer level conversation. That operational model of cloud is now coming into view the way ERP did in software, those kinds of megatrends; this is happening right now. >> As we talk about the open, the people who are going to make the real money on Big Data are the practitioners, those people applying it. We talked about Alan Nance's example of billion dollar, half a billion dollar cost-savings revenue opportunities; that's where the money's being made. It's not being made, yet anyway, with these public companies.
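The "ship five megabytes of code to a petabyte of data" model described above can be sketched in miniature. This is a toy word count with invented data, purely illustrative of moving the function to the data partitions rather than the data to the function:

```python
# Sketch of the "ship code to the data" idea behind Hadoop (illustrative
# only; the blocks and sizes here are made up, not from any real cluster).

def word_count_map(block):
    """The small function we 'ship' to each data block."""
    counts = {}
    for word in block.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def reduce_counts(partials):
    """Merge the per-block results, as a reducer would."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total

# Three "blocks" standing in for data spread across nodes.
blocks = ["big data big", "data lake", "big insight"]
result = reduce_counts(word_count_map(b) for b in blocks)
print(result)  # {'big': 3, 'data': 2, 'lake': 1, 'insight': 1}
```

The point of the paradigm is that only the small functions travel across the network; the petabyte stays where it is.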
You're seeing it with Splunk, Tableau, now Cloudera, Hortonworks, MapR. Is MapR even here? >> Haven't seen 'em. >> No, I haven't seen MapR; they used to have a pretty prominent display at the show. >> You brought up a point I want to get back to. This relates to those guys, which is, profitless prosperity. >> Yeah. >> A term used for open source. I think there's a trend happening and I can't put a finger on it, but I can kind of feel it. That is, the ecosystems of open source are now going to a dimension where they're not yet valued in the classic sense. Most people that build platforms value ecosystems; that's where developers came from. Developer ecosystems fuel open source. But if you look at the enterprise, at transformations over the decades, you'd see the successful companies have ecosystems of channel partners, ecosystems of indirect sales if you will. We're seeing the formation, at least I can start seeing the formation, of an indirect engine of value creation, vis-à-vis this organic developer community where people are building businesses and companies. Shaun Connolly pointed to fintech as an example, where these startups became financial services businesses that became fintech suppliers to the banks. They're not in the banking business per se, but they're becoming as important as banks 'cuz they're the providers in fintech, fintech being financial tech. So you're starting to see this ecosystem of not "channel partners", reselling my equipment or software in the classic sense as we know them, as they're called channel partners. But if this continues to develop, the thousand flowers blooming strategy, you could argue that Hortonworks is undervalued as a company, because they're not realizing those gains yet, or those gains can't be measured. So if you're an MBA or an investment banker, you've got to be looking at the market saying, "wow, is there a net present value to an ecosystem?" It begs the question, Dave.
>> This is a wealth creation. A rising tide floats all boats, in that rising tide is a ecosystem value number there. No one has their hands on that, no one's talked about that. That is the upshot in my mind, the silver-lining to what some are saying is the consolidation of Hadoop. Some are saying Cloudera is going to get a huge haircut off their four point one billion dollar value. >> Dave: I think that's inevitable. >> Which is some say, they may lose two to three billion in value, in the IPO. Post IPO which would put them in line with Hortonworks based on the numbers. You know, is that good or bad? I don't think it's bad because the value shifts to the ecosystem. Both Cloudera and Hortonworks both play in open source so you can be glass half-full on one hand, on the haircut, upcoming for Cloudera, two saying "No, the glass is half-full because it's a haircut in the short-term maybe", if that happens. I mean some said Pure Storage was going to get a haircut, they never really did Dave. So, again, no one yet has pegged the valuation of an ecosystem. >> Well, and I think that is a great point, personally I think, I've been sort of racking my brain, will this Big Data hike be realized. Like the internet. You remember the internet hyped up, then it crashed; no one wanted to own any of these companies. But it actually lived up to the hype. It actually exceeded the hype. >> You can get pet food online now, it's called amazon. [Co-Hosts Chuckle Together] All the e-commerce played out. >> Right, e-commerce played out. But I think you're right. But everybody's expecting sort of, was expecting a similar type of cycle. "Oh, this will replace that." And that's now what's going to happen. What's going to happen is the ecosystem is going to create a flywheel effect, is really what you're saying. >> Jeff: Yes. >> And there will be huge valuations that emerge out of this. 
But today, the guys that we know and love, the Hortonworks, the Clouderas, et cetera, aren't really on the winners list; I mean, some of their founders maybe are. But who are the winners? Maybe the customers, because they saw a big drop in cost. Apache's a big winner here. Wouldn't ya say? >> Yeah. >> Apache's looking pretty good, the Apache Foundation. I would say AWS is a pretty big winner. They're profiting off of this. How about Microsoft and IBM? I mean, I feel in a way IBM has sort of co-opted this Big Data meme and said, "okay, cognitive," and layered all of its stuff on top of it. Bought The Weather Company, repositioned the company; now, it hasn't translated into growth, but it certainly has profitability implications. >> IBM plays well here, I'll tell you why. One, they're very big in open source, so that's positive. Two, they have a huge track record and staff dealing with professional services in the enterprise. So if transformation is the journey conversation, IBM's right there. You can't ignore IBM on this one. Now, the stack might be different, but again, beauty is in the eye of the beholder, because it depends on what workloads you have. IBM is not going to leave you high and dry, 'cuz they have what you really need in terms of what they can do with their customers. Where people are going to get blindsided in my opinion, the IBMs and Oracles of the world, and even Microsoft, is what Alan Nance was talking about: the radical transformation around the operating model is going to force people to figure out when to start cannibalizing their own stacks. That's going to be a telltale sign for winners and losers in the big game. Because if IBM can shift quickly and co-opt the megatrends, make them their own, get out in front of that next wave as Pat Gelsinger would say, they could surf that wave, and then tweak, and then get out in front. If they don't get behind that next wave, they're driftwood.
It really is all about where you are in the spectrum, and analytics is one of those things in data where you've got to have a cohesive horizontal strategy. You've got to be horizontally scalable with data. You've got to make data freely available. You have to have an abstraction layer of software that will allow free movement of data across systems. That's the number one thing that comes out of seeing the Hortonworks data platform, for instance. Shaun Connolly called it 'connective tissue'. Cloudera is the same thing; they have to start figuring out ways to be better at the data across the horizontal view. Cloudera, like IBM, has an opportunity as well to get out in front of the next wave. I think you can see that with AI and machine learning; clearly they're going to go after that. >> Just to finish off on the winners and losers; I mean, the other winner is the systems integrators that service these companies. But I like what you said about cannibalizing stacks as an indicator of what's happening. So let's talk about that. Oracle is clearly cannibalizing its stacks, saying, "okay, we're moving the red stack to the cloud, go." Microsoft has made that decision to do that. IBM? To a large degree is cannibalizing its stack. HP sold off its stack, said, "we don't want to cannibalize our stack, we want to sell and try to retool." >> So, your question, your point? >> So, haven't they already begun to do that, the big legacy companies? >> They're doing it, tweaking as they go, as an example. At Oracle OpenWorld and IBM InterConnect, all the shows we cover, except for Amazon 'cuz they're pure cloud, all are taking a unique differentiation approach to their own stuff. IBM is putting stuff that's related to IBM in their cloud. Oracle differentiates on their stack; for instance, I have no problem with Oracle, because they have a huge database business.
And you're high as a kite if you think Oracle's going to lose that database business when data is the number one asset in the world. What Oracle's doing, which I think is quite brilliant on Oracle's part, is saying, "hey, if you want to run on premises with hardware, we got Sun, and oh by the way, our database is the fastest on our stuff." Check. Win. "Oh, you want to move to the cloud? Come to the Oracle cloud, our database runs the fastest in our cloud," which is their stuff in the cloud. So if you're an Oracle customer, you just can't lose there. So they created an inimitability around their own database. So does that mean they're going to win the new database wars? Maybe not, but they can coexist as a system of record, so that's a win. Microsoft, Office 365, tightly coupling that with Azure is a brilliant move. Why wouldn't they do that? They're going to migrate their customer base to their own clouds. Oracle and Microsoft are going to migrate their customers to their own cloud. Differentiate, and give their customers a gateway to the cloud. VMware is partnering with Amazon. Brilliant move, and they just sold vCloud Air, which we reported at SiliconANGLE last night, to a French company, so vCloud Air is gone. Now that puts VMware clearly in bed with Amazon Web Services. Great move for VMware, benefit to AWS; that's a differentiation for VMware. >> Dave: Somebody bought vCloud Air? >> I think you missed that last night 'cuz you were traveling. >> Chuckling: That's tongue-in-cheek; I mean, what did they get for vCloud Air? >> OVH bought them, French company. >> More de-levering by Michael. >> Well, they're inter-clouding, right? I mean, de-leveraging the focus, right? So OVH, French company, has very much a coexist... >> What'd they pay? >> ...strategy. It's undisclosed. >> Yeah, well why? 'Cuz it wasn't a big number. That's my point. >> Back to the other cloud players, Google. I think Google's differentiating on their technology. Great move, smart move.
They've just got to get, as someone who's been following them, and you know, you and I both love an enterprise experience, they've got to speak the enterprise language and execute on that language. Not through 19-year-olds and interns or smart recent college grads, and say, "we're instantly enterprise." There are diseconomies of scale in trying to ramp up and trying to be too heavy on the enterprise. Amazon's got the same problem: you can't hire sales guys fast enough, and oh by the way, find me a sales guy that has ten, 15 years of executive selling experience in a complex strategic sale, like the enterprise, where you now have stakeholders that are in multiple roles and changing roles, as Alan Nance pointed out. So the enterprise game is very difficult. >> Yup. >> Very, very difficult. >> Well, I think these Hadoop startups are seeing that. None of them are making money. Shaun Connolly basically said, "hey, it used to be growth, they would pay for growth, but now they're punishing you if you don't have growth plus profitability." By the way, that's not all totally true. Amazon makes no money, yet their stock price goes through the roof. >> There is no self-service, there is no self-service business model for digital transformation for enterprise customers today. It doesn't exist. The value proposition doesn't resonate with customers. It works well for shadow IT, and if you want to roll out G Suite in some pockets of your organization, but an AdSense sales force doesn't work in the enterprise. Everyone's finding that out right now because they're basically transforming their enterprise. >> I think Google's going to solve their problem. I think Google has to solve their problem 'cuz... >> I think they will, but to me it's, buy a company; there are a zillion companies out there they could buy tomorrow that are private, that have like 300 salespeople that are senior people. Pay the bucks, buy a sales force, roll your stuff out, and start speaking the language. I think Diane Greene gets this.
So, I think, I expect to see Google ... >> Dave: Totally. >> do some things in that area. >> And I think, to your point, I've always said the rich get richer. The traditional legacy companies, they're holding serve in this. They waited, they waited, they waited, and then they said, "okay, now we're going to go put our chips on the table." Oracle made its bets. IBM made its bets. HP, not really, betting on hardware. Okay. Fine. Cisco, Microsoft, they're all making their bets. >> It's all about bets on technology and profitability. This is what I'm looking at right now, Dave. We talked about it in our intro. Shaun Connolly, who's in charge of strategy at Hortonworks, clarified that clearly: revenue matters; losing money is not going to solve the credibility problem. Profitability matters. This comes back to the point we've made on The Cube multiple years ago, and even just as recently as last year, that the world's flipping back to credibility. Customers in the enterprise want to see credibility and track record. And they're going to evaluate the suppliers based upon key fundamentals in their business. Can they make money? Can they deliver SLAs? These are going to be key requirements, not the shiny new toy from Silicon Valley, or the cool machine learning algorithm. It has to apply to their product, their value, and they're going to look at companies on the scoreboard and say, "are you profitable?" as a proxy for relevance. >> Well, I want to wrap it up, but I do want to say, we've been kind of critical of some of the Hadoop players, Cloudera and Hortonworks specifically. But I want to give them props, 'cuz you remember well, John, when the legacy enterprise guys started coming into the Hadoop market, they all had the same messaging: "we're going to make Hadoop enterprise ready." You remember that well, and I have to say that Hortonworks, Cloudera, I would say MapR as well, and the ecosystem, have done a pretty good job of making Hadoop and Big Data enterprise ready.
They were already working on it very hard, I think they took it seriously, and I think that that's why they are in the mix and they are growing as they are. Shaun Connolly talked about them being operating cashflow positive, eking out some plus cash. On the next earnings call, the pressure's on. But we want to see, you know, rocket ships. >> I think they've done a good job; I mean, I don't think anyone's been asleep at the switch at all on enterprise ready. The question's always been, "can they get there fast enough?" I think everyone's recognized that cost of ownership's down. We still see consolidation in the OpenStack ecosystem, and that'll move right through the Valley properties. So we'll keep an eye on it; tomorrow we'll be checking in. We got a great day tomorrow. Live coverage here in Munich, Germany for DataWorks 2017. More coverage tomorrow, stay with us. I'm John Furrier with Dave Vellante. Be right back with more tomorrow, day two. Keep following us.
SUMMARY :
Brought to you by Hortonworks. Europe is on the Internet of Things And I liken it to the early web. the founders of this industry if you will, on Big Data are the practitioners, prominent display at the show. This relates to those guys, which is, That is the ecosystems of open source the silver-lining to what some are saying on one hand, on the haircut, You remember the internet hyped up, All the e-commerce played out. the ecosystem is going to the Hortonworks, the Clouderas, et cetera, Bought the weather company, IBM is not going to leave you high and dry the red stack to the cloud, go." Now that puts the VMware clearly in bed I think you missed that last night I mean de-leveraging the focus, right? It's undisclosed. 'Cuz it wasn't a big number. like the enterprise where you now have By the way, that's not all totally true. and if you want to roll out G Suite I think Google has to start speaking the language. And I think, to you're point, that the world's flipping of some of the Hadoop players. We still solicit on the
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vallente | PERSON | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Cisco | ORGANIZATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Michael | PERSON | 0.99+ |
Dianne Green | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Dave | PERSON | 0.99+ |
Shaun Connolly | PERSON | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
Jeff Hammerbacher | PERSON | 0.99+ |
Alan Nance | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
two | QUANTITY | 0.99+ |
Pat Gelsinger | PERSON | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Jeff | PERSON | 0.99+ |
Apache | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Yahoo | ORGANIZATION | 0.99+ |
tomorrow | DATE | 0.99+ |
Christopher Vercelli | PERSON | 0.99+ |
ORGANIZATION | 0.99+ | |
John Furrier | PERSON | 0.99+ |
Thintech | ORGANIZATION | 0.99+ |
HP | ORGANIZATION | 0.99+ |
billion dollar | QUANTITY | 0.99+ |
VVMware | ORGANIZATION | 0.99+ |
three billion | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Sun | ORGANIZATION | 0.99+ |
Mike Olson | PERSON | 0.99+ |
Two days | QUANTITY | 0.99+ |
North America | LOCATION | 0.99+ |
2010 | DATE | 0.99+ |
Neosoft | ORGANIZATION | 0.99+ |
Talon | ORGANIZATION | 0.99+ |
Chandra Mukhyala, IBM - DataWorks Summit Europe 2017 - #DW17 - #theCUBE
>> Narrator: theCUBE covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Welcome back to the DataWorks Summit in Munich everybody. This is The Cube, the leader in live tech coverage. Chandra Mukhyala is here. He's the offering manager for IBM Storage. Chandra, good to see you. It always comes back to storage. >> It does, it's the foundation. We're here at a data show, and you've got to put the data somewhere. How's the show going? What are you guys doing here? >> The show's going good. We have lots of participation. I didn't expect this big a crowd, but there's a good crowd. Storage, people don't look at it as the most sexy thing, but I still see a lot of people coming and asking "What do you have to do with Hadoop?" kinds of questions, which is exactly the kind of question I expect. So, going good, we're able to-- >> It's interesting, in the early days of Hadoop and big data, I remember we interviewed, John and I interviewed Jeff Hammerbacher, founder of Cloudera, and he was at Facebook, and he said, "My whole goal at Facebook when we were working with Hadoop was to eliminate the storage container, the expensive storage container." They succeeded, but now you see guys like you coming in and saying, "Hey, we have better storage." Why does the world need anything different than HDFS? >> This has been happening for the last two decades, right? In storage, every few years a startup comes along; they address one problem very well. They address one problem and create a whole storage solution around that. Everybody understands the benefit of it and that becomes part of mainstream storage. When I say mainstream storage: these new point solutions address one problem, but what about all the rest of the features storage has been developing for decades? Same thing happened with other solutions, for example, deduplication. Very popular at one point, dedupe appliances. Nowadays, every storage solution has dedupe in it. I think it's the same thing with HDFS, right?
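The deduplication example Chandra gives can be sketched as content-addressed block storage. This is a toy, not any shipping product's design; the 4-byte block size is an assumption chosen only to keep the example readable:

```python
# Toy sketch of deduplication: store each unique block once, keyed by its
# content hash, so duplicate blocks across files collapse into one copy.
import hashlib

class DedupeStore:
    def __init__(self):
        self.blocks = {}   # sha256 hex digest -> block bytes (one copy each)
        self.files = {}    # filename -> ordered list of block hashes

    def write(self, name, data, block_size=4):
        hashes = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks[digest] = block    # duplicates overwrite themselves
            hashes.append(digest)
        self.files[name] = hashes

    def read(self, name):
        # Reassemble the file from its block references.
        return b"".join(self.blocks[h] for h in self.files[name])

store = DedupeStore()
store.write("a.bin", b"AAAABBBBAAAA")   # three blocks, only two unique
store.write("b.bin", b"BBBB")           # block already stored; adds nothing
print(len(store.blocks))                # 2
print(store.read("a.bin"))              # b'AAAABBBBAAAA'
```

The point-solution-to-mainstream arc he describes is visible here: once the trick is this small, every general-purpose storage stack can absorb it.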
HDFS is purpose-built for Hadoop. It solves that problem in terms of giving local-access storage, scalable storage, big parallel storage. But it's missing many things, you know. One of the biggest problems they have with HDFS is it's siloed storage, meaning that the data in HDFS is only available for Hadoop. What about the rest of the applications in the organization, who may need it through traditional protocols like NFS or SMB, or may need it through new applications like S3 interfaces or Swift interfaces? So, you don't want that siloed storage. That's one of the biggest problems we have. >> So, you're putting forth a vision of some kind of horizontal infrastructure that can be leveraged across your application portfolio... >> Chandra: Yes. >> How common is that? And what's the value of that? >> It's not really common; that's one of the stories, the messages, we're trying to get out. And I've been talking to data scientists in the last one year, a lot of them. One of the first things they do when they are implementing a Hadoop project is they have to copy a lot of data into HDFS, because before they can analyze it, it has to be in HDFS; they can't run on anything else. That copy process takes days. >> Dave: That's a big move, yeah. >> It's not only wasting time from a data scientist, but it also makes the data stale. I tell them you don't have to do that if your data is on something like IBM Spectrum Scale. You can run Hadoop straight off that; why do you even have to copy into HDFS? You can use the same existing MapReduce applications with zero change, point them at Spectrum Scale, and they can still use the HDFS API. You don't have to copy the data. And every data scientist I talk to is like, "Really? I didn't know that; I've been wasting time?" Yes. So, it's not very well known; most people think that there's only one way to do Hadoop applications, and that's on HDFS. You don't have to.
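The "one copy of data, two access paths" idea Chandra describes can be sketched with a thin shim. This is illustrative only; `HdfsStyleShim` and its method names are invented stand-ins for an HDFS-compatible connector, not the real Spectrum Scale API:

```python
# Sketch: one copy of data on a POSIX file system, reachable both by
# ordinary file access and through a hypothetical HDFS-style API, so
# nothing has to be copied into a separate silo.
import os
import tempfile

class HdfsStyleShim:
    """Invented wrapper exposing HDFS-like calls over a POSIX directory."""
    def __init__(self, root):
        self.root = root

    def open_read(self, path):        # stands in for FileSystem.open()
        return open(os.path.join(self.root, path), "rb")

    def list_status(self, path="."):  # stands in for listStatus()
        return sorted(os.listdir(os.path.join(self.root, path)))

root = tempfile.mkdtemp()
with open(os.path.join(root, "events.csv"), "w") as f:
    f.write("user,action\nalice,login\n")   # written once by a "legacy" app

fs = HdfsStyleShim(root)
data = fs.open_read("events.csv").read().decode()  # read via the HDFS-style path
print(fs.list_status())   # ['events.csv'] -- the same single copy of the data
```

Either consumer, the NFS/POSIX application or the analytics job, sees the same bytes; the days-long copy step simply disappears.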
And the advantages there are, one, you don't have to copy, you can share the data with the rest of the applications, and it's no more stale data. But there's also one other big difference between the HDFS type of storage and shared storage. In share-nothing, which is what HDFS is, the way you scale is by adding new nodes, which adds both compute and storage. What about applications which don't necessarily need more compute; all they need is more throughput? You're wasting compute resources, right? So there are certain applications where shared storage is a better architecture. Now, the solution which IBM has will allow you to deploy it either way, share-nothing or shared storage, but that's one of the main reasons people, data scientists especially, want to look at these alternative solutions for storage. >> So when I go back to my Hammerbacher example, it worked for the Facebook of the early days because they didn't have a bunch of legacy data hanging around; they could start with, pretty much, a blank piece of paper. >> Yes. >> Re-architect, plus they had such scale, they probably said, "Okay, we don't want to go to EMC "and NetApp or IBM, or whomever and buy storage, "we want to use commodity components." Not every enterprise can do that, is what you're saying. >> Yes, exactly. It's probably okay for somebody like a very large search engine, when all they're doing is analytics, nothing else. But if you go to any large commercial enterprise, they have lots of applications; the whole point around analytics is they want to pool all of the data and look at that. So, find the correlations, right? It's not about analyzing one small dataset from one business function. It's about pooling everything together and seeing what insights can I get out of it. So that's one of the reasons it's very important to have support to access the data for your legacy enterprise applications, too, right?
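The scaling trade-off described above (share-nothing couples compute and storage, shared storage scales them independently) can be put into back-of-the-envelope arithmetic. All node specs and workload numbers here are invented for illustration:

```python
# Sketch of compute/storage coupling in a share-nothing cluster: nodes
# must satisfy BOTH dimensions at once, so a storage-heavy workload
# drags along compute it never uses.
import math

NODE_CORES, NODE_TB = 16, 40          # one assumed share-nothing node
need_cores, need_tb = 32, 1000        # an assumed storage-heavy workload

# Share-nothing: size the cluster to the larger of the two requirements.
nodes = max(math.ceil(need_cores / NODE_CORES), math.ceil(need_tb / NODE_TB))
wasted_cores = nodes * NODE_CORES - need_cores

# Shared storage: scale each dimension independently.
compute_nodes = math.ceil(need_cores / NODE_CORES)
storage_nodes = math.ceil(need_tb / NODE_TB)

print(nodes, wasted_cores)            # 25 368  (25 nodes, 368 idle cores)
print(compute_nodes, storage_nodes)   # 2 25
```

Under these made-up numbers, the share-nothing layout buys 400 cores to use 32 of them, which is the waste Chandra is pointing at; the reverse waste appears for compute-heavy, storage-light jobs, which is why supporting both deployment models matters.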
Yeah, so NFS and SMB are pretty important, so are S3 and Swift, but also, for these analytics applications, one of the advantages of the IBM solution here is we provide local access to the file system. Not necessarily through NAS protocols like NFS access, we do that, but we also have POSIX access, to have direct local access to the file system. With HDFS, you have to first copy the file into HDFS, and you have to bring it back to do anything with it. All those copy operations go away. And this is important, again, in the enterprise, not just for data sharing but also to get local access. >> You're saying your system is Hadoop ready. >> Chandra: It is. >> Okay. And then, the other thing you hear a lot from IT practitioners anyway, not so much from the lines of business, is that when people spin up these Hadoop projects, big data projects, they go outside of the edicts of the organization in terms of governance and compliance, and often, security. Do you solve that problem? >> Yeah, that's one of the reasons to consider, again, enterprise storage, right? It's not just that you're able to share the data with the rest of the applications, but also the whole bunch of data management features, including data governance features. You can talk about encryption there, you can talk about auditing there, you can talk about features like WORM, right, write once read many, so data, especially archival data, once you write it you can't modify it. There are a whole bunch of features around data retention, data governance; those are all part of the data management stack we have. You get that for free. You not only get universal access, unified access, but you also get data governance. >> So is this one of the situations where, on the face of it, when you look at the CapEx, you say, "Oh, wow, I can use commodity components, save a bunch of money." You know, you remember the client server days.
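The WORM (write once, read many) retention behavior mentioned above can be sketched as a toy in-memory store. This is not any IBM API, just a minimal model of the rule "writes are final, deletes wait out the retention clock":

```python
# Minimal WORM sketch: an object may be written once, read freely, and
# deleted only after its retention period has expired.
import time

class WormStore:
    def __init__(self, retention_seconds):
        self.retention = retention_seconds
        self._data = {}     # key -> (payload, written_at)

    def put(self, key, payload):
        if key in self._data:
            raise PermissionError("WORM: object already written")
        self._data[key] = (payload, time.time())

    def get(self, key):
        return self._data[key][0]

    def delete(self, key):
        _payload, written = self._data[key]
        if time.time() - written < self.retention:
            raise PermissionError("WORM: retention period not expired")
        del self._data[key]

store = WormStore(retention_seconds=3600)
store.put("audit/2017-04-05.log", b"trade records")
print(store.get("audit/2017-04-05.log"))   # b'trade records'
# A second put() to the same key, or an early delete(), raises PermissionError.
```

Auditing and encryption would wrap the same access path, which is why getting them "for free" from one data management stack is attractive.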
"Oh, wow, cheap, cheap, cheep, "microprocessor based solution," and then all the sudden, people realize we have to manage this. Have we seen a similar sort of trend with Hadoop, with the ability to or the complexity of managing all of this infrastructure? It's so high than it actually drives costs up. >> Actually there are two parts to it, right? There is actually value in utilizing commodity hardware, industry standards. That does reduce your costs right? If you can just buy a standard XL6 server we can, a storage server and utilize that, why not. That is kind of just because. But the real value in any kind of a storage data manage solution is in the software stack. Now you can reduce CapEx by using industry standards. It's a good thing to do and we should, and we support that but in the end, the data management is there in the software stack. What I'm saying is HDFS is solving one problem by dismissing the whole data management problems, which we just touched on. And that all comes in software which goes down under service. >> Well, and you know, it's funny, I've been saying for years, that if you peel back the onion on any storage device, the vast majority anyway, they're all based on standard components. It's the software that you're paying for. So it's sort of artificial in that a company like IBM will say, "Okay, we've got all this value in here, "but it's on top of commodity components, "we're going to charge for the value." >> Right. >> And so if you strip that out, sure, you do it yourself. >> Yeah, exactly. And it's all standard service. It's been like that always. Now one difference is ten years ago people used propriety array controllers. Now all of the functionalities coming into software-- >> ASICs, >> Recording. >> Yeah, 3PAR still has an ASIC, but most don't. >> Right, that's funny, they only come in like.. Almost everybody has some kind of a software-based recording and they're able to utilize sharing server. 
Now the reason advantage in appliance more over, because, yes it can run on industry's standard, but this is storage, this is where, that's a foundation of all of your inter sectors. And you want RAS, or you want reliability and availability. The only way to get that is a fully integrated, tight solution, where you're doing a lot of testing on the software and the hardware. Yes, it's supposed to work, but what really happens when it fails, how does the sub react. And that's where I think there is still a value for integrated systems. If you're a large customer, you have a lot of storage saving, source of the administrators and they know to build solutions and validate it. Yes, software based storage is the right answer for you. And you're the offering manager for Spectrum Scale, which is the file offering, right, that's right? >> Yes, right yes. >> And it includes object as well, or-- >> Spectrum Sale is a file and object storage pack. It supports both file and protocols. It also supports object protocols. The thing about object storage is it means different things to different people. To some people, it's the object interface. >> Yeah, to me it means get put. >> Yeah, that's what the definition is, then it is objectivity. But the fact is that everybody's supposed to stay in now. But to some of the people, it's not about the protocol, because they're going to still access by finding those protocols, but to them, it's about the object store, which means it's a flat name space and there's no hierarchical name structure, and you can get into billions of finites without having any scalable issues. That's an object store. But to some other people it's neither of those, it's about a range of coding which object storage, so it's cheap storage. It allows you to run on storage and service, and you get cheap storage. So it's three different things. So if you're talking about protocols yes, but their skill is by their definition is object storage, also. 
>> So in thinking about, well, let's start with Spectrum Scale generally. But specifically, your angle in big data and Hadoop, and we talked about that a little bit, but what are you guys doing here, what are you showing, what's your partnership with Hortonworks? Maybe talk about that a little bit. >> So we've been supporting what we call a Hadoop connector on Spectrum Scale for almost a year now, which allows our existing Spectrum Scale customers to run Hadoop straight on it. But if you look at the Hadoop distributions, there are two or three major ones, right? Cloudera, Hortonworks, maybe MapR. One of the first questions we get when we tell our customers they can run Hadoop on this is, "Oh, is this supported by my distribution?" So that has been a problem. So what we announced is a partnership with Hortonworks, so now Hortonworks is certifying IBM Spectrum Scale. It's not new code changes, it's not new features, but it's a validation and a stamp from Hortonworks that's in the process. The result of it is a Hortonworks-certified reference architecture, which is what we announced, about a month ago. We should be publishing that soon. Now customers can have more confidence in the joint solutions. It's not just IBM saying that it's Hadoop-ready, but Hortonworks backing that up. >> Okay, and your scope, correct me if I'm wrong, is sort of on-prem and hybrid, >> Chandra: Yes. >> Not cloud services. That's kind of, you might sell your technology internally, but-- >> Correct, so IBM storage is primarily focused on on-prem storage. We do have a separate cloud division, but almost every IBM storage product, and Spectrum Scale especially is what I can speak of, we treat as hybrid cloud storage. What we mean by that is we have built-in capabilities. Most of our products have a feature called transparent cloud tiering, which allows you to set a policy on when data should be automatically tiered to the cloud.
Everybody wants public, everybody wants on-prem. Obviously there are pros and cons of on-premises storage versus off-premises storage, but basically it boils down to this: if you want performance and security, you want to be on premises. But there's always some data which is better off in the cloud, and we try to automate that with our feature called transparent cloud tiering. You set a policy based on age, based on the type of data, based on the ownership. The system will automatically tier the data to the cloud, and when a user accesses that data, it comes back automatically, too. It's all transparent to the end user. So yes, we're an on-premises storage business, but our solutions are hybrid cloud storage. >> So, as somebody who knows the file business pretty well, let's talk about the file business and sort of where it's headed. There are some mega trends and dislocations. There's obviously software-defined; you guys made a big investment in software-defined a year and a half, two years ago. There's cloud; Amazon with S3 sort of shook up the world. I mean, at first it was sort of small, but now it's really catching on. Object obviously fits in there. What do you see as the future of file? >> That's a great question. When it comes to data layout, there's really block, file, or object. Software-defined and cloud are various ways of consuming storage. If you're a large service provider, you would probably prefer a software-based solution so you can run it on your existing servers. Depending on an organization's preferences, and how concerned they are about security and performance, they will prefer to run some of their applications in the cloud. These are different ways of consuming storage. But coming back to file and object, right? Object is perfect if you are not going to modify the data. You're done writing that data, and you're not going to change it. It just belongs in an object store, right?
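The policy-driven tiering Chandra describes, with automatic recall on access, can be sketched as a toy simulation (hypothetical code; the real Spectrum Scale policy engine and its policy language work differently): files older than a threshold migrate to a cloud tier, and reading a migrated file recalls it transparently.

```python
# Hypothetical sketch of policy-based tiering: data older than a policy
# threshold migrates to a cloud tier; reading migrated data recalls it
# transparently, so the user never addresses the tiers directly.

class TieredStore:
    def __init__(self, max_age_days: int):
        self.max_age_days = max_age_days
        self.on_prem = {}   # name -> (age_days, data)
        self.cloud = {}

    def apply_policy(self):
        # Migrate anything older than the policy threshold.
        for name in list(self.on_prem):
            age, data = self.on_prem[name]
            if age > self.max_age_days:
                self.cloud[name] = (age, data)
                del self.on_prem[name]

    def read(self, name: str) -> bytes:
        if name not in self.on_prem:              # transparent recall
            self.on_prem[name] = self.cloud.pop(name)
        return self.on_prem[name][1]

store = TieredStore(max_age_days=90)
store.on_prem = {"fresh.csv": (10, b"hot"), "old.csv": (400, b"cold")}
store.apply_policy()
print(sorted(store.cloud))     # only the old file migrated
print(store.read("old.csv"))   # recalled automatically on access
```

A real policy would also filter on data type and ownership, as Chandra says; age is just the simplest dimension to show.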
It's more scalable storage. I say scalable because file systems are hierarchical in nature. Because it's a file system tree, you have to traverse the various subdirectory trees, and beyond a few million entries, that slows you down. But file systems have a strength. When you want to modify a file, any application which is going to edit the file, which is going to modify the file, that application belongs on file storage, not on object. But let's say you are dealing with medical images. You're not going to modify an x-ray once it's done. That's better suited to object storage. So file storage will always have a place. Take video editing and all these videos people are making, you know, we do a lot of video editing. That belongs on file storage, not on object. If you care about file modifications and file performance, file is your answer, but if you're done and you just want to archive it, you know, you want scalable storage, billions of objects, then object is the answer. Now, either of these can be software-based storage or it can be an appliance. That's again an organization's preference. Do you want an integrated, robust, ready-made solution? Then an appliance is the answer. "Ah, no, I'm a large organization, I have a lot of storage administrators," and they can build something on their own? Then software-based is the answer. Having both models gives you a choice. >> What brought you to IBM? You used to be at NetApp. IBM's buying The Weather Company. Dell's buying EMC. What attracted you to IBM? >> Storage is the foundation which we have, but it's really about data, and it's really about making sense of it, right? And everybody's saying data is the new oil, right? And IBM is probably the only company I can think of which has the tools and the AI to make sense of all this. NetApp was great in the early 2000s, but even as a storage foundation, they have issues with scale-out, a true scale-out, not just a single namespace. EMC is purely a storage company.
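The file-versus-object distinction Chandra draws — files support modifying bytes in place, while a flat object store only replaces whole objects — can be shown concretely (a minimal sketch with a dict standing in for an object store; not any vendor's API):

```python
# Files support in-place partial updates (seek + write); a flat object
# store, by contrast, only supports replacing the whole object.
import os
import tempfile

# File storage: edit 5 bytes in the middle of a 100-byte file.
path = os.path.join(tempfile.mkdtemp(), "video.raw")
with open(path, "wb") as f:
    f.write(b"A" * 100)
with open(path, "r+b") as f:   # open for in-place update
    f.seek(50)
    f.write(b"EDIT!")          # rewrites only bytes 50-54

with open(path, "rb") as f:
    data = f.read()
print(data[48:57])             # b'AAEDIT!AA'

# Object storage: the same edit means re-uploading the whole object.
objects = {"video.raw": b"A" * 100}
body = objects["video.raw"]
objects["video.raw"] = body[:50] + b"EDIT!" + body[55:]   # whole-object PUT
print(objects["video.raw"] == data)                       # True
```

This is why the video-editing workload Chandra mentions belongs on file storage, while write-once data like a finished x-ray fits the whole-object model.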
In the future it's all about, and the reason we are here at this conference is, analyzing the data. What tools do you have to make sense of it? And that's where machine learning and deep learning come in. Watson is very well known for that. IBM has the AI, and it has a lot of research going on behind that, and I think storage will make more sense here. And also, IBM is doing the right thing by investing almost a billion dollars in software-defined storage. They are one of the first companies who did not hesitate to take the software from their integrated systems, for example XIV, and make that software available as software only. We did the same thing with Storwize. We took the software off it and made it available as Spectrum Virtualize. We did not hesitate at all to take that same software and make it available, where some other vendors say, "I can't do that, I'm going to lose all my margins." We didn't hesitate. We made it available as software, because we believe that's an important need for our customers. >> So the vision of the company, cognitive, the halo effect of that business, that's the future, and it's going to bring a lot of storage action, is sort of the premise there. >> Chandra: Yes. >> Excellent. Well, Chandra, thanks very much for coming to theCUBE. It was great to have you, and good luck with attacking the big data world. >> Thank you, thanks for having me. >> You're welcome. Keep it right there, everybody. We'll be back with our next guest. We're live from Munich. This is DataWorks 2017. Right back. (techno music)
Alan Nance, Virtual Clarity - DataWorks Summit Europe 2017 #DW17 #theCUBE
>> Narrator: At the DataWorks Summit, Europe 2017. Brought to you by Hortonworks. >> Hey, welcome back everyone. We're here live from Munich, Germany at DataWorks 2017, formerly Hadoop Summit, before the conference name changed to DataWorks. I'm John Furrier with my cohost Dave Vellante. Our next guest, we're excited to have Alan Nance, who flew in just for the CUBE interview today. Executive Vice President with Virtual Clarity. A former star, what I call a practitioner of the Cloud, who knows the Cloud business, knows the operational aspects of how to use technology. Alan, it's great to see you. Thanks for coming on the CUBE. >> Thank you for having me again. >> Great to see you. You were in the US recently, we had a chance to catch up. And one of the motivations for talking with you today was to hear a little bit about some of the things you're looking at that are transformative. Before we do that, let's talk a little about your history, and what your role is at Virtual Clarity. >> So, as you guys have basically followed that career, I started out in the transformation time with ING Bank. And started out, basically, technology upwards. Looking at converged infrastructure, converged infrastructure into VDI. When you've got that, you start to look at Clouds. Then you start to experiment with Clouds. And I moved from ING, from early experimentation, into Philips. At that time, Philips had both the health care and the lighting group. And then you start to look at consumption-based Cloud propositions. And you remember the big thing that we were doing at that time, when we identified that 80% of the IT spend was non-differentiating. So the question was, how do we get away from almost a 900-million-a-year spend on legacy? How do we turn that into something that's productive for the Enterprise? So we spent a lot of time creating the consumption-based infrastructure operating platform. A lot of things we had to learn.
Because let's be honest, Amazon was still trying to become the behemoth it is now. IBM still didn't get the transition, HP didn't get it. So there was a lot of experimentation on the operating model-- >> You were the first mover on the operating model, the Cloud that has scaled to it, and really differentiated services for your business, and also cost reductions. >> Cost reductions have been phenomenal. And we're talking about halving the budget over a three-year period. We're talking about 500 million a year in savings. So these are big, big savings. The thing I feel we still need to tackle is that when we re-platform your business, it should lead to agile acceleration of your growth path. And I think that's something that we still haven't conquered. So I think we're getting better and better at using platforms to save money, to suppress the expenditure. What we now need to do is convert that into a growth platform for the business. >> So, how about the data component? Because you were CIO of infrastructure at Philips. But lately, you've been really spending a lot of time thinking about the data, how data adds value. So talk about your data journey. >> Well, if I look at the data journey, the journey started for me with, basically, a meeting with Tom Ritz in 2013. And he came with a very, very simple proposition: "You guys need to learn how to create, and store, and reason over data, for the benefit of the Enterprise." And I thought, "Well, that's cool." Because up until that point, nobody had really been talking about data. Everyone was talking about the underlying technologies of the Cloud, but not really about the data element. And then we had a session with JP Rangaswami, who was at Salesforce, who basically also said, "Don't just think about data lakes, but think also about data streams and data rivers. Because the other thing that's going to happen here is that data's not going to be stagnant in a company like yours."
So we took that, and what happened, I think, in Philips, which I think you see in a lot of companies, is an explosion across the Enterprise. So you've got people in social doing stuff. You've got CDOs appearing. You've got the IOT. You've got the old legacy systems, the systems of record. And so you end up with this enormous fragmentation of data. And with that you get a Wild West of what I call data stewardship. So you have a CDO who says, "Well, I'm in charge of data." And you've got a CMO who says, "Well, I'm in charge of marketing data." Or you've got a CSO who says, "Yeah, but I'm the security data guy." And there's no coherence in terms of moving the Enterprise forward, because everybody's focused on their own functionality around that data and not on connecting it. So where are we now? I think right now we have a huge proliferation of data that's not connected, in many organizations. And I think we're going to hybrid, but I don't think that's a future-proof thing for most organizations. >> John: What do you mean by that? >> Well, if I look at what a lot of those suppliers are saying, they're really saying, "The solution that you need is a hybrid solution between the public Cloud and your own Cloud." I thought, "But that's not the problem that we need to solve." The problem that we need to solve is, first of all, data gravity. So if I look at all the transformations that are running into trouble, what do they forget? When we go out and do IOT, when we go out and do social media analysis, it all has to flow back into those legacy systems. And those legacy systems are all going to be in the old world. And so you get latency issues, you get formatting issues. And so, we have to solve the data gravity issue. And we have to also solve this proliferation of stewardship. Somebody has to be in charge of making this work. And it's not going to be just putting in a hybrid solution, because that won't change the operating model.
>> So let me ask the question, because one of the things you're kind of dancing around, and Dave brought up the data question, is something that I see as a problem in the industry that hasn't yet been solved, and I'm just going to throw it out there. The CIO has always been the guy managing IT. And then he would report to the CFO, get the budget, blah, blah, blah. We know that's kind of played out its course. But there's no operational playbook to take the Cloud, mobile, data at scale, that's going to drive the transformative impact. And I think there are some people doing stuff here and there, pockets. And maybe there are some organizations that have a cadence of managers that are doing compliance, security, blah, blah, blah. But you have a vision on this, and some information that you're tracking, around an architecture that would bring it to scale. Could you share your thoughts on this operational model of Cloud, at a management level? >> Well, part of this is also based on your own analyst, Peter Burris, when he says, "The problem with data is that its value is inverse to its half life." So, what the Enterprise has to do is get to analyzing and making this data valuable much, much faster than it does right now. And Chris Sellender of Unifi recently said, "The problem's not big data. The problem's fast data." So, now, who is best positioned in the organization to do this? And I believe it's the COO. >> John: Chief Operating Officer? >> Chief Operating Officer. I don't think it's going to be the CIO. Because I'm trying to figure out who's got the problem. Who's got the problem of connecting the dots to improve the operation of the company? Who is in charge of actually creating an operating platform that the business can feed off of? It's the COO. >> John: Why not the CFO? >> No, I think the CFO is going to be of diminishing value over time, for a couple of reasons. First of all, we saw it at Philips.
There's always going to be a fiduciary role for the CFO. But we're out of the world of capex. We're out of the world of balancing assets. Everything is now virtual. So really, the CFO sitting on the tee, if I can use that metaphor, is not going to bring value to the Enterprise. >> And the CIO doesn't have the business juice, is your argument? Is that right? >> It depends on the CIO. There are some CIOs out there-- >> Dave: But in general, we're generalizing. >> Generally not. Because they've come through the ranks of building applications, which now has to be thrown away. They've come through the ranks of technology, which is now less relevant. And they've come through the ranks of having huge budgets and huge numbers of people to deploy certain projects. All of that's going away. And so what are you left with? Now you're left with somebody who absolutely has to understand how to communicate with the business. And that's what they haven't done for 30 years. >> John: And streamline business process. >> Well, at least get involved in the conversation. At least get involved in the conversation. Now, if I talk to business people today, and you probably do too, most of them will still say there's this huge communication gulf between what we're trying to achieve and what the technology people are doing with our goals. I mean, I was talking to somebody the other day. This lady heads up sales for a global financial institution. She's sitting on the business side of this. And she's like, "The conversation should be about, if our company wants to improve our cost-income ratio, and they ask me, as sales, to do it, I have to sell 10 times more to make a difference than if IT would save money. So every Euro they save, if they give me an agile platform, goes straight to the bottom line. Every time I sell, because of our cost-income ratio, I just can't sell against that.
But I can't find on the IT side anybody who, sort of, gets my problem and is trying to help me with it." And then you look at her and, what? You think a hybrid solution's going to help her? (laughs) I have no idea what you're talking about. >> Right, so the business person here then says, "I don't really care where it runs." But to your point, you care about the operational model? >> Alan: Absolutely. >> And that's really what Cloud should be, right? >> I think everybody who's going to achieve anything from an investment in Cloud will achieve it in the operating model. They won't just achieve it on the cost savings side, or on making costs more transparent or more commoditized. Where it has to happen is in the operating model. In fact, we actually have data on a very large transportation and logistics company, who moved everything that they had into the Cloud, and on the benchmark, saved zero. And they saved zero because they weren't changing the operating model. So they were still-- >> They lifted and shifted, but didn't change the operational mindset. >> Not at all. >> But there could have been business value there. Maybe things went faster? >> There could have been. >> Maybe simpler? >> But I'm not seeing it. >> Not game changing. >> Not game changing, certainly, yes. >> Not as meaningful, it was a stretch. >> Give an example of a game-changing scenario. >> Well, for me, and I think this is the next most exciting thing, it's this idea of platforms. There's been an early adoption of this in Telco, where we've seen people coming in and saying, "If you take all of this IT as we've known it, and you leverage the ideas of Cloud computing to have scalable, invisible infrastructure, and you put a single platform on top of it to run your business, you can save money." Now, I've seen business cases where people who are about to embark on this program are taking a billion a year out of their cost base.
And in this company, it's 1/7th of their total profit. That's a game changer, for me. But now, who's going to help them do that? Who's going to help them-- >> What's the platform look like? >> And a billion's a lot of money. >> Let's go, grab a sheet of paper, how we-- >> So not everybody will even have a billion-- >> But that gets the attention of certainly the CEO; the COO and CFO say, "Tell me more." >> You're alluding to it, Dave. You need to build the layers to do that. So you need to fix the data stewardship problem. You have to create the invisible infrastructure that enables that platform. And you have to have a platform player who is prepared to disrupt the industry. And for me-- >> Dave: A Cloud player. >> A Cloud player. I think it's a born-in-the-Cloud player. I think, you know, we've talked about it privately. >> So who are the players to watch? You got Microsoft, you got AWS, Google, maybe IBM, maybe Oracle. >> See, I think it's Google. >> Dave: Why, why do you think it's Google? >> I think it's because the platforms that I'm thinking of, and if I look in retail, if I look in financial services, it's all about data. Because that's the battle, right? We all agree the battle's on data. So it's got to be somebody who understands data at scale, understands search at scale, understands deep learning at scale, and understands technology enough to build that platform and make it available in a consumption model. And for me, Google would be the ideal player, if they would make that step. Amazon's going to have a different problem because their strategy's not going down that route. And I think, for people like IBM or Oracle, it would require cannibalizing too much of their existing business. But they may dally with it. And they may do it in a territory where they have no install base. But they're not going to be disrupting the industry. I just don't think it's going to be possible for them.
>> And you think Google has the Enterprise chops to pull it off? >> I think Google has the platform. I would agree with Alan on this. I've been very critical of Google. Dave brings this up because he wants me to say it now, and I will. Google is well positioned to be the platform. I am very bullish on Google Cloud with respect to their ability to moonshot or slingshot to the future faster than, potentially, others. Or as they say in football, move the goal posts and change the game. That being said, where I've been critical of Google, and this is where I'll be critical, is that their dogma is very academic, very "We're the technology leader, therefore you should use Google G Suite." I think that they have to change their mindset to be more Enterprise focused, in the sense of understanding that the best product will not always win; the muscle they have to develop is thinking about the Enterprise. And that's a lot of white-glove service. That's a lot of listening. That's not being too arrogant. I mean, there's a borderline between confidence and arrogance, and I think Google crosses it a little bit too much, Dave. And I think that's where some people in Google recognize that they don't have the Enterprise track record, for sure on the sales side. You could add 1,000 sales reps tomorrow, but do they have experience? So there's a huge translation issue going on between Google's capability and potential energy, and then the reality of them translating that into an operational footprint. So for them to meet the mark of folks like you, you can't be speaking Russian and English. You've got to speak the same language. So the language barrier, so to speak, the linguistics, is different. That's my only point. >> I sense in your statements there's a frustration here. Because we know that the key to some really innovative disruption is with Google. And I think what we'd all like to do, even while I was addressing the camera--
I'd love to see Diane, who does understand Enterprise, who's built a whole career servicing Enterprises extremely well, I'd like to see a little bit of a glimpse of, "We are up for this." And I understand, when you're part of the bigger Google, the numbers are a little bit skewed against you to make a big impact and carry the firm with you. But I do believe there's an enormous opportunity in the Enterprise space. And people are just waiting for this. >> Well, Diane Greene knows the Enterprise. So she came in, she's got to change the culture. And I know she's doing it, because I have folks I know at Google who tell me privately that it's happening, maybe not fast enough. But here's the thing. If you walked in the front door at Google, Alan Nance, this is my point, and said, "I have experience and I have a plan to build a platform, to knock a billion dollars off the costs of seven companies that I know personally, that I can walk in and win, and move a billion dollars to their bottom line with your platform," they might not understand what that means. >> I don't know. You know, I was at Google Next a few weeks ago. And I thought they were, to your point, more open to listening. Maybe not as arrogant as you might be presenting, and somewhat more humble. Still pretty ballsy. But I think Google recognizes that it needs help in the Enterprise. And here's why. Something that we've talked about in the past is, you've got top-down initiatives, you've got bottom-up initiatives, and you've got middle-out. What frequently happens, and I'd love for you to describe your experiences: the leaders, the top CXOs, say, "Okay, we're going." And they take off and the organization doesn't follow them. If it's bottom-up, you don't have the top-down imprimatur. So how do you address that? What are you seeing, and how do you address that problem?
I mean, what I see in a lot of the big transformations that I've been involved in, is that speed is of the essence. And I think when CEO's, because usually it's the CEO. CEO comes in and they think they've got more time than they actually have to make the impact in the Enterprise. And it doesn't matter if they're coming in from the outside or they've grown up. They always underestimate their ability to do change, in time. And now what's changed over the past few years, is the average tenure of a CEO is six years. You know, I mean, Jack Welch was 20 years at GE. You can do a lot of damage in 20 years. And he did a lot of great things at GE over a 20 year period. You've only got six years now. And what I see in these big transformation programs is they start with a really good vision. I mean Mackenzie, Bain, Boston. They know the essence of what needs to happen. >> Dave: They can sell the dream. >> They can sell the dream. And the CEO sort of buys into it. And then immediately you get into the first layer, "Okay, okay, so we've got to change the organization." And so you bring in a lot of these companies that will run 13 work streams over three years, with hundreds of people. And at the end of that time, you're almost halfway through your tenure. And all you've got is a new design. Or a new set of job descriptions or strategies. You haven't actually achieved anything. And then the layer down is going to run into real problems. One of the problems that we had at the company I worked at before, was in order to support these platforms you needed really good master data management. And we suddenly realized that. And so we had to really put in an accelerated program to achieve that, with Impatica. We did it, but it cost us a year and 1/2. At a bank I know, they can't move forward because they're looking at 700 million of technology debt, they can't get past. So they end up going down a route of, "Maybe one of these big suppliers "can buy our old stuff. 
and we can tag on some transformational deal at the back end of that." None of those are working. And then what happens, in my mind, is that if the CEO, from what I see, has not achieved escape velocity by the end of year three, so that he or she is showing the growth, showing the digital transformation, it's kind of game over. The Enterprise has already figured out how to stall it long enough, not intentionally. And then we go back into an austerity program, because you've got to justify the millions you've spent in the last three years, and you've got nothing to show for it. >> And you're preparing three envelopes. >> So you've got to accelerate those layers. You've got to take layers out, and you've got to have, I would say, almost 90-day iteration plans that show business outcomes. >> But the technology layer, you can put in an abstraction layer, use APIs and infrastructure as code, all that cool stuff. But you're saying it's the organizational challenges. >> I think that's the real problem. The real problem is the organization. And also because what you're really doing, in terms of the Enterprise, is moving from a more traditional supply chain that you own, and have matriculated with SAP or with Oracle, to creating a digital value chain. A digital value chain that's much more based on a mobile ecosystem, where you would have fintechs or insurtechs that have to now fit into an agile supply chain. It's all about the operating model. If you don't have people who know how to drive that, the technology's not going to help you. So you've got to have people on the business side and the technology side coming together to make this work. >> Alan, I have a question for you. What's your prediction, okay, knowing what you know, and obviously you have some frustrations with trying to get the big players to listen. And I think they should listen to you. But this is going to happen.
So I would believe that what you're saying, with the COO and operational things changing radically, obviously the signs are all there. Data centers are moving into the Cloud. I mean, this is radical stuff, in a good way. And so, what's your prediction for how this plays out, vis-a-vis Amazon Web Services, Google Cloud Platform, Azure, IBM Cloud SoftLayer? >> Well, here's my concern a little bit. I think if Google enters the fray, I think everybody will reconfigure. Because if we assume that Google plays to its strengths and goes out there and finds the right partners, it's going to reconfigure the industry. If they don't do that, then what the industry's going to do is what it's done, which means that the platforms are going to be hybrid platforms dominated by the traditional players. By the SAPs, by the Oracles, by the IBMs. And what I fear is that there may actually be a disillusionment, because they will not bring the digital transformation and all the wonderful things that we all know are out there to be gained. So you may get, "We've invested all this money." You see it a little bit with big data. "I've got this huge layer. I've got petabytes. Why am I not smarter? Why is my business not going so much better? I've put everything in there." I think we've got to address the operating problem. And we have to find a dialogue at the C-suite. >> Well, to your point, and we talked about this, you know, you look at the core of Enterprise apps, the Oracle stuff is not moving in droves to the Cloud. Oracle's freezing the market right now, betting that it can get there before the industry gets there. And if it does-- >> Alan: It's not. >> And it might, but if it does, it's not going to be that radical transformation you're prescribing. >> They have too much to lose. Let's be honest, right? Oracle is a victim of its own success, pretty much like SAP. It has to go to the Cloud as a defensive play.
Because the last thing either of those want is to be disintermediated by Amazon. Which may or may not happen anyway. Because a lot of companies will disintermediate if they can. Because the licensing is such a painful element for most enterprises when they deal with these companies. So they have to believe that the platform is not going to look like that. >> And they're still trying to figure out the pricing models, and the margin models, and Amazon's clearly-- >> You know what's driving the pricing models is not the growth on the consumer side. >> Right, absolutely. >> That's not what's driving it. So I think we need another player. I really think we need another player. If it's not Google, somebody else. I can't think who would have the scale, the money to-- >> The only guys who have the scale, you've got Tencent, maybe a couple China clouds, maybe one Japan cloud, and that's it. >> To be honest, you raise a good point. I haven't really looked at the Alibabas and the other people like that who may pick up that mantle. I haven't looked at them. Alibaba's interesting, because just like Amazon, they have their own business that runs on platforms. And a very diverse business, which is growing faster than Amazon and is more profitable than Amazon. So they could be interesting. But I'm still hopeful. We should figure this out. >> Google should figure it out. You're absolutely right. They're investing, and I thought they put forth pretty good messaging at Google Next. You covered it remotely, but I think they understand the opportunity. And I think they have the stomach for it. >> We had reporters there as well, at the event. We just did, they came to our studio. Google is self aware that they need to work on the Enterprise. I think the bigger thing that you're highlighting is that the operational model is shifting to a scale point where it's going to change stewardship and what the COO means. I like that.
The other thing I want to get your reaction to is something I heard this morning on the CUBE from Sean Connelly. That goes with some of the things that we're seeing, where you're seeing Cloud becoming a more centralized view, where IOT is an Edge case. So you have now issues around architectural things. Your thoughts and reaction to this balance between Edge and Cloud? >> Well I think this is where you're also going to have your data gravity challenge. So, Dave McCrory has written a lot about the concept of data gravity. And in my mind, too many people in the Enterprise don't understand it. Which is basically, that data attracts more data. And the more data you have, the more it'll attract. And then you create all these latency issues when you start going out to the Edge. Because when we first went out to the Edge, I think, even at Phillips, we didn't realize how much interaction needed to come back. And that's going to vary from company to company. So some companies are going to want that data really quickly because they need to react to it immediately. Others may not. But what you do have is this balancing act. "What do I keep central? And what do I put at the Edge?" I think Edge technology is amazing. And when we first looked at it, four years ago, I mean, it's come such a long way. And what I am encouraged by is that, in that data layer, the layer that Sean talks about, there's a lot of exciting things happening. But again, my problem is, what's the Enterprise going to do with that? Because it requires a different operating model. If I take an example of a manufacturing company, I know a manufacturing company right now that does work in China. And it takes all the data back to its central mainframes for processing. Well if you've got the Edge, you want to be changing the way you process. Which means that the decision makers on the business need to be in situ. They need to be in China.
And we need to be bringing systems of record data and combining it with local social data and edge data, so we get better decisions. So we can drive growth in those areas. If I just enable it with technology but don't change the business model, the business is not going to grow. >> So Alan, we always love having you on. Great practitioner, but now you've kind of gone over to the dark side. We've heard of a company called Virtual Clarity. Tell us about what you're doing there. >> So what we're invested in, what I am very much invested in, with my team at Virtual Clarity, is creating this concept of precision guided transformation. Where you work on the business, on what are the outcomes we really need to get from this. And then we've combined, I would say it's like a data nerve center. So we can quickly analyze, within a matter of weeks, where we are with the company, and what routes to value we can create. And then we'll go and do it. So we do it in 90 day increments. So the business now starts to believe that something's really going to happen. None of these big, "insert miracle here after three years" programs. But actually going out and doing it. The second thing I think we're doing that I'm excited about is bringing in enlightened people who represent the Enterprise. So, one of my colleagues is the former COO of Unilever, and we just brought on a very smart lady, Dessa Grassa, who was the CDO at JP Morgan Chase. And the idea is to combine the insights that we have on the demand side, the buy side, with the insights that we have on the technology side, to create better operating models. So that combination creates a new view that is acceptable to the C Suite. Because these people understand how you talk to them. But at the same time, it runs on this concept of doing everything quickly. That's what we're about right now. >> That's awesome, we should get you hooked up with our new analyst we just hired, James Kobielus, from IBM. He's focusing on exactly that.
The intersections of developers, Cloud, AI, machine learning and data, all coming together. And IOT is going to be a key application that we're going to see coming out of that. So, congratulations. Alan, thank you for spending the time to come in and see us on the CUBE. >> Thanks for having me. >> It's the CUBE, bringing you more action, here from DataWorks 2017. I'm John Furrier with my cohost Dave Vellante, here on the CUBE, SiliconANGLE Media's flagship program, where we've got the events, straight from SiliconANGLE. Stay with us for more great coverage. Day one of two days of coverage at DataWorks 2017. We'll be right back.