Jeff Bettencourt, DataTorrent & Nathan Trueblood, DataTorrent - DataWorks Summit 2017


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks.

>> Welcome back to The Cube. We are live on day two of the DataWorks Summit, from the heart of Silicon Valley. I am Lisa Martin, my co-host is George Gilbert. We're very excited to be joined by our next guests from DataTorrent: we've got Nathan Trueblood, VP of Product. Hey, Nathan.

>> Hi.

>> Lisa: And the man who gave me my start in high tech 12 years ago, the SVP of Marketing, Jeff Bettencourt. Welcome, Jeff.

>> Hi, Lisa, good to see ya.

>> Lisa: Great to see you, too. So tell us, SVP of Marketing: who is DataTorrent, what do you guys do, what are you doing in the big data space?

>> Jeff: So, DataTorrent is all about real-time streaming. It's really taking a different paradigm to handling information as it comes from the different sources that are out there. So you think big IoT, you think all of these different new things that are creating pieces of information. It could be humans, it could be machines, sensors, whatever it is. And taking that in real time, rather than traditionally just putting it in a data lake and then later on coming back and investigating the data that you stored. So, we started in about 2011, started by some of the early founders, people who came out of Yahoo. And we're pioneers in Hadoop, with Hadoop YARN. This is one of the guys here, too. And so we're all about building real-time analytics for our customers, making sure that they can get business decisions done in real time, as the information is created. And Nathan will talk a little bit about what we're doing on the application side of it as well, building these hardened application pipelines for our customers, to help them get started faster.

>> Lisa: Excellent.

>> So, all right, let's turn to those real-time applications. My familiarity with DataTorrent started probably about five years ago, I think, where... I don't think there was so much talk about streaming then, it was more like, you know, real-time data feeds. But now streaming is sort of the center of gravity, sort of a peer to big data.

>> Nathan: Yeah.

>> So, tell us how someone who's building apps should think about the two solution categories, how they complement each other, and what sort of applications we can build now that we couldn't build before?

>> So, I think the way I look at it is not so much two different things that complement each other: streaming analytics and real-time data processing and analytics is really just a natural progression of where big data has been going. So, you know, when we were at Yahoo and we were running Hadoop at scale, you know, the first thing on the scene was simply the ability to produce insight out of a massive amount of data. But then there was this constant pressure: okay, now we've produced that insight in a day, can you do it in an hour? You know, can you do it in half an hour? And particularly at Yahoo, at the time that Amol, our CTO, and I were there, there was just constant pressure: can you produce insight from a huge volume of data more quickly? And so we kind of saw at that time two major trends. One was that we were reaching a limit of where you could go with the Hadoop and batch architecture of that time. And so a new approach was required.
And that's what really was the foundation of the Apache Apex project, and of DataTorrent the company: simply realizing that a new approach was required, because the more that Yahoo or other businesses can take information from the world around them and take action on it as quickly as possible, the more competitive they're going to be. So I'd look at streaming as really just a natural progression. Now it's possible to get insight and take action on data as close to the time of data creation as possible, and if you can do that, then you're going to be competitive. And we see this coming across a whole bunch of different verticals. So that's how I look at it: not so much as complementary, but as a trend in where big data is going. Now, the kinds of things that weren't possible before this are, you know, the kinds of applications where now you can take insight, whether it's from IoT or from sensors or from retail, all the things that are going on. Whereas before, you would land this in a data lake, do a bunch of analysis, produce some insight, maybe change your behavior, but ultimately you weren't being as responsive as you could be to customers. So now, and this is why I think the center of mass has moved into real time and streaming, it's possible to give the customer an offer the second they walk into a store, based on what you know about them and their history. This was always something the internet properties were trying to move towards, but now we see that same technology being made available across a whole bunch of different verticals, a whole bunch of different industries. That's why, when you look at Apex and DataTorrent, we're involved not only in things like adtech, but in industrial automation and IoT, and we're involved in retail and customer 360, because in every one of these cases (insurance, finance, security and fraud prevention) it's a huge competitive advantage if you can get insight and make a decision close to the time of the data creation. So I think that's really where the shift is coming from. And then the other thing I would mention here is a big thrust of our company, and of Apache Apex. So, we saw streaming was going to be something that everyone was going to need. The other thing we saw from our experience at Yahoo was that getting something to work at a POC level, showing that something is possible with streaming analytics, is really only a small part of the problem. Being able to put something into production at scale, and run a business on it, is a much bigger part of the problem. And so we put into both the Apache Apex project, as well as into our product, the ability to not only get insight out of this data in motion, but to put that into production at scale. And that's why we've had quite a few customers who have put our product in production at scale and have been running that way, in some cases, for years. So that's another key area where we're forging a path: it's not enough to do a POC and show that something is possible. You have to be able to run a business on it.

>> Lisa: So, talk to us about where DataTorrent sits within a modern data architecture. You guys are kind of playing in a couple of, integrated in a couple of different areas. Walk us through what that looks like?
>> So, in terms of a modern data architecture, part of it is what I just covered: we're moving from a batch to a streaming world, where the notion of batch is not going away, but now when you have a streaming application, that's something that's running all the time, 24/7; there's no concept of batch. Batch is really more the concept of how you are processing data through that streaming application. So what we're seeing in the modern data architecture is that, typically, you have people taking data, extracting it, and eventually loading it into some kind of a data lake, right? What we're doing is shifting left of the data lake: analyzing information when it's created, producing insight from it, taking action on it, and then, yes, landing it in the data lake. But once you land it in the data lake, all of the purposes of what you're doing with that data have shifted. We're producing insight and taking action to the left of the data lake, and then we use that data lake to do things like train your machine learning model that we're then going to use to the left of the data lake. Use the data lake to do slicing and dicing of your data, to better understand what kinds of campaigns you want to run, things like that. But ultimately, you're using the real-time portion of this to take those campaigns and then measure the impact you're having on your customers in real time.

>> So, okay, because that was going to be my follow-up question, which is, there does seem to be a role for a historical repository, for richer context.

>> Nathan: Absolutely.

>> And you're acknowledging that. Like, does the low-latency analytics happen first, and then you store the data for a richer model later?

>> Nathan: Correct.

>> So there are a couple things, then, that seem to be like requirements, next steps, which is: if you're doing the modeling, the research model, in the cloud, how do you orchestrate its distribution towards the sources of the real-time data? In other words, if you do training up in the cloud, where you have the biggest data or the richest data, is DataTorrent or Apex a part of the process of orchestrating the distribution and coherence of the models that should be at the edge, or closer to where the data sources are?

>> So, I guess there are a couple of different ways we can think about that problem. We have customers today who are essentially providing, into the streaming analytics application, the models that have been trained on the data from the data lake. And part of the approach we take in Apex and DataTorrent is that you can reload and be changing those models all of the time. Our architecture is such that it's fault-tolerant, it stays up all the time, so you can actually change the application and evolve it over time. So we have customers that are reloading models on a regular basis, and whether it's machine learning or even just a rules engine, we're able to reload that on a regular basis. The other part of your question, if I understood you, was really about the distribution of data, and the distribution of models, and where do you train that.
And I think you're going to have data in the cloud, you're going to have data on premises, you're going to have data at the edge. Again, what we allow customers to do is to take and integrate that data and make decisions on it, regardless of where it lives. So we'll see streaming applications that get deployed into the cloud, but they may be synchronizing some portion of the data to on-premises, or vice versa. So, certainly, we can orchestrate all of that as part of an overall streaming application.

>> Lisa: I want to ask Jeff now. Give us a cross-section of your customers. You've got customers ranging from small businesses to the Fortune 10.

>> Jeff: Yep.

>> Give us some use cases that really stick out to you, that really showcase the great potential that DataTorrent gives.

>> Jeff: So if you think about the heritage of our company, coming out of the early guys that were in Yahoo, adtech is obviously one that we hit hard, and it's something we know how to do really, really well. So adtech is one of those things where they're constantly changing, so you can take that same model and say: if I'm looking at adtech, and I applied that to distribution of products in a manufacturing facility, it's kind of all the same type of activity, right? I'm managing a lot of inventory, I'm trying to get that inventory to the right place at the right time, and I'm trying to fulfill that aspect of it. So that's where we started, but we've got customers in the financial sector, right, that are really looking at instantaneous types of transactions that are happening, and then, how do you apply knowledge and information to that while you're bringing that source data in, so that you can make decisions? Some of those decisions have people involved with them, and some of them are just machine-based, right, so you take the people equation out. We have this funny thing that Guy Churchward, our CEO, talks about, called the "do loop," and the do loop is where the people come in: how do we remove people from that do loop and really make it easier for companies to act? Then, if you take that aspect of it, we've got companies in the publishing space, we've got companies in the IoT space doing inventory management, stuff like that. So we go from medium-sized customers all the way up to very, very large enterprises.

>> Lisa: You're really turning a variety of industries into tech companies, because they have to be these days.

>> Nathan: Right. Well, and one other thing I would mention there, which is important, especially as we look at big data and a lot of customer concern about complexity. I mentioned earlier the challenge of not just coming up with an idea, but being able to put that into production. So one of the other big areas of focus for DataTorrent, as a company, is that not only have we developed a platform for streaming analytics and applications, but we're starting to deliver applications that you can download and run on our platform, that deliver an outcome to a customer immediately. So increasingly, as we see different verticals and different applications, we turn those into applications we can make available to all of our customers, that solve business problems immediately. One of the challenges for a long time in IT is simply how you eliminate complexity, and there's no getting away from the fact that this is big data, and these are complex systems.
But to drive mass adoption, we're focused on how we can deliver outcomes for our customers as quickly as possible, and the way to do that is by making applications available across all these different verticals.

>> Well, you guys, this has been so educational. We wish you guys continued success here. It sounds like you're really being quite disruptive in and of yourselves, so if you haven't heard of them, DataTorrent.com, check them out. Nathan, Jeff, thanks so much for giving us your time this afternoon.

>> Great, thanks for the opportunity.

>> Lisa: We look forward to having you back. You've been watching The Cube, live from day two of the DataWorks Summit, from the heart of Silicon Valley. For my co-host George Gilbert, I'm Lisa Martin. Stick around, we'll be right back. (upbeat music)
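
The pattern Trueblood describes, keeping a streaming pipeline up 24/7 while swapping in freshly trained models from the data lake, can be sketched concretely. Apache Apex operators are actually written in Java; the Python sketch below is a generic illustration of the hot-reload idea, not Apex's real API, and the model path and class name are made up for the example.

```python
import os
import pickle
import time


class ScoringOperator:
    """Illustrative streaming operator: scores each incoming event with a
    model, and hot-reloads the model file whenever a newer version appears,
    so the pipeline never has to stop. Hypothetical sketch, not Apex's API."""

    def __init__(self, model_path, check_interval_s=60):
        self.model_path = model_path
        self.check_interval_s = check_interval_s
        self._last_check = 0.0
        self._mtime = 0.0
        self._model = None
        self._load_model()

    def _load_model(self):
        # The model is assumed to be trained offline (for example, on data
        # lake history) and serialized to a path the pipeline can read.
        self._mtime = os.path.getmtime(self.model_path)
        with open(self.model_path, "rb") as f:
            self._model = pickle.load(f)

    def _maybe_reload(self):
        # Throttle filesystem checks, then swap in the new model between
        # events if the file has been replaced by a newer training run.
        now = time.time()
        if now - self._last_check < self.check_interval_s:
            return
        self._last_check = now
        if os.path.getmtime(self.model_path) > self._mtime:
            self._load_model()

    def process(self, event):
        """Called by the streaming engine once per incoming tuple/event."""
        self._maybe_reload()
        return self._model.predict([event])[0]
```

The same idea works for the rules engine Trueblood mentions: anything the operator can re-read atomically between events can be evolved without taking the application down.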

Published Date : Jun 14 2017


Itamar Ankorion, Attunity & Arvind Rajagopalan, Verizon - #DataWorks - #theCUBE


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's the CUBE, covering DataWorks Summit 2017, brought to you by Hortonworks.

>> Hey, welcome back to the CUBE, live from the DataWorks Summit, day two. We've been here for a day and a half, talking with fantastic leaders and innovators, learning a lot about what's happening in the world of big data, the convergence with Internet of Things, machine learning, artificial intelligence, I could go on and on. I'm Lisa Martin, my co-host is George Gilbert, and we are joined by a couple of guys. One is a Cube alumni: Itamar Ankorion, CMO of Attunity. Welcome back to the Cube.

>> Thank you very much, good to be here. Thank you, Lisa and George.

>> Lisa: Great to have you.

>> And Arvind Rajagopalan, the Director of Technology Services for Verizon. Welcome to the Cube.

>> Thank you.

>> So we were chatting before we went on, and Verizon, you're actually going to be presenting tomorrow at the DataWorks Summit. Tell us about building... the journey that Verizon has been on, building a Data Lake.

>> Verizon, over the last 20 years, has been a large corporation made up of a lot of different acquisitions and mergers; that's how it was formed, 20 years back. And as we've gone through the journey of the mergers and the acquisitions over the years, we had data from different companies come together and form a lot of different data silos. So the reason we started looking at this is that our CFO started asking questions around being able to answer One Verizon questions. It's as simple as having days payable, or working capital analysis, across all the lines of businesses. And since we have a three-major-ERP footprint, it is extremely hard to get that data out, and there was a lot of manual data prep activity going into bringing together those One Verizon views. So that's really what was the catalyst to get the journey started for us.

>> And it was driven by your CFO, you said?

>> Arvind: That's right.

>> Ah, very interesting, okay. So what are some of the things that people are going to hear tomorrow from your breakout session?

>> Arvind: I'm sorry, say that again?

>> Sorry, what are some of the things that the attendees of your breakout session are going to learn about the steps and the journey?

>> So I'm going to primarily be talking about the challenges that we ran into, and share some around that, and also talk about some of the factors, such as the catalysts and what drew us to moving in that direction, as well as getting into some architectural components from a high-level standpoint: talk about certain partners that we worked with, the choices we made from an architecture perspective, and the tools, as well as to close the loop on user adoption and what users are seeing in terms of business value, as we start centralizing all of the data at Verizon from a back-office Finance and Supply Chain standpoint. So that's what I'm looking at talking about tomorrow.

>> Arvind, it's interesting to hear you talk about collecting data from, essentially, back-office operational systems in a Data Lake. Were there... I assume the data is more refined and easily structured than in the typical stories we hear about Data Lakes. Were there challenges in making it available for exploration and visualization, or were all the early use cases really just production reporting?
>> So standard reporting across the ERP systems is very mature, and those capabilities are there, but then you look across ERP systems, and we have three major ERP systems for each of the lines of businesses. When you want to look at combining all of the data, it's very hard. And to add to that, you pointed at self-service discovery and visualization across all three data sets; that's even more challenging, because it takes a lot of heavy lifting to normalize all of the data and bring it into one centralized platform. We started off the journey with Oracle, and then we had SAP HANA; we were trying to bring all the data together, but then we were looking at our non-SAP ERP systems and bringing that data into an SAP kind of footprint. One, the cost was tremendously high; also, there was a lot of heavy lifting and challenges in terms of manually having to normalize the data and bring it into the same kind of data models. And even after all of that was done, it was not very self-service oriented for our users in Finance and Supply Chain.

>> Let me drill into two of those things. So it sounds like the ETL process of converting it into a consumable format was very complex, and then it sounds like also the discoverability, like where a tool perhaps like Alation might help, which is very, very immature right now, or maybe not immature, it's still young. Is that what was missing, or why was the ETL process so much more heavyweight than with a traditional data warehouse?

>> In the ETL processes there's a lot of heavy lifting involved, because of the proprietary data structures of the ERP systems. Especially SAP: the data structures, and how the data is used across clustered and pooled tables, are very proprietary. And on top of that, you're bringing in the data formats and structures from a PeopleSoft ERP system, which is supporting different lines of businesses, so there's a lot of customization that's gone into place; there are specific things that we use in the ERPs, in terms of the modules and how the processes are modeled in each of the lines of businesses, and that complicates things a lot. And then you try and bring all these three different ERPs, and the nuances they have developed over the years, together, and it actually makes it very complex.

>> So tell us then, help us understand how the Data Lake made that easier. Was it because you didn't have to do all the refinement before it got there? And tell us how Attunity helped make that possible.

>> Oh, absolutely. So I think that's one of the big things, why we picked Hortonworks as one of our key partners in terms of building out the Data Lake: it's schema on read, so you aren't necessarily worried about doing a whole lot of ETL before you bring the data in, and it also provides the tools and the technologies from a lot of other partners. We have a lot of maturity now; there are better self-service discovery capabilities for ad hoc analysis and reporting. So this is helpful to the users, because now they don't have to wait for prolonged IT development cycles to model the data, do the ETL, and build reports for them to consume, which sometimes could take weeks and months.
Now, in a matter of days, they're able to see the data they're looking for and start the analysis, and once they start the analysis and the data is accessible, it's a matter of minutes and seconds: looking at the different tools, how they want to look at it, how they want to model it. So it's actually been a huge value from the perspective of the users and what they're looking to do.

>> Speaking of value, one of the things that was kind of thematic yesterday: we see enterprises are now embracing big data, they're embracing Hadoop, it's got to coexist within the ecosystem, and it's got to interoperate. But just putting data in a Data Lake or Hadoop, that's not the value there; it's being able to analyze that data, in motion, at rest, structured, unstructured, and being able to glean or take actionable insights. From your CFO's perspective, where are you now in answering some of the questions that he or she had, from an insights perspective, with the Data Lake that you have in place?

>> Yeah, before I address that, I wanted to quickly touch upon and wrap up George's question, if you don't mind, because one of the key challenges, and I do talk about how Attunity helped... I was just about to answer the question before we moved on, so I just want to close the loop on that a little bit. In terms of bringing the data in, the data acquisition or ingestion is a key aspect of it, and again, dealing with the proprietary data structures from the ERP systems is very complex, and normally involves a multi-step process to bring the data into a staging environment, put it in the swamp, and bring it into the Lake. What Attunity has been able to help us with is, it has the intelligence to look at and understand the proprietary data structures of the ERPs, and it is able to bring all the data from the ERP source systems directly into Hadoop, without any stops or staging databases along the way. So it's been a huge value from that standpoint; I'll get into more details around that. And to answer your question around how it's helping from a CFO standpoint, and the users in Finance: as I said, now all the data is available in one place, so it's very easy for them to consume the data and do ad hoc analysis. So if somebody's looking to, like I said earlier, calculate days payable, as an example, or they want to look at working capital, we are actually moving data using Attunity's CDC Replicate product, getting data in real time into the Data Lake. So now they're able to turn things around and do that kind of analysis in a matter of hours, versus overnight or in a matter of days, which was the previous environment.

>> And that was kind of one of the things this morning: it's really about speed, right? It's how fast can you move, and it sounds like, together with Attunity, Verizon is really not only making things simpler, as you talked about, in this kind of model that you have with different ERP systems, but you're also really able to get information into the right hands much, much faster.
>> Absolutely, that's the beauty of the near real-time, CDC architecture. We're able to get data in very easily and quickly, and Attunity also provides a lot of visibility as the data is in flight: we're able to see what's happening in the source system, how many packets are flowing through. And to a point, my developers are so excited to work with the product, because they don't have to worry about the changes happening in the source systems in terms of DDL; those changes are automatically understood by the product and pushed to the destination in Hadoop. So it's been a game-changer, because we have not had any downtime. When there are things changing on the source system side, historically we had to take downtime to change those configurations and the scripts and publish them across environments, so that's been huge from that standpoint as well.

>> Absolutely.

>> Itamar, maybe help us understand where Attunity can... It sounds like there's greatly reduced latency in the pipeline between the operational systems and the analytic system, but it also sounds like you still need to essentially reformat the data so that it's consumable. So it sounds like there's an ETL pipeline that's just much, much faster, but at the same time, with Replicate, it sounds like that goes without transformations. So help us sort of understand that nuance.

>> Yeah, that's a great question, George. And indeed, in the past few years customers have been focused predominantly on getting the data to the Lake. I actually think one of the changes in the theme we're hearing here at the show, and in the last few months, is: how do we move to start using the data, creating great applications on the data? So we're moving to the next step. In the last few years we focused a lot on innovating and creating the solutions that facilitate and accelerate the process of getting data to the Lake, from a large scope of systems, including complex ones like SAP, and also making the process of doing that easier, providing real-time data that can feed both streaming architectures as well as batch ones. So once we got that covered, to your question, what happens next? And one of the things we found, and I think Verizon is also looking at it now and will be coming to it later, is this: when you bring data in and you want to adopt a streaming, or continuous, incremental type of data ingestion process, you're inherently building an architecture that takes what was originally a database but, in a sense, breaks it apart into partitions as you're loading it over time. So when you land the data, and Arvind was referring to a swamp, or some customers refer to it as a landing zone, you bring the data into your Lake environment, but at that first stage the data is not structured, to your point, George, in a manner that's easily consumable. All right, so the next step is, how do we facilitate the next step of the process, which today is still very manual-driven, involves custom development, and means dealing with complex structures?
So we're actually very excited: we've introduced here at the show, we announced, a new product by Attunity, Compose for Hive, which extends our Data Lake solutions. And what Compose for Hive is exactly designed to do is address part of the problem you just described. When the data comes in and is partitioned, what Compose for Hive does is reassemble these partitions, and it then creates analytic-ready data sets back in Hive. So it can create operational data stores, it can create historical data stores, and the data becomes formatted in a manner that's more easily accessible for users who want to use analytic tools, BI tools, Tableau, Qlik, any type of tool that can easily access a database.

>> Would there be, as a next step, whether led by Verizon's requirements or Attunity's anticipation of broader customer requirements, something where there's a, if not near real-time, then very low-latency landing and transformation, so that data that is time-sensitive can join the historical data?
>> Absolutely, absolutely. So what we've done is focus on real-time availability of data. When we feed the data into the Data Lake, we feed it in two ways: one is directly into Hive, but we also go through a streaming architecture like Kafka, which, in the case of Hortonworks, can also fit very well into HDF. So then the next step in the process is producing those analytic data sets, or data stores, out of it, which we enable, and we design it together with our partners and with our customers. So again, when we worked on Replicate, and then we worked on Compose, we worked very closely with Fortune companies trying to deal with these challenges, so we could design a product. In the case of Compose for Hive, for example, we have done a lot of collaboration, at a product engineering level, with Hortonworks, to leverage the latest and greatest in Hive 2.2, Hive LLAP, to be able to push down transformations, so those can be done faster, including in real time, so those data sets can be updated on a frequent basis.

>> You talked about kind of customer requirements, either those specific or not. Obviously, we're talking to a telecommunications company. Are you seeing, Itamar, from Attunity's perspective, more of this need to... all right, the data's in the Lake, or first it comes to the swamp, now it's in the Lake, to start partitioning it? Are you seeing this need driven in specific industries, or is this really pretty horizontal?

>> That's a good question, and this is definitely a horizontal need; it's part of the infrastructure needs. So Verizon is a great customer, and while we've worked similarly in telecommunications, we've been working with other customers in other industries, from manufacturing to retail to health care to automotive and others, and in all of those cases, at a foundation level, they are very similar architectural challenges. You need to ingest the data, you want to do it fast, you want to do it incrementally or continuously, even if you're loading directly into Hadoop. Naturally, when you're loading the data through a Kafka or streaming architecture, it's in a continuous fashion, and then you partition the data. So the partitioning of the data is inherent to the architecture, and then you need to help deal with the data for the next step in the process. And we're doing it both with Compose for Hive, but also for customers using streaming architectures like Kafka: we provide the mechanisms for supporting or facilitating things like schema evolution and schema decoding, to facilitate the downstream process of processing those partitions of data, so we can make the data available. That works both for analytics and streaming analytics, as well as for scenarios like microservices, where the way in which you partition the data, or deliver the data, allows each microservice to pick up the data it needs from the relevant partition.

>> Well, guys, this has been a really informative conversation. Congratulations, Itamar, on the new announcement that you guys made today.

>> Thank you very much.

>> Lisa: Arvind, great to hear the use case, and how Verizon really sounds quite pioneering in what you're doing. We wish you continued success there, and we look forward to hearing what's next for Verizon. We want to thank you for watching the CUBE. We are again live, day two of the DataWorks Summit, #DWS17. With me, my co-host George Gilbert; I am Lisa Martin. Stick around, we'll be right back. (relaxed techno music)
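
To make the partition-reassembly idea concrete: a CDC feed lands inserts, updates, and deletes as append-only events in the lake, and a periodic job collapses them into a current-state table that BI tools can query. The PySpark sketch below illustrates that compaction step under assumed column names (`key`, `op`, `ts`); it is a generic illustration, not how Compose for Hive is actually implemented.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-compaction").getOrCreate()

# Raw CDC partitions as landed in the lake: one row per change event.
# Assumed schema: business key, operation (I/U/D), change timestamp,
# plus the payload columns carried over from the source table.
changes = spark.read.parquet("/lake/landing/orders_cdc")

# Keep only the most recent change event per business key ...
latest = (
    changes
    .withColumn("rn", F.row_number().over(
        Window.partitionBy("key").orderBy(F.col("ts").desc())))
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# ... and drop keys whose latest operation was a delete.
current_state = latest.filter(F.col("op") != "D").drop("op")

# Publish an analytic-ready operational data store for BI tools to query.
current_state.write.mode("overwrite").saveAsTable("ods.orders_current")
```

A historical data store falls out of the same input by keeping every version of each key instead of only the latest row.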

Published Date : Jun 14 2017


Ron Bodkin, Teradata - DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks.

>> Welcome back to theCUBE. We are live at the DataWorks Summit on day two. We have had a great day and a half, learning a lot about the next generation of big data, machine learning, artificial intelligence. I'm Lisa Martin, and my co-host is George Gilbert. We are next joined by a CUBE alumni, Ron Bodkin, the VP and General Manager of Artificial Intelligence for Teradata. Welcome back to theCUBE!

>> Well, thank you, Lisa, it's nice to be here.

>> Yeah, so talk to us about what you're doing right now. Your keynote is tomorrow.

>> Ron: Yeah.

>> What are you doing, what is Teradata doing, in helping customers to be able to leverage artificial intelligence?

>> Sure, yeah. So as you may know, I have been involved in this conference and the big data space for a long time, as the founding CEO of Think Big Analytics. We were involved in really helping customers in the beginning of big data in the enterprise. And we are seeing a very similar trend in the space of artificial intelligence, right? The rapid advances in recent years in deep learning have opened up a lot of opportunity to really create value from all the data that customers have in their data ecosystems. So Teradata has a big role to play in having high-quality product: the Teradata database, analytic ecosystem products such as Hadoop, such as QueryGrid for connecting these systems together. So what we're seeing is that our customers are very excited by artificial intelligence, but what we're really focused on is how they get to the value. What can they do that's really going to get results? And we bring this perspective of having a strong solutions approach inside of Teradata. So we have Think Big Analytics consulting for data science, and we have now been building up experts in deep learning in that organization, working with customers. We've brought product functionality, so we're innovating around how we keep pushing the Teradata product family forward, with functionality around streaming, with listeners; functionality like the ability to take GPUs and think about how we can add that and make it deploy efficiently inside our customers' data centers; how we can take advantage of innovation in open source, with projects like TensorFlow and Keras becoming important for our customers. So we're seeing a lot of customers excited about use cases for artificial intelligence, and tomorrow in the keynote I'm going to touch on a few of them, ranging from applications like preventive maintenance and anti-fraud in banking, to e-commerce recommendations. We're seeing those are some of the examples of use cases where customers are saying, hey, there's a lot of value in combining traditional machine learning, wide learning, with deep learning, using neural nets to generalize.

>> Help us understand if there's an arc where there's the mix of what's repeatable and what's packageable, or what's custom, and how that changes over time, or whether it's just by solution.

>> Yeah, it's a great question. I mean, I think there's a lot of infrastructure that any of these systems needs to rest on. So having data infrastructure, having quality data that you can rely on, is foundational, and you need to get that installed and working well as a beginning point.
Obviously, having repeatable products that manage data with high SLAs, supporting not just production use, but also how you let data scientists analyze data in a lab, and make that work well. So there's that foundational data layer. Then there's the whole integration of the data science into applications, which is critical: analytics ops, agile ways of making it possible to take the data and build repeatable processes, and those are very horizontal, right? There's some variation, but those work the same in a lot of use cases. At this stage, I'd say, in deep learning, just like in machine learning generally, you still have a lot of horizontal infrastructure. You've got Spark, you've got TensorFlow; those support use cases across many industries. But then you get to the next level, you get specific problems, and there's a lot of nuance. What modeling techniques are going to work, what data sets matter? Okay, you've got time-series data and a problem like fraud: what techniques are going to make that work well? In recommendations, you may have a long tail of items to think about recommending. How do you generalize across the long tail, where you can't learn? People who use some relatively small thing, or go to an obscure website, or buy an obscure product: there's not enough data to say whether they're likely to buy something else or do something else, but how do you categorize them so you get the statistical power to make useful recommendations, right? Those are things that are very specific, where there's a lot of repeatability within a specific solution area.

>> This is, when you talk about the data assets, that might be specific to a customer, and then, I guess, some third-party or syndicated sources. If you have an outcome in mind, but not every customer has the same inventory of data, how do you square that circle?

>> That's a great question. And I really think that's a lot of the opportunity in the enterprise of applying analytics. This whole DataWorks Summit is about, hey, the power of your data: what you can get by collecting your data in a well-managed ecosystem and creating value. So there's always a nuance. What's happening in your customers, what's your business process, what's special about how you interact, what's the core of your business? So my view is that anybody who wants to be a winner in this new digital era, and have processes that take advantage of artificial intelligence, is going to have to use data as a competitive advantage and build on their unique data. And we see a lot of times that enterprises struggle with this. There's a tendency to say, hey, can we just buy a packaged, off-the-shelf SaaS solution and do that? And for context, for things that are the same for everybody in an industry, that's a great choice. But if you're doing that for the core differentiation of your business, you're in deep trouble in this digital era.

>> And that's a great place... sorry, George, really quickly. In this day and age, every company is a technology company. You mentioned a use case in banking, fraud detection, which is huge. There's tremendous value that can be gleaned from artificial intelligence, and there's also tremendous risk. I'm curious, maybe just kind of a generalization: are you going out to customers that have already embraced Hadoop and have a significant amount of data, who say, all right, we've got a lot of data here, we need to understand the context?
Where are customers in that maturity evolution?

>> Sure. So I'd say that we're fast approaching the slope of enlightenment for Hadoop, which is to say, the enthusiasm of three years ago, when people thought Hadoop was going to do everything, has kind of waned, and there's now more of an appreciation: there's a lot of value in having a data warehouse for high-value, curated data for large-scale use, and there's a lot of value in having a data lake of fairly raw data that can be used for exploration in the data science arena. What the best architecture is for streaming, and how you drive real-time decisions, is still very much up in the air. So I'd say that most of our customers are somewhere on that journey. I think a lot of them have backed off from their initial ambitions; they bought a little too much of the hype of all that Hadoop might do, and they're realizing what it is good for, and how they really need to build a complementary ecosystem. The other thing I think is exciting, though, is that the conversation is moving from the technology to the use cases. People are a lot more excited about how we can drive value in analytics: let's work backwards from the analytics value to the data that's going to support it.

>> Absolutely.

>> So building on that, we talked about sort of what's core, and you can't have something completely repeatable that's going to be core to your sustainable advantage. But if everyone is learning from data, how does a customer achieve a competitive advantage, or even sustain a competitive advantage? Is it orchestrating learning that informs processes all across the business, or is it just sort of a perpetual Red Queen effect?

>> Well, that's a great question. I mean, I think there are a few things, right? There's operational excellence in every discipline: having good data scientists, having the right data, collecting data, thinking about how you get network effects; those are all elements. So I would say there's a table-stakes aspect: if you're not doing this, you're in trouble, and if you are, it's how do you optimize and lift your game and get better at it? So that's an important fact. You see companies asking, how do we acquire data? One of the things that you see digital disruptors, like a Tesla, doing is changing the game by saying, we're changing the way we work with our customers to get access to the data. Think of the difference: every time you buy a Tesla, you sign over the rights for them to collect and use all your data, while the traditional auto OEMs are struggling to get access to a lot of the data, because they have intermediaries that control the relationship and aren't willing to share. And it's a similar thing in other industries. In consumer packaged goods, you see a lot of manufacturers saying, how do we get partnerships, how do we get more accurate data? The old model of going out to the Nielsens of the world and saying, give us aggregates, we'll pay you a lot to give us a summary report, that's not working. How do we learn directly, in a digital world, about our consumers, so we can be more relevant? So one of the things is definitely that control of data and access to data, and as well, we see a lot of companies asking, what are the acquisitions we can make? What are the startups and capabilities that we can plug in, and complement, to get data, to get analytic capability, that we can then tailor for our needs?
>> It's funny that you mention Tesla having more cars on the road, collecting more data, than pretty much anyone else at this point. But then there's Stanford's sort of luminary for AI, Fei-Fei Li. She signed on, I think, with Toyota, because she said, they sell 10 million cars a year, I'm going to be swimming in data compared to anyone else, with the possible exception of GM or maybe some Chinese manufacturer. So how can you get around scale, when using data at scale to inform your models? How would someone like a Tesla be able to get an end run around that?

>> So that's the battle: the disruptor comes in, they're not at scale, but they maybe change the game in some way, like having different terms that give them access to different kinds of data, more complete data. So that's part of the answer: to disrupt an industry, you need a strategy for what's different, right? Like, in Tesla's case, an electric vehicle. And they've been investing in autonomous vehicles with AI; of course, everybody in the industry is seeing that and is racing. I mean, Google really started that whole wave going a long time ago, as another potential disruptor coming in with their own unique data asset. So I think it's all about the combination of capabilities that you need. Disruptors often bring a commitment to a different business process, and that's a big challenge: a lot of times the hardest things are the business processes that are entrenched in existing organizations, and disruptors can say, we're rethinking the way this gets done. The example of that in ride sharing, the Ubers and Lyfts of the world, is where they are re-conceiving what it means to consume automobile services. Maybe you don't want to own a car at all if you're a millennial; maybe you just want to have access to a car when you need to go somewhere. That's a good example of a disruptive business-model change.

>> What are some things on the intermediate-term horizon that might affect how you go about trying to create a sustainable advantage? And here I mean things like where deep learning might help data scientists with feature engineering, so there's less need... so you can make data scientists less of a scarce resource. Or where there are new types of training for models, where you need less data. Those sorts of things might disrupt the practice of achieving an advantage with current AI technology.

>> You know, that's a great question. So near-term, the ability to be more efficient in data science is a big deal. There's no surprise that there's a big talent gap, a big shortage of qualified data scientists in the enterprise, and one of the things that's exciting is that deep learning lets you get more information out of the data. It learns more, so that you have to do less feature engineering. It's not a magic box; you don't just pour raw data into deep learning and out come the answers, so you still need qualified data scientists, but it's a force multiplier: there's less work to do in feature engineering, and therefore you get better results. So that's a factor. You're starting to see things like hyperparameter search, where people will create neural networks that search for the best machine learning model, and again get another level of leverage. Now, today, doing that is very expensive. Given the amount of hardware needed to do it, very few organizations are going to spend millions of dollars to automate the discovery of models. But things are moving so fast.
I mean, even just in the last six weeks, we've had Nvidia and Google both announce significant breakthroughs in hardware. And I just had a colleague forward me a paper with recent research that says this technique could produce a hundred times faster convergence in deep learning. So you've got rapid advances and investment in the hardware and the software. Historically, software improvements have outstripped hardware improvements throughout the history of computing, so it's quite reasonable to expect you'll have 10 thousand times the price-performance for deep learning in five years. So things that today might cost a hundred million dollars, that no one would do, could cost 10 thousand dollars in five years, and suddenly it's a no-brainer to apply a technique like that to automate something, instead of hiring more scarce data scientists who are hard to find, and to make the data scientists more productive, so they're spending more time thinking about what's going on and less time trying out different variations: how do I configure this thing, does this work, does this, right?

>> Oh gosh, Ron, we could keep chatting away. Thank you so much for stopping by theCUBE again; we wish you the best of luck in your keynote tomorrow. I think people are going to be very inspired by your passion, your energy, and also the tremendous opportunity that is really sitting right in front of us.

>> Thank you, Lisa. It's a very exciting time to be in the data industry, and with the emergence of AI in the enterprise, I couldn't be more excited by it.

>> Oh, excellent, well, your excitement is palpable. We want to thank you for watching. We are live on theCUBE at the DataWorks Summit, day two, #dws17. For my co-host George Gilbert, I'm Lisa Martin. Stick around, we'll be right back. (upbeat electronic melody)
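
The hyperparameter search Bodkin describes, spending machine time instead of scarce data-scientist time to find a good model configuration, looks roughly like this in the Keras he mentions. The data here is a random stand-in and the search space is deliberately tiny; a real search would distribute many more trials across the GPU hardware he talks about.

```python
import random
import numpy as np
from tensorflow import keras

# Toy stand-in for real training data: 20 features, binary label.
X = np.random.rand(1000, 20)
y = (X.sum(axis=1) > 10).astype(int)

def build_model(units, lr):
    # One hidden layer whose width and learning rate are the
    # hyperparameters being searched over.
    model = keras.Sequential([
        keras.layers.Dense(units, activation="relu", input_shape=(20,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Random search: sample configurations, keep the best validation score.
best_cfg, best_score = None, 0.0
for _ in range(10):
    cfg = {"units": random.choice([16, 32, 64, 128]),
           "lr": 10 ** random.uniform(-4, -2)}
    model = build_model(**cfg)
    hist = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)
    score = max(hist.history["val_accuracy"])
    if score > best_score:
        best_cfg, best_score = cfg, score

print("best configuration:", best_cfg, "val accuracy: %.3f" % best_score)
```

Each trial is independent, which is what makes the search embarrassingly parallel: the cost drops roughly linearly as trials are spread across more machines, which is the price-performance trend Bodkin projects.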

Published Date : Jun 14 2017


Raj Verma, Hortonworks - DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks.

>> Welcome back to theCUBE. We are live on day two of the DataWorks Summit. I'm Lisa Martin. #DWS17, join the conversation. We've had a great day and a half; we have learned from a ton of great influencers and leaders about really what's going on with big data and data science, and how things are changing. My co-host is George Gilbert. We're joined by my old buddy, the COO of Hortonworks, Rajnish Verma. Raj, it's great to have you on theCUBE.

>> It's great to be here, Lisa. Great to see you as well; it's been a while.

>> It has. So yesterday, on the customer panel, the Raj I know had a great conversation with customers. Duke Energy was one. You also had Black Knight on the financial services side.

>> Rajnish: And HSC.

>> Yes, on the insurance side. And a couple of things really caught my attention. One was when Duke said, kind of, where they were using data and moving to Hadoop, but they are now a digital company. They're now a technology company that sells electricity and products, which I thought was fantastic. Another thing that I found really interesting was that they all talked about how the need to leverage big data, and glean insights and monetize that, really requires a cultural shift. So I know you love customer interactions. Talk to us about what you're seeing. Those are three great industry examples. What are you seeing? Where are customers on this sort of maturity model, where big data and Hadoop are concerned?

>> Sure, happy to. So one thing that I enjoy the most about my job is meeting customers and talking to them about the art of the possible, and some of the stuff that they're doing, which was only science fiction, really, about two or three years ago. And there are a couple of questions that you've just asked me: where they are on their journey, what they're trying to accomplish, et cetera. I remember, about five, seven, ten years ago, Marc Andreessen said "software is eating the world." And to be honest with you, now it's more like every company is a data company. I wouldn't say data is eating the world, but without effective monetization of your data assets, you can't be a force to reckon with as a company. So that is a common theme we are seeing, irrespective of industry, irrespective of customer, irrespective of really the size of the customer. The only thing that varies is the amount and complexity of data from one company to the other. Now, I'm new to Hortonworks, as you know; it's really my fifth month here. And one of the things that I've seen, and Lisa, as you know, I'm coming from TIBCO, so I have been involved with data for over a decade and a half now, right. The difference is, 15 years ago we were dealing with really structured data, and we actually connected the structured data and gleaned insights from structured data. Today, a seminal challenge that every CIO or chief data officer is trying to solve is how you get actionable insights into semi-structured and unstructured data. Now, getting insights into that data first requires the ability to aggregate data. Once you've aggregated data, you also need a platform to make sense of data in real time, as it is being streamed at you. Once you do those two things, then you put yourself in a position to analyze that data.
So in that journey, as you asked, where our customers are: some are defining their data aggregation strategy. Others, having defined data aggregation, are talking about streaming analytics as a platform. And then others are talking about data science and machine learning and deep learning as a journey. Now, you saw the customer panel yesterday. But the one point I'd like to make is, it's not only the Duke Energies and the Black Knights of the world, or the HSC, who I believe are big, large firms that are using data. Even a company like an old agricultural company, or I shouldn't say old, steeped in heritage is probably the right word, a 96-, 97-year-old agricultural company that's in the animal feed business. Animal feed. A multi-billion dollar animal feed business. They use data to monetize their business model. What they say is, they've been feeding animals for the last 70 years. So now they go to a farmer, and they have enough data about how to feed animals that they can actually tell the farmer: this hog that you have right now, which is 17 pounds, I can guarantee you that I will have him or her on a nutrition plan such that, by four months, it'll be 35 pounds. How much are you willing to pay? So even in the animal feed business, data is being used to drive not only insights, but monetization models. >> Wow. >> So. >> That's outstanding. >> Thank you. >> So in getting to that level of sophistication, it's not like every firm sort of has the skills and technology in place to do that. What are some of the steps that you find that they typically have to go through to get to that level of maturity? Like, where do they make mistakes? Where do they find the skills to manage on-prem infrastructure, if it is on-prem? What about if they're trying to do a hybrid cloud setup, how complex is that? >> I think that's where the power of the community comes through at multiple levels. So we're committed to the open-source movement. We're committed to the community-based development of data. Now, this community-based business model does a few things. Firstly, it keeps the innovation at the leading edge, bleeding edge, number one. But as you heard the panel talk about yesterday, one of the biggest benefits that our customers see of using open source is, sure, economics is good, but that's not the leading reason. Keeping up with innovation, very high up there. Avoiding vendor lock-in, again very, very high up there. But one of the biggest reasons that CIOs gave me for choosing open source as a business model is more to do with the fact that they can attract good talent, and without open source, you can't actually attract talent. And I can relate to that because I have a sophomore at home. And it just occurred to me: she's 15 now, but she's been using open source since she was 11. On the iPhone, she downloads an application for free. She uses it, and if she stretches the limit of that, then she orders something more in a paid model. So the community helps people do a few things. Be able to fail fast if they need to. The second is, it lowers the barriers to entry, right, because it's really free. You can have the same model. The third is, you can rely on the community for support and methodologies and best practices and lessons learned from implementations. The fourth is, it's a great hiring ground in terms of bringing people in and attracting Millennial talent, young talent, and sought-after talent. So that's really probably the answer that I would have for that.
>> When you talk about the business model, the open-source business model and the attraction on the customer side, it sounded like there's an analogy with the agro-business customer, in the sense that they are offering data along with their traditional product. If your traditional product is open-source data management, what Arun was telling us this morning was that the machine learning that goes along with operating not only your own sort of internal workloads but customers', and being able to offer prescriptive advice on operations, essentially IT operations. Is that the core, will that become the core of sort of value-add through data for an open-source business model like yours? >> I don't want to be speculative but I'll probably answer it another way. I think our vision, which was set by our founder Rob Bearden, and he took you guys through that yesterday, was way back when, we did say that our mission in life is to manage the world's data. So that mission hasn't changed. And the second was, we would do it as an open-source community, or as a big contributing part of that community. And that has really not changed. Now, we feel that machine learning and data science and deep learning are areas that we're very, very excited about, and our customers are very, very excited about. Now, the one thing that we did cover yesterday, and I think earlier today as well: I'm a computer science engineer. And when I was in college, way back when, 25 years ago, I was interested in AI and ML. And it has existed for 50 years. The reason why it hasn't been available to the common man, so to speak, is because of two reasons. One is, it did not have a source of data that it could sit on top of that makes machine learning and AI effective, or at least not a commercially viable option to do so. Now, there is one. The second is the compute power required to run some of the large algorithms that really give you insights into machine learning and AI. So we've become the platform on which customers can take advantage of excellent machine learning and AI tools to get insights. Now, those are two independent sort of categories. One is the open source community providing the platform. And then what tools the customer has used to apply data science and machine learning, so. >> So, all right. I'm thinking of something that is slightly different, and maybe the nuance is making it tough to articulate. But it's how can Hortonworks take the data platform and data science tools that you use to help understand how to operate Hortonworks, whether it's on a customer's prem, or in the cloud. In other words, how can you use machine learning to make it a sort of a more effective and automated managed service? >> Yeah, and I think that's, the nuance's not lost on me. I think what I'm trying to sort of categorize is, for that to happen, you require two things. One is data aggregation across on-prem and cloud. Because when you have data which is multi-tenancy, you have a lot of issues with data security, data governance, all the rest of it. Now, that is what we plan to manage for the world, so to speak. Now, on top of that, for customers who require data science or deep learning to be used, we provide that platform. Now, whether that is used as a service by the customer, which we would be happy to provide, or it is used in-house, on-prem, on various cloud models, that's more a customer decision. We don't want to force that decision. However, from the art of the possible perspective, yes it's possible.
>> I love the mission to manage the world's data. >> Thank you. >> That's a lofty goal, but yesterday's announcements with IBM were pretty, pretty transformative. In your opinion as chief operating officer, how do you see this extension of this technology and strategic partnership helping Hortonworks on the next level of managing the world's data? >> Absolutely, it's game-changing for us. We're very, very excited. Our colleagues are very, very excited about the opportunity to partner. It's also a big validation of the fact that we now have a pretty large open-source community that contributes to this cause. So we're very excited about that. The opportunity is in actually our partnering with a leader in data science, machine learning, and AI, a company that is steeped in heritage and is known for game-changing, next-technology moves. And the fact that we're powering it from a data perspective is something that we're very, very excited and pleased about. And the opportunities are limitless. >> I love that, and I know you are a game-changer, in your fifth month. We thank you so much, Raj, for joining us. It was great to see you. Continued success, >> Thank you. >> at managing the world's data and being that game-changer yourself, and for Hortonworks as well. >> Thank you Lisa, good to see you. >> You've been watching theCUBE. Again, we're live, day two of the DataWorks Summit, #DWS17. For my cohost, George Gilbert, I'm Lisa Martin. Stick around guys, we'll be right back with more great content. (jingle)

Published Date : Jun 14 2017


Josh Klahr & Prashanthi Paty | DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. Day two of the DataWorks Summit, I'm Lisa Martin with my cohost, George Gilbert. We've had a great day and a half so far, learning a ton in this hyper-growth big data world that meets IoT, machine learning, data science. George and I are excited to welcome our next guests. We have Josh Klahr, the VP of Product Management from AtScale. Welcome Josh, welcome back. >> Thank you. >> And we have Prashanthi Paty, the Head of Data Engineering for GoDaddy. Welcome to theCUBE. >> Thank you. >> Great to have you guys here. So, wanted to kind of talk to you guys about, one, how you guys are working together, but two, also some of the trends that you guys are seeing. So as we talked about, in the tech industry, it's two degrees of Kevin Bacon, right. You guys worked together back in the day at Yahoo. Talk to us about what you both visualized and experienced in terms of the Hadoop adoption maturity cycle. >> Sure. >> You want to start, Josh? >> Yeah, I'll start, and you can chime in and correct me. But yeah, as you mentioned, Prashanthi and I worked together at Yahoo. It feels like a long time ago, in our central data group. And we had two main jobs. First job was: collect all of the data from our ad systems, our audience systems, and stick that data into a Hadoop cluster. At the time, we were kind of doing it while Hadoop was still being developed. And the other thing that we did was, we had to support a bunch of BI consumers. So we built cubes, we built data marts, we used MicroStrategy, Tableau, and I would say the experience there was a great experience with Hadoop in terms of the ability to have low-cost storage and scale-out data processing of what were really billions and billions, tens of billions of events a day. But when it came to BI, it felt like we were doing stuff the old way. And we were moving data off cluster, and making it small. In fact, you did a lot of that. >> Well, yeah, at the end of the day, we were using Hadoop as a staging layer. So we would process a whole bunch of data there, and then we would scale it back and move it into, again, relational stores or cubes, because basically we couldn't afford to give any accessibility to BI tools or to our end users directly on Hadoop. So while we surely did large-scale data processing in the Hadoop layer, we failed to turn on the insights right there. >> Lisa: Okay. >> Maybe there's a lesson in there for folks who are getting slightly more mature versions of Hadoop now, but can learn from some of the experiences you've had. Were there issues in terms of having cleaned and curated data? Were there issues for BI with performance and the lack of proper file formats like Parquet? Where was it that you hit the wall? >> It was both. You have to remember, we were probably one of the first teams to put a data warehouse on Hadoop. So we were dealing with Pig versions of, like, 0.5, 0.6, so we were putting a lot of demand on the tooling and the infrastructure. Hadoop was still in a very nascent stage at that time. That was one. And I think a lot of the focus was on, hey, now we have the ability to do clickstream analytics at scale, right. So we did a lot of the backend stuff. But the presentation is where I think we struggled.
>> So would that mean, the idea is that you could do full resolution without sampling on the backend, and then you would extract and presumably sort of denormalize so that you could essentially run data marts for subject matter interests. >> Yeah, and that's exactly what we did: we took all of this big data, but to make it work for BI, which were two things. One was performance: it was really, can you get an interactive query and response time. And the other thing was the interface: can a Tableau user connect and understand what they're looking at. You had to make the data small again. And that was actually the genesis of AtScale, which is where I am today. We were frustrated with this big data platform and having to then make the data small again in order to support BI. >> That's a great transition, Josh. Let's actually talk about AtScale. You guys saw BI on Hadoop as this big white space. How have you succeeded there, and then let's talk about what GoDaddy is doing with AtScale and big data. >> Yeah, we took the learnings from our experience at Yahoo, and we really thought about, if we were to start from scratch, and solve the problem the way we wanted it to be solved, what would that system look like. And it was a few things. One was an interface that worked for BI. I don't want to date myself, but my experience in the software space started with OLAP. And I can tell you OLAP isn't dead. When you go and talk to an enterprise, a Fortune 1000 enterprise, and you talk about OLAP, that's how they think. They think in terms of measures and dimensions and hierarchies. So one important thing for us was to project an OLAP interface on top of data that's Hadoop native. It's Hive tables, Parquet, ORC, you kind of talk about all of the mess that may sit underneath the covers. So one thing was projecting that interface, the other thing was delivering performance. So we've invested a lot in using the Hadoop cluster natively to deliver performant queries. We do this by creating aggregate tables and summary tables and being smart about how we route queries. But we've done it in a way that makes a Hadoop admin very happy. You don't have to buy a bunch of AtScale servers in addition to your Hadoop cluster. We scale the way the Hadoop cluster scales. So we don't require separate technology. So we fit really nicely into that Hadoop ecosystem. >> So how do you make, making the Hadoop admin happy is a good thing. How do you make the business user happy, who needs now, as we heard here yesterday, to kind of merge more with the data science folks to be able to understand or even have the chance to articulate, "These are the business outcomes we want to look for and we want to see." How do you guys, maybe under the hood, if you will, at AtScale, make the business guys and gals happy? >> I'll share my opinion and then Prashanthi can comment on her experience, but as I've mentioned before, the business users want an interface that's simple to use. And so that's one thing we do: we give them the ability to just look at measures and dimensions. If I'm a business user, I grew up using Excel to do my analysis. The thing I like most as an analyst is a big fat wide table. And so that's what we do: we make an underlying Hadoop cluster, and what could be tens or hundreds of tables, look like a single big fat wide table for a data analyst. You talk to a data scientist, you talk to a business analyst, that's the way they want to view the world. So that's one thing we do.
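To make the "big fat wide table" pattern concrete, here is a minimal PySpark sketch of what Josh describes: denormalizing a fact table against a dimension table, then pre-computing an aggregate along common dimensions so BI queries come back interactively. All table and column names are invented for illustration; AtScale manages this kind of aggregate automatically, so this shows only the underlying shape of the idea, not the product itself.

```python
# Minimal sketch of the wide-table plus aggregate-table pattern.
# Table and column names are hypothetical; a real cluster would read
# Hive/Parquet tables instead of inline rows.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("olap-on-hadoop-sketch").getOrCreate()

# Fact table: one row per click event.
events = spark.createDataFrame(
    [("2017-06-13", 1, "p1", 3), ("2017-06-13", 2, "p2", 1),
     ("2017-06-14", 1, "p1", 5)],
    ["event_date", "user_id", "page_id", "clicks"])

# Dimension table: user attributes.
users = spark.createDataFrame(
    [(1, "US", "mobile"), (2, "DE", "desktop")],
    ["user_id", "country", "device"])

# Denormalize: the analyst sees one wide table instead of many joins.
wide = events.join(users, "user_id")

# Pre-aggregate along the measures and dimensions a cube would expose.
agg = (wide.groupBy("event_date", "country", "device")
           .agg(F.sum("clicks").alias("total_clicks"),
                F.countDistinct("user_id").alias("unique_users")))

agg.show()
# In production the aggregate would be written back to the cluster, e.g.:
# agg.write.mode("overwrite").parquet("/warehouse/agg_clicks_daily")
```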
And then, we give them response times that are fast. We give them interactivity, so that you can really quickly start to get a sense of the shape of the data. >> And allowing them to get that time to value. >> Yes. >> I can imagine. >> Just a follow-up on that. When you have to prepare the aggregates, essentially like the cubes, instead of the old BI tools running on a data mart, what is the additional latency that's required from data coming fresh into the data lake and then transforming it into something that's consumption-ready for the business user? >> Yeah, I think I can take that. So again, if you look at the last 10 years, in the initial period, certainly at Yahoo, we just threw engineering resources at that problem, right. So we had teams dedicated to building these aggregates. But the whole premise of Hadoop was the ability to do unstructured optimizations. And by having a team find the new data coming in and then integrate that into your pipeline, we were adding a lot of latency. And so we needed to figure out how we can do this in a more seamless way, in a more real-time way, and get, you know, the real premise of Hadoop. Get it into the hands of our business users. I mean, I think that's where AtScale is doing a lot of the good work, in terms of dynamically being able to create aggregates based on the design that you put in the cube. So we are starting to work with them on our implementation. We're looking forward to the results. >> Tell us a little bit more about what you're looking to achieve. So GoDaddy is a customer of AtScale. Tell us a little bit more about that. What are you looking to build together, and kind of, where are you in your journey right now? >> Yeah, so the main goal for us is to move beyond predefined models, dashboards, and reports. So we want to be more agile with our schema changes. Time to market is one. And performance, right. The ability to put BI tools directly on top of Hadoop is one. And also to push as much of the semantics as possible down into the Hadoop layer. So those are the things that we're looking to do. >> So that sounds like a classic business intelligence component, but sort of rethought for a big data era. >> I love that quote, and I feel it. >> Prashanthi: Yes. >> Josh: Yes. (laughing) >> That's exactly what we're trying to do. >> But also, some of the things you mentioned are non-trivial. You want to have this, time goes into the pre-processing of data so that it's consumable, but you also want it to be dynamic, which is sort of a trade-off, which means, you know, that takes time. So is that sort of a set of requirements, a wishlist, for AtScale, or is that something that you're building on your own? >> I think there's a lot happening in that space. They are one of the first people to come out with their product, which is solving a real problem that we tried to solve for a long time. And I think as we start using them more and more, we'll surely be pushing them to bring in more features. I think the algorithm that they have to dynamically generate aggregates is something that we're giving quite a lot of feedback on. >> Our last guest, from Pentaho, was talking about, in her keynote today, the quote from, I think, a McKinsey report that said, "40% of machine learning data is either not fully exploited or not used at all." So, tell us, kind of, where is GoDaddy regarding machine learning? What are you seeing?
What are you seeing at AtScale, and how are you guys going to work together to maybe venture into that frontier? >> Yeah, I mean, I think one of the key requirements we're placing on our data scientists is, not only do you have to be very good at your data science job, you have to be a very good programmer too, to make use of the big data technologies. And we're seeing some interesting developments, like very workload-specific engines coming into the market now, for search, for graph, for machine learning as well, which are supposed to put the tools right into the hands of data scientists. I personally haven't worked with them enough to be able to comment. But I do think that the next realm of big data is these workload-specific engines coming on top of Hadoop, and realizing more of the insights for the end users. >> Curious, can you elaborate a little more on those workload-specific engines? That sounds rather intriguing. >> Well, for interacting with Hadoop on a real-time basis, we see search-based engines like Elasticsearch, Solr, and there is also Druid. At Yahoo, we were quite a big Druid shop, actually. And we were using it as an interactive query layer directly between our applications, our JavaScript-based BI applications, and Hadoop. So I think there are quite a few means to realize insights from Hadoop now. And that's the space where I see workload-specific engines coming in. >> And you mentioned earlier, before we started, that you were using Mahout, presumably for machine learning. And I guess I thought the center of gravity for that type of analytics has moved to Spark, and you haven't mentioned Spark yet. >> We are not using Mahout, though. I mentioned it as something that's in that space. But yeah, I mean, Spark is pretty interesting. Spark SQL, doing ETL with Spark, as well as using Spark SQL for queries, is something that looks very, very promising lately. >> Quick question for you, from a business perspective. So you're the Head of Data Engineering at GoDaddy. How do you interact with your business users? The C-suite, for example, where data science and machine learning are concerned: they understand they have to embrace Hadoop more and more. They really need to embrace big data and leverage Hadoop as an enabler. What's the conversation like, or maybe even the influence of the GoDaddy business C-suite on engineering? How do you guys work collaboratively? >> So we do have very regular stakeholder meetings. And these are business stakeholders. So we have representatives from our marketing teams, finance, product teams, and data science team. We consider data science as one of our customers. We take requirements from them. We give them a peek into the work we're doing. We also let them be part of our agile team, so that when we have something released, they're the first ones looking at it and testing it. So they're very much part of the process. I don't think we can afford to just sit back and work on this monolithic data warehouse and at the end of the day say, "Hey, here is what we have" and ask them to go get the insights from it. So it's a very agile process, and they're very much part of it. >> One last question for you, sorry George. You guys mentioned you are sort of early in your partnership, unless I misunderstood. What has AtScale helped GoDaddy achieve so far, and what are your expectations, say, over the next six months? >> We want the world. (laughing) >> Lisa: Just that.
>> Yeah, but the premise is, I mean, so Josh and I, we were part of the same team at Yahoo, where we faced problems that AtScale is trying to solve. So the premise of being able to solve those problems, which is, like their name, basically delivering data at scale, that's the premise that I'm very much looking forward to from them. >> Well, excellent. Well, we want to thank you both for joining us on theCUBE. We wish you the best of luck in attaining the world. (all laughing) >> Josh: There we go, thank you. >> Excellent, guys. Josh Klahr, thank you so much. >> My pleasure. Prashanthi, thank you for being on theCUBE for the first time. >> No problem. >> You've been watching theCUBE live at the day two of the DataWorks Summit. For my cohost George Gilbert, I am Lisa Martin. Stick around guys, we'll be right back. (jingle)

Published Date : Jun 14 2017


Arun Murthy, Hortonworks | DataWorks Summit 2017


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Good morning, welcome to theCUBE. We are live at day 2 of the DataWorks Summit, and have had a great day so far, yesterday and today. I'm Lisa Martin with my co-host George Gilbert. George and I are very excited to be joined by a multiple-time CUBE alum, the co-founder and VP of Engineering at Hortonworks, Arun Murthy. Hey, Arun. >> Thanks for having me, it's good to be back. >> Great to have you back. So yesterday, great energy at the event. You could see and hear behind us, great energy this morning. One of the things that was really interesting yesterday, besides the IBM announcement, and we'll dig into that, was that we had your CEO on, as well as Rob Thomas from IBM, and Rob said, you know, one of the interesting things over the last five years was that there have been only 10 companies that have beaten the S&P 500, have outperformed, in each of the last five years, and those companies have made big bets on data science and machine learning. And as we heard yesterday, these four mega-trends: IoT, cloud, streaming analytics, and now the fourth big leg, data science. Talk to us about what Hortonworks is doing. You've been here from the beginning, as a co-founder I've mentioned, you've been with Hadoop since it was a little baby. How is Hortonworks evolving to become one of those big users making big bets on helping your customers, and yourselves, leverage machine learning to really drive the business forward? >> Absolutely, a great question. So, you know, if you look at some of the history of Hadoop, it started off with this notion of a data lake, and then, I'm talking about the enterprise side of Hadoop, right? I've been working on Hadoop for about 12 years now, you know, the last six of it has been as a vendor selling Hadoop to enterprises. They started off with this notion of data lake, and as people have adopted that vision of a data lake, you know, you bring all the data in, and now you're starting to get governance and security, and all of that. Obviously, one of the best ways to get value out of the data is the notion of, you know, can you sort of predict what is going to happen in your world, with your customers, and, you know, whatever it is with the data that you already have. So that notion of, you know, Rob, our CEO, talks about how we're trying to move from a post-transactional world to a pre-transactional world, and doing the analytics and data science will obviously be key to that. We could talk about, and there's so many applications of it, something as simple as, you know, we did a demo last year of how we're working with a freight company, and we're starting to show them, you know, predicting which drivers and which routes are going to have issues as they're trying to move, alright? Four years ago we did the same demo, and we would show that this driver had an issue on this route, but now, within the world, we can actually predict and let you know to take preventive measures up front. Similarly internally, you know, you can take things from machine learning and log analytics and so on. We have an internal problem, you know, where we have to test two different versions of HDP itself, and as you can imagine, it's a really, really hard problem.
We have to support 10 operating systems, seven databases; like, if you multiply that matrix, it's, you know, tens of thousands of options. So, if you do all that testing, we now use machine learning internally to look through the logs and kind of predict where the failures were, and help our own, sort of, software engineers understand where the problems were, right? An extension of that has been, you know, the work we've done in SmartSense, which is a service we offer our enterprise customers. We collect logs from their Hadoop clusters, and then we can actually help them understand where they can either tune their applications or even tune their hardware, right? You know, we have this example I really like, where at a really large enterprise Financial Services client, they had literally, you know, hundreds and thousands of machines on HDP, and using SmartSense we actually found that there were 25 machines which had bad NIC configuration, and we proved to them that by fixing those, we got 30% throughput back on their cluster. At that scale, it's a lot of money, it's a lot of CapEx, it's a lot of OpEx. So, as a company, we try it on ourselves as much as we, kind of, try to help our customers adopt it. Does that make sense? >> Yeah, let's drill down on that even a little more, 'cause it's pretty easy to understand what's the standard telemetry you would want out of hardware, but as you, sort of, move up the stack, the metrics, I guess, become more custom. So how do you learn, not just from one customer, but from many customers, especially when you can't standardize what you're supposed to pull out of them? >> Yeah, so we're really big believers in, sort of, dogfooding our own stuff, right? So, we talk about the notion of a data lake; we actually run a SmartSense data lake, where we actually get data across, you know, hundreds of our customers, and we can actually do predictive machine learning on that data in our own data lake. Right? And to your point about how we go up the stack, this is kind of where we feel like we have a natural advantage, because we work on all the layers, whether it's the SQL engine, or the storage engine, or, you know, above and beyond the hardware. So, as we build these models, we understand that we need more, or different, telemetry, right? And we put that back into the product, so the next version of HDP will have the metrics that we wanted. And now we've been doing this for a couple of years, which means we've done three, four, five turns of the crank, obviously something we always get better at, but I feel like, compared to where we were a couple of years ago when SmartSense first came out, it's actually matured quite a lot, from that perspective. >> So, there's a couple different paths you can add to this, which is: customers might want, as part of their big data workloads, some non-Hortonworks, you know, services or software when it's on-prem, and then can you also extend this management to the cloud if they want a hybrid setup where, in the not too distant future, the cloud vendor will also be a provider for this type of management? >> So absolutely, in fact it's true today. You know, Microsoft's a great partner of ours. We work with them to enable SmartSense on HDI, which means we can actually get the same telemetry back, whether you're running the data on an on-prem HDP, or you're running this on HDI.
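Stepping back to the machine-learning-over-logs idea Arun describes, predicting where a failure came from by learning over test logs, here is a minimal scikit-learn sketch of that shape. The log lines, labels, and categories are invented for the example; a real pipeline would train on the actual labeled failure history across the OS and database test matrix.

```python
# Minimal sketch: classify log text to suggest a likely failure cause.
# Training data here is invented; a real system would learn from
# thousands of labeled failures.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_logs = [
    "java.net.ConnectException: Connection refused on port 8020",
    "No space left on device while writing block",
    "GC overhead limit exceeded in NodeManager",
    "Connection timed out waiting for namenode",
    "Disk quota exceeded for /hadoop/hdfs/data",
]
labels = ["network", "disk", "memory", "network", "disk"]

# TF-IDF features over raw log text, plus a simple linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_logs, labels)

new_failure = ["Timed out connecting to namenode after 3 retries"]
print(model.predict(new_failure))  # expected: ['network']
```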
Similarly, we shipped a version of our cloud product, our Hortonworks Data Cloud, on Amazon, and again SmartSense is pre-planned there, so whether you're on an Amazon, or a Microsoft, or on-prem, we get the same telemetry, we get the same data back. We can actually, if you're a customer using many of these products, we can actually give you that telemetry back. Similarly, if you guys probably know this, you were probably there at the analyst event when we announced the Flex Support subscription, which means that now we can actually take the support subscription you get from Hortonworks and you can actually use it on-prem or in the cloud. >> So in terms of transforming HDP, for example, just want to make sure I'm understanding this: you're pulling in data from customers to help evolve the product, and that data can be on-prem, it can be in Microsoft Azure, it can be in AWS? >> Exactly. HDP can be running in any of these; we will actually pull all of them into our data lake, and we actually do the analytics and then present it back to the customers. So, in our support subscription, the way this works is, we do the analytics in our lake, and it pushes it back, in fact, to our support team tickets, and our sales force, and all the support mechanisms. And they get a set of recommendations saying: hey, we know these are the workloads you're running, we see these are the opportunities for you to do better, whether it's tuning the hardware, tuning an application, tuning the software. We sort of send the recommendations back, and the customer can go and say, oh, that makes sense, they accept that, and we'll, you know, update the recommendation for you automatically. Or you can say, maybe I don't want to change my kernel parameters, let's have a conversation. And if the customer, you know, is going through with that, then they can go and change it on their own. We do that, sort of, back and forth with the customer.
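The loop Arun outlines, telemetry in and tuning recommendations out, can be pictured with a toy rule check like the one below. The metric names and thresholds are made up for illustration; the actual service derives recommendations from analytics over telemetry across many customer clusters, not from a couple of hard-coded rules.

```python
# Toy sketch of turning node telemetry into tuning recommendations.
# Metric names and thresholds are hypothetical.

def recommend(node_metrics):
    recs = []
    for node, m in node_metrics.items():
        # A NIC negotiated below 10GbE on a worker is the kind of
        # misconfiguration the 25-machine example above describes.
        if m.get("nic_speed_mbps", 10000) < 10000:
            recs.append(f"{node}: NIC at {m['nic_speed_mbps']} Mbps, "
                        "check link negotiation")
        if m.get("heap_used_pct", 0) > 90:
            recs.append(f"{node}: heap at {m['heap_used_pct']}%, "
                        "consider a larger container heap")
    return recs

telemetry = {
    "worker-07": {"nic_speed_mbps": 1000, "heap_used_pct": 62},
    "worker-12": {"nic_speed_mbps": 10000, "heap_used_pct": 95},
}
for rec in recommend(telemetry):
    print(rec)
```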
So, I think that's what we're really excited about the portion with IBM, because we feel like the two of us can help a lot of customers, especially in countries where they're significantly, highly regulated, than the United States, to actually get leverage our, sort of, giant portfolio of products. And IBM's been a great company to atlas, they've adopted wholesale as you saw, you know, in the announcements yesterday. >> So, you're doing a Keynote tomorrow, so give us maybe the top three things, you're giving the Keynote on Data Lake 3.0, walk us through the evolution. Data Lakes 1.0, 2.0, 3.0, where you are now, and what folks can expect to hear and see in your Keynote. >> Absolutely. So as we've, kind of, continued to work with customers and we see the maturity model of customers, you know, initially people are staying up a data lake, and then they'd want, you know, sort of security, basic security what it covers, and so on. Now, they want governance, and as we're starting to go to that journey clearly, our customers are pushing us to help them get more value from the data. It's not just about putting the data lake, and obviously managing data with governance, it's also about Can you help us, you know, do mission-learning, Can you help us build other apps, and so on. So, as we look to there's a fundamental evolution that, you know, Hadoop legal system had to go through was with advance of technologies like, you know, a Docker, it's really important first to help the customers bring more than just workloads, which are sort of native to Hadoop. You know, Hadoop started off with MapReduce, obviously Spark's went great, and now we're starting to see technologies like Flink coming, but increasingly, you know, we want to do data science. To mass market data science is obviously, you know, people, like, want to use Spark, but the mass market is still Python, and R, and so on, right? >> Lisa: Non-native, okay. >> Non-native. Which are not really built, you know, these predate Hadoop by a long way, right. So now as we bring these applications in, having technology like Docker is really important, because now we can actually containerize these apps. It's not just about running Spark, you know, running Spark with R, or running Spark with Python, which you can do today. The problem is, in a true multi-tenant governed system, you want, not just R, but you want specifics of a libraries for R, right. And the libraries, you know, George wants might be completely different than what I want. And, you know, you can't do a multi-tenant system where you install both of them simultaneously. So Docker is a really elegant solution to problems like those. So now we can actually bring those technologies into a Docker container, so George's Docker containers will not, you know, conflict with mine. And you can actually go to the races, you know after the races, we're doing data signs. Which is really key for technologies like DSX, right? Because with DSX if you see, obviously DSX supports Spark with technologies like, you know, Zeppelin which is a front-end, but they also have Jupiter, which is going to work the mass market users for Python and R, right? So we want to make sure there's no friction whether it's, sort of, the guys using Spark, or the guys using R, and equally importantly DSX, you know, in the short map will also support things like, you know, the classic IBM portfolio, SBSS and so on. 
So bringing all of those things in together, making sure they run with the data in the data lake, and also the compute in the data lake, is really big for us. >> Wow, so it sounds like your Keynote's going to be very educational for the folks that are attending tomorrow. So last question for you. One of the themes that occurred in the Keynote this morning was sharing a fun fact about the speakers. What's a fun fact about Arun Murthy? >> Great question. I guess, you know, people have been looking for folks with, you know, 10 years of experience on Hadoop. I'm here finally, right? There's not a lot of people, but, you know, it's fun to be one of those people who've worked on this for about 10 years. Obviously, I look forward to working on this for another 10 or 15 more, but it's been an amazing journey. >> Excellent. Well, we thank you again for sharing time with us on theCUBE. You've been watching theCUBE live on day 2 of the DataWorks Summit, hashtag DWS17, for my co-host George Gilbert. I am Lisa Martin, stick around, we've got great content coming your way.

Published Date : Jun 14 2017


Jamie Engesser, Hortonworks & Madhu Kochar, IBM - DataWorks Summit 2017


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering DataWorks Summit 2017, brought to you by Hortonworks. (digitalized music) >> Welcome back to theCUBE. We are live at day one of the DataWorks Summit, in the heart of Silicon Valley. I'm Lisa Martin with theCUBE; my co-host George Gilbert. We're very excited to be joined by our two next guests. Going to be talking about a lot of the passion and the energy that came from the keynote this morning and some big announcements. Please welcome Madhu Kochar, VP of analytics and product development and client success at IBM, and Jamie Engesser, VP of product management at Hortonworks. Welcome guys! >> Thank you. >> Glad to be here. >> First time on theCUBE, George and I are thrilled to have you. So, in the last six to eight months doing my research, there's been announcements between IBM and Hortonworks. You guys have been partners for a very long time, and announcements on technology partnerships with servers and storage, and presumably all of that gives Hortonworks, Jamie, a great opportunity to tap into IBM's enterprise install base. But boy, today? Socks blown off with this big announcement between IBM and Hortonworks. Jamie, kind of walk us through that, or sorry, Madhu, I'm going to ask you first. Walk us through this announcement today. What does it mean for the IBM-Hortonworks partnership? >> Oh my God, what an exciting, exciting day, right? We've been working towards this one, so three main things come out of the announcement today. First is really the adoption by Hortonworks of IBM's data science and machine learning. As you heard in the announcement, we brought machine learning to our mainframe, where the most trusted data is. Now bringing that to the open source, big data on Hadoop, great, right, amazing. Number two is obviously the whole aspects around our Big SQL, which is bringing the complex-query analytics, where it brings all the data together from all the various sources, making that available on HDP and Hadoop, and Hortonworks really adopting that. Amazing announcement. Number three, what we gain out of this humongously, obviously from an IBM perspective, is the whole platform. We've been on this journey together with Hortonworks since 2015 with ODPi, and we've all been champions in the open source, delivering a lot of that. As we start to look at it, it makes sense to merge that as a platform, and give to our clients what's most needed out there, as we take our journey towards machine learning, AI, and enhancing the enterprise data warehousing strategy. >> Awesome. Jamie, from your perspective on the product management side, what is this? What's the impact and potential downstream, great implications for Hortonworks? >> I think there's two things. I think Hortonworks has always been very committed to the open source community. I think with Hortonworks and IBM partnering on this, number one is it brings a much bigger community to bear, to really push innovation on top of Hadoop. That innovation is going to come through the community, and I think that partnership drives two of the biggest contributors to the community to do more together. So I think that's number one, the community interest. The second thing is, when you look at Hadoop adoption, we're seeing that people want to get more and more value out of Hadoop adoption, and they want to access more and more data sets to, number one, get more and more value. We're seeing the data science platform become really fundamental to that.
They're also seeing the extension to say, not only do I need data science to go and add new insights, but I need to aggregate more data. So we're also seeing the notion of, how do I use Big SQL on top of Hadoop, but then I can federate data from my mainframe, which has got some very valuable data on it, DB2 instances, and the rest of the data repositories out there. So now we get a better federation model, to allow our customers to access more of the data that they can make better business decisions on, and they can use data science on top of that to get new learnings from that data. >> Let me build on that. Let's say that I'm a Telco customer, and the two of you come together to me and say, we don't want to talk to you about Hadoop. We want to talk to you about solving a problem where you've got data in applications and many places, including inaccessible stuff. You have a limited number of data scientists, and the problem of cleaning all the data. Even if you build models, the challenge of integrating them with operational applications. So what do the two of you tell me, the Telco customer? >> Yeah, so maybe I'll go first. So the Telco, the main use case, or the main application, as I've been talking to many of the largest Telco companies here in the U.S. and even outside of the U.S., is all about their churn rate. They want to know when the calls are dropping, why are they dropping, why are the clients going to the competition, and such? There's so much data. The data is just streaming, and they want to understand that. I think if you bring the data science experience and machine learning to that data, as I said, it doesn't matter now where the data resides. Hadoop, mainframes, wherever, we can bring that data. You can do a transformation of that, clean up the data, so the quality of the data is there, so that you can start feeding that data into the models, and that's when the models learn. The more data it is, the better it is, so they train, and then you can really drive the insights out of it. Now, data science the framework, which is available, it's like a team sport. You can bring in many other data scientists into the organization, who could have different analyst reports to render or provide results into. So being a team sport, being a collaboration, bringing together with that clean data, I think it's going to change the world. I think the business side can have instant value from the data they're going to see.
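Big SQL's federation is a full query engine, but the shape of the idea Jamie describes, one query joining lake-resident data with a relational source such as DB2, can be sketched with Spark's generic JDBC reader. The connection details, credentials, and table names below are placeholders, and this is an illustration of the federated-join pattern rather than of Big SQL itself.

```python
# Sketch of federating a Hadoop-resident table with a DB2 table.
# URL, credentials, and table names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("federation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Clickstream events already landed in the data lake (Hive table).
clicks = spark.table("clickstream_events")  # placeholder table

# Account master data still living in DB2.
accounts = (spark.read.format("jdbc")
            .option("url", "jdbc:db2://db2host:50000/SAMPLE")  # placeholder
            .option("dbtable", "ACCOUNTS")                     # placeholder
            .option("user", "dbuser")
            .option("password", "dbpass")
            .load())

# One logical query over both sources: usage joined to account tier.
joined = clicks.join(accounts,
                     clicks.account_id == accounts.ACCOUNT_ID)
joined.groupBy("ACCOUNT_TIER").count().show()
```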
What I want to do is build the model out so that now I can take that model, and I can prescriptively run it in this stream of data. So I know that that customer just hung up off the phone, now he walked in the store and we can sense that he's in the store because we just registered that he's asking about his billing details. The system can now dynamically diagnose by those two activities that this is a churn high-rate, so notify that teller in the store that there's a chance of him rolling out. If you look at that, that required the machine learning and data science side to build the analytical model, and it required the data-flow management and streaming analytics to consume that model to make a real-time insight out of it, to ultimately stop the churn from happening. Let's just give the customer a discount at the end of the day. That type of stuff; so you need to marry those two. >> It's interesting, you articulated that very clearly. Although then the question I have is now not on the technical side, but on the go-to market side. You guys have to work very very closely, and this is calling at a level that I assume is not very normal for Hortonworks, and it's something that is a natural sales motion for IBM. >> So maybe I'll first speak up, and then I'll let you add some color to that. When I look at it, I think there's a lot of natural synergies. IBM and Hortonworks have been partnered since day one. We've always continued on the path. If you look at it, and I'll bring up community again and open source again, but we've worked very well in the community. I think that's incubated a really strong and fostered a really strong relationship. I think at the end of the day we both look at what's going to be the outcome for the customer and working back from that, and we tend to really engage at that level. So what's the outcome and then how do we make a better product to get to that outcome? So I think there is a lot of natural synergies in that. I think to your point, there's lots of pieces that we need to integrate better together, and we will join that over time. I think we're already starting with the data science experience. A bunch of integration touchpoints there. I think you're going to see in the information governance space, with Atlas being a key underpinning and information governance catalog on top of that, ultimately moving up to IBM's unified governance, we'll start getting more synergies there as well and on the big sequel side. I think when you look at the different pods, there's a lot of synergies that our customers will be driving and that's what the driving factors, along with the organizations are very well aligned. >> And VPF engineering, so there's a lot of integration points which were already identified, and big sequel is already working really well on the Hortonworks HDP platform. We've got good integration going, but I think more and more on the data science. I think in end of the day we end up talking to very similar clients, so going as a joined go-to market strategy, it's a win-win. Jamie and I were talking earlier. I think in this type of a partnership, A our community is winning and our clients, so really good solutions. >> And that's what it's all about. Speaking of clients, you gave a great example with Telco. When we were talking to Rob Thomas and Rob Bearden earlier on in the program today. 
They talked about the data science conversation is at the C-suite, so walk us through an example of whether it's a Telco or maybe a healthcare organization, what is that conversation that you're having? How is a Telco helping foster what was announced today and this partnership? >> Madhu: Do you want to take em? >> Maybe I'll start. When we look in a Telco, I think there's a natural revolution, and when we start looking at that problem of how does a Telco consume and operate data science at a larger scale? So at the C-suite it becomes a people-process discussion. There's not a lot of tools currently that really help the people and process side of it. It's kind of an artist capability today in the data science space. What we're trying to do is, I think I mentioned team sport, but also give the tooling to say there's step one, which is we need to start learning and training the right teams and the right approach. Step two is start giving them access to the right data, etcetera to work through that. And step three, giving them all the tooling to support that, and tooling becomes things like TensorFlow etcetera, things like Zeppelin, Jupiter, a bunch of the open source community evolved capabilities. So first learn and training. The second step in that is give them the access to the right data to consume it, and then third, give them the right tooling. I think those three things are helping us to drive the right capabilities out of it. But to your point, elevating up to the C-suite. It's really they think people-process, and I think giving them the right tooling for their people and the right processes to get them there. Moving data science from an art to a science, is I would argue at a top level. >> On the client success side, how instrumental though are your clients, like maybe on the Telco side, in actually fostering the development of the technology, or helping IBM make the decision to standardize on HDP as their big data platform? >> Oh, huge, huge, a lot of our clients, especially as they are looking at the big data. Many of them are actually helping us get committers into the code. They're adding, providing; feet can't move fast enough in the engineering. They are coming up and saying, "Hey we're going to help" "and code up and do some code development with you." They've been really pushing our limits. A lot of clients, actually I ended up working with on the Hadoop site is like, you know for example. My entire information integration suite is very much running on top of HDP today. So they are saying, OK what's next? We want to see better integration. So as I called a few clients yesterday saying, "Hey, under embargo this is something going to get announced." Amazing, amazing results, and they're just very excited about this. So we are starting to get a lot of push, and actually the clients who do have large development community as well. Like a lot of banks today, they write a lot of their own applications. We're starting to see them co-developing stuff with us and becoming the committers. >> Lisa: You have a question? >> Well, if I just were to jump in. How do you see over time the mix of apps starting to move from completely custom developed, sort of the way the original big data applications were all written, down to the medal-ep in MapReduce. For shops that don't have a lot of data scientists, how are we going to see applications become more self-service, more pre-packaged? >> So maybe I'll give a little bit of perspective. 
Right now I think IBM has got really good synergies on what I'll call vertical solutions to vertical organizations, financial, etcetera. I would say Hortonworks has taken a more horizontal approach. We're more of a platform solution. An example of one where it's kind of marrying the two, is if you move up the stack from Hortonworks as a platform to the next level up, which is Hortonworks as a solution. One of the examples that we've invested heavily in is cybersecurity, and in an Apache project called Metron. Less about Metron and more about cybersecurity. People want to solve a problem. They want to defend against an attacker immediately, and what that means is we need to give them out-of-the-box models to detect a lot of common patterns. What we're doing there, is we're investing in some of the data science and pre-packaged models to identify attack vectors and then try to resolve that or at least notify you that there's a concern. It's an example of the data science behind it, pre-packaging that data science to solve a specific problem. That's in the cybersecurity space, and that case happens to be horizontal, where Hortonworks' strength is. I think in the IBM case, there's a lot more vertical apps that we can apply to. Fraud, adjudication, etcetera. >> So it sounds like we're really just hitting the tip of the iceberg here, with the potential. We want to thank you both for joining us on theCUBE today, sharing your excitement about this deepening, expanding partnership between Hortonworks and IBM. Madhu and Jamie, thank you so much for joining George and I today on theCUBE. >> Thank you. >> Thank you Lisa and George. >> Appreciate it. >> Thank you. >> And for my co-host George Gilbert, I am Lisa Martin. You're watching us live on theCUBE, from day one of the DataWorks Summit in Silicon Valley. Stick around, we'll be right back. (digitalized music)

Published Date : Jun 14 2017


George Chow, Simba Technologies - DataWorks Summit 2017


 

>> (Announcer) Live from San Jose, in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi everybody, this is George Gilbert, Big Data and Analytics Analyst with Wikibon. We are wrapping up our show on theCUBE today at DataWorks 2017 in San Jose. It has been a very interesting day, and we have a special guest to help us do a survey of the wrap-up, George Chow from Simba. We used to call him Chief Technology Officer, now he's Technology Fellow, but when he was explaining the difference in titles to me, I thought he said Technology Felon. (George Chow laughs) But he's since corrected me. >> Yes, very much so >> So George and I have been, we've been looking at both Spark Summit last week and DataWorks this week. What are some of the big advances that really caught your attention? >> What's caught my attention actually is how much manufacturing has really, I think, caught on to streaming data. I think last week was very notable that both Volkswagen and Audi actually had case studies for how they're using streaming data. And I think just before the break now, there was also a similar session from Ford, showcasing what they are doing around streaming data. >> And are they using the streaming analytics capabilities for autonomous driving, or is it other telemetry that they're analyzing? >> The, what is it, I think the Volkswagen study was production, because I still have to review the notes, but the one for Audi was actually quite interesting because it was for managing paint defects. >> (George Gilbert) For paint-- >> Paint defects. >> (George Gilbert) Oh. >> So what they were doing, they were essentially recording the environmental condition that they were painting the cars in, basically the entire pipeline-- >> To predict when there would be imperfections. >> (George Chow) Yes. >> Because paint is an extremely high-value sort of step in the assembly process. >> Yes, what they are trying to do is to essentially make a connection between downstream defects, like future defects, and somewhat trying to pinpoint the causes upstream. So the idea is that if they record all the environmental conditions early on, they could turn around and hopefully figure it out later on. >> Okay, this sounds really, really concrete. So what are some of the surprising environmental variables that they're tracking, and then what's the technology that they're using to build the model and then anticipate if there's a problem? >> I think the surprising finding, they said, was actually, I think it was humidity or fan speed, if I recall, at the time when the paint was being applied, because essentially, paint has to be... Paint is very sensitive to the conditions under which it is being applied to the body. So my recollection is that one of the findings was that it was a narrow window during which the paint was, like, ideal, in terms of having the least amount of defects. >> So, had they built a digital twin style model, where it's like a digital replica of some aspects of the car, or was it more of a predictive model that had telemetry coming at it, and when it's outside certain bounds they know they're going to have defects downstream? >> I think they're still working on the predictive model, or actually the model is still being built, because they are essentially trying to build that model to figure out how they should be tuning the production pipeline. >> Got it, so this is sort of still in the development phase?
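To make the "narrow window" finding concrete, here is a hedged sketch, with entirely invented numbers rather than Audi's data, of the simplest form of that analysis: bin a recorded environmental condition and compare downstream defect rates per band.

    # A hedged sketch of the analysis described above, using invented data:
    # bin paint-shop humidity readings and find the band with the lowest
    # downstream defect rate.
    import numpy as np

    humidity = np.array([30, 35, 42, 45, 47, 50, 55, 60, 65, 68])
    defects  = np.array([ 1,  1,  0,  0,  0,  0,  1,  1,  1,  1])  # 1 = defect found later

    bins = np.arange(30, 80, 10)            # bands: 30-40, 40-50, 50-60, 60-70
    which = np.digitize(humidity, bins) - 1
    for b in range(len(bins) - 1):
        mask = which == b
        if mask.any():
            print("humidity %d-%d%%: defect rate %.2f"
                  % (bins[b], bins[b + 1], defects[mask].mean()))
    # The 40-50% band comes out cleanest here, the kind of narrow ideal
    # window the analysts described.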
>> (George Chow) Yeah, yeah >> And can you tell us, did they talk about the technologies that they're using? >> I remember the... It's a little hazy now because after a couple weeks of conference, so I don't remember the specifics because I was counting on the recordings to come out in a couple weeks' time. So I'll definitely share that. It's a case study to keep an eye on. >> So tell us, were there other ones where this use of real-time or near real-time data had some applications that we couldn't do before because we now can do things with very low latency? >> I think that's the one that I was looking forward to with Ford. That was the session just earlier, I think about an hour ago. The session actually consisted of a demo that was being done live, you know. It was being streamed to us where they were showcasing the data that was coming off a car that's been rigged up. >> So what data were they tracking and what were they trying to anticipate here? >> They didn't give enough detail, but it was basically data coming off of the CAN bus of the car, so if anybody is familiar with the-- >> Oh that's right, you're a car guru, and you and I compare, well our latest favorite is the Porsche Macan >> Yes, yes. >> SUV, okay. >> But yeah, they were looking at streaming the performance data of the car as well as the location data. >> Okay, and... Oh, this sounds more like a test case, like can we get telemetry data that might be good for insurance or for... >> Well they've built out the system enough using the Lambda Architecture with Kafka, so they were actually consuming the data in real-time, and the demo was actually exactly seeing the data being ingested and being acted on. So in the case they were doing a simplistic visualization of just placing the car on Google Maps so you can basically follow the car around. >> Okay so, what were the technical components in the car, and then, how much data were they sending to some, or where was the data being sent to, or how much of the data? >> The data was actually sent, streamed, all the way into Ford's own data centers. So they were using NiFi with all the right proxy-- >> (George Gilbert) NiFi being from Hortonworks there. >> Yeah, yeah >> The Hortonworks data flow, okay >> Yeah, with all the appropriate proxies and firewalls to bring it all the way into a secure environment. >> Wow >> So it was quite impressive from the point of view of, it was live data coming off of the 4G modem, well actually being uploaded through the 4G modem in the car. >> Wow, okay, did they say how much compute and storage they needed in the device, in this case the car? >> I think they were using a very lightweight platform. They were streaming apparently from the Raspberry Pi. >> (George Gilbert) Oh, interesting. >> But they were very guarded about what was inside the data center because, you know, for competitive reasons, they couldn't share much about how big or how large a scale they could operate at. >> Okay, so Simba has been doing ODBC and JDBC drivers to standard APIs, to databases for a long time. That was all about, that was an era where either it was interactive or batch. So, how is streaming, sort of big picture, going to change the way applications are built?
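For a sense of what the on-vehicle side of such a demo can look like, here is a minimal sketch assuming the kafka-python client and a reachable broker. This is not Ford's code: the broker address, topic name, and message fields are invented, and the real pipeline described here additionally passed through NiFi and proxies into a secured environment.

    # A minimal sketch of an on-vehicle device publishing CAN-bus-style
    # telemetry for downstream ingest (pip install kafka-python).
    import json
    import time

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="broker.example.com:9092",   # hypothetical endpoint
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def publish_reading(vehicle_id, lat, lon, speed_kmh, engine_rpm):
        # one telemetry sample: position plus a couple of performance readings
        producer.send("vehicle-telemetry", {           # hypothetical topic
            "vehicle": vehicle_id, "ts": time.time(),
            "lat": lat, "lon": lon,
            "speed_kmh": speed_kmh, "engine_rpm": engine_rpm,
        })

    publish_reading("demo-car-1", 37.33, -121.89, 88.0, 2400)
    producer.flush()  # make sure the sample actually leaves the device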
>> Well, one way to think about streaming is that if you look at many of these APIs, into these systems, like Spark is a good example, where they're trying to harmonize streaming and batch, or rather, to take away the need to deal with it as a streaming system as opposed to a batch system, because it's obviously much easier to think about and reason about your system when it is traditional, like in the traditional batch model. So, the way that I see it also happening is that streaming systems will, you could say will adapt, will actually become easier to build, and everyone is trying to make it easier to build, so that you don't have to think about and reason about it as a streaming system. >> Okay, so this is really important. But they have to make a trade-off if they do it that way. So there's the desire for leveraging skill sets, which were all batch-oriented, and then, presumably SQL, which is a data manipulation language everyone's comfortable with, but then, if you're doing it batch-oriented, you have a portion of time where you're not sure you have the final answer. And I assume if you were in a streaming-first solution, you would explicitly know whether you have all the data or don't, as opposed to late arriving stuff, that might come later. >> Yes, but what I'm referring to is actually the programming model. All I'm saying is that more and more people will want streaming applications, but more and more people need to develop it quickly, without having to build it in a very specialized fashion. So when you look at, let's say the example of Spark, when they focus on structured streaming, the whole idea is to make it possible for you to develop the app without having to write it from scratch. And the comment about SQL is actually exactly on point, because the idea is that you want to work with the data, you could say, without being mindful of it, without a lot of work to account for the fact that it is actually streaming data that could arrive out of order even, so the whole idea is that if you can build applications in a more consistent way, irrespective of whether it's batch or streaming, you're better off. >> So, last week even though we didn't have a major release of Spark, we had like a point release, or a discussion about the 2.2 release, and that's of course very relevant for our big data ecosystem since Spark has become the compute engine for it. Explain the significance where the reaction time, the latency for Spark, went down from several hundred milliseconds to one millisecond or below. What are the implications for the programming model and for the applications you can build with it? >> Actually, hitting that new threshold, the millisecond, is actually a very important milestone because when you look at a typical scenario, let's say with AdTech where you're serving ads, you really only have, maybe, on the order of about 100 or maybe 200 milliseconds max to actually turn around. >> And that max includes a bunch of things, not just the calculation. >> Yeah, and that, let's say 100 milliseconds, includes transfer time, which means that in your real budget, you only have allowances for maybe, under 10 to 20 milliseconds to compute and do any work. So being able to actually have a system that delivers millisecond-level performance actually gives you the ability to use Spark right now in that scenario. >> Okay, so in other words, now they can claim, even if it's not per event processing, they can claim that they can react so fast that it's as good as per event processing, is that fair to say?
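The harmonized programming model George describes is easiest to see in code. Below is a small sketch using Spark Structured Streaming, where a streaming job is written with the same DataFrame operations you would use for a batch job; the input path, schema, and window sizes are hypothetical.

    # A small sketch of the unified batch/streaming model, assuming PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window
    from pyspark.sql.types import (DoubleType, StringType, StructField,
                                   StructType, TimestampType)

    spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

    schema = StructType([
        StructField("user", StringType()),
        StructField("amount", DoubleType()),
        StructField("ts", TimestampType()),
    ])

    # Unbounded input: JSON events landing in a directory (could equally be Kafka)
    events = spark.readStream.schema(schema).json("/tmp/events")

    # Batch-style aggregation over event-time windows; the engine handles late,
    # out-of-order data up to the watermark so the developer does not have to
    totals = (events
              .withWatermark("ts", "10 minutes")
              .groupBy(window(col("ts"), "5 minutes"), col("user"))
              .sum("amount"))

    query = (totals.writeStream
             .outputMode("update")
             .format("console")
             .start())
    query.awaitTermination()

Nothing in the middle of this job says "streaming": the groupBy and sum read exactly as they would in a batch script, which is the ease-of-reasoning point being made above.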
>> Yes, yes that's very fair. >> Okay, that's significant. So, what type... How would you see applications changing? We've only got another minute or two, but how do you see applications changing now that Spark has been designed for people that have traditional, batch-oriented skills, but who can now learn how to do streaming, real-time applications without learning anything really new. How will that change what we see next year? >> Well I think we should be careful to not pigeonhole Spark as something built for batch, because I think the idea is that, you could say, the originators of Spark know that it's all about the ease of development, and it's the ease of reasoning about your system. It's not the fact that the technology is built for batch, so the fact that you can use your knowledge and experience and an API that actually is familiar, and leverage it for something that you can build for streaming. That's the power, you could say. That's the strength of what the Spark project has taken on. >> Okay, we're going to have to end it on that note. There's so much more to go through. George, you will be back as a favorite guest on the show. There will be many more interviews to come. >> Thank you. >> With that, this is George Gilbert. We are at DataWorks 2017 in San Jose. We had a great day today. We learned a lot from Rob Bearden and Rob Thomas up front about the IBM deal. We had Scott Gnau, CTO of Hortonworks, on several times, and we've come away with an appreciation for a partnership now between IBM and Hortonworks that can take the two of them into a set of use cases that neither one on its own could really handle before. So today was a significant day. Tune in tomorrow, we have another great set of guests. Keynotes start at nine, and our guests will be on starting at 11. So with that, this is George Gilbert, signing out. Have a good night. (energetic, echoing chord and drum beat)

Published Date : Jun 13 2017


Sri Raghavan, Teradata - DataWorks Summit 2017


 

>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017. Brought to you by Hortonworks. (electronic music fading) >> Hi everybody, this is George Gilbert. We're watching theCUBE. We're at DataWorks 2017 with my good friend Sri Raghavan from Teradata, and Sri, let's kick this off. Tell us, bring us up to date with what Teradata's been doing in the era of big data and advanced analytics. >> First of all, George, it's always great to be back with you. I've done this before with you, and it's a pleasure coming back, and I always have fun doing this. So thanks for having me and Teradata on theCUBE. So, a lot of things have been going on at Teradata. As you know, we are the pioneer in the enterprise data warehouse space. We've been so for the past 25 plus years, and, you know, we've got an incredible amount of goodwill in the marketplace with a lot of our key customers and all that. And as you also know, in the last, you know, five or seven years or so, between five and seven years, we've actually expanded our portfolio significantly to go well beyond the enterprise data warehouse into advanced analytics. We've got solutions for the, quote-unquote, big data advanced analytics space. We've acquired organizations which have a significant amount of core competence, with enormous numbers of years of experience, and people who can deliver solutions and services for us. So it's fair to say, as an understatement, that we have, we've come a long way in terms of being a very formidable competitor in the marketplace with the kinds of, not only our core enterprise data warehouse solutions, but also advanced analytics solutions, both as products and solutions and services that we have developed over time. >> So I was at the Influencer Summit, not this year but the year before, and the thing, what struck me was you guys articulated very consistently and clearly the solutions that people build with the technology as opposed to just the technology. Let's pick one, like Customer Journey that I remember that was used last year. >> Sri: Right. >> And tell us, sort of, what are the components in it, and, sort of, what are the outcomes you get using it? >> Sure. First of all, thanks for picking on that point because it's a very important point that you mentioned, right? It's not- in today's world, it can't just be about the technology. We just can't go on and articulate things around our technology and the core competence, but we also have to make a very legitimate case for delivering solutions to the business. So, our, in fact, our motto is: Business solutions that are technology-enabled. We have a strong technology underpinning to be able to deliver solutions like Customer Journey. Let me give you a view into what Customer Journey is all about, right? So the idea of the Customer Journey, it's actually pretty straightforward. It's about being able to determine the kind of experience a customer is having as she or he engages with you across the various channels that they do business with you at. So it could be directly they come into the store, it could be online, it could be through snail mail, email, what have you. The point is not to look at Customer Journey as a set of disparate channels through which they interact with you, but to look at it holistically.
Across the various areas of encounters they have with you and engagements they have with you, how do you determine what their overall experience is, and, more importantly, once you determine what their overall experience is, how can you have certain kinds of treatments that are very specific to the different parts of the experience and make their experience and engagement even better? >> Okay, so let me jump in for a second there. >> We've seen a lot of marketing automation companies come by and say, you know, or come and go having said over many generations, "We can help you track that." And they all seem to, like, target either ads or email. >> Correct. >> There's like, the touchpoints are constrained. How do you capture a broader, you know, a broader journey? >> Yeah, to me it's not just the touchpoints being constrained, although all the touchpoints are constrained. To me, it's almost as if those touchpoints are looked at very independently, and it's very orthogonal too, right? I look at only my online experience versus a store experience versus something else, right? And the assumption in most cases is that they're all not related. You know, sometimes, I may not come directly to the store, right, but the reason why I'm not coming to the store is because, to buy things, because, you know, I have seen an advertisement somewhere which says, "Look, go online and purchase a product." So whatever the case might be, the point is each part of the journey is very interrelated, and you need to understand this as well. Now, the question that you asked is, "How do you, for instance, collect all this information? "Where do you store it?" >> George: And how do you relate it ... >> And, exactly, and how do you connect the various points of interaction, right? So for one thing, and let me just, sort of, go a little bit tangential and go into some architecture, the marchitecture, if you will, that allows us to be able to, first of all, access all of this data. As you can imagine, the types and the sources of data are quite varied, pretty disparate, particularly as the number of channels by which you can engage with me as an organization has expanded, so has the number of sources. So, you know, we have to go to place A, where there's a lot of CRM information for instance, or place B, where it's a lot of online information, weblogs and web servers and what have you, right? So, we have to go to, for instance, some of these guys would have put all this information in a big data lake. Or they could have stored it in an EDW, in an enterprise data warehouse. So we've put in place a technology, an architecture, which allows us to be able to connect to all these various sources, be it Teradata products, or non-Terada- third-party sources, we don't care. We have the capability to connect to all of these different data sources to be able to access information. So that's number one. Number two is how do you normalize all of this information? So as you can well imagine, right, web log servers are very different in their data makeup as opposed to CRM solutions, highly structured information. So we need a way to be able to bring them together, to connect a singular user ID across the different sources, so we have filtering, you know, data filters in place that extract information from weblogs, let's say it's an XML file. So we extract all that information, and we connect it. Ultimately, all of that information comes to you in a structured manner. >> And can it, can it be realtime reactive?
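As a toy version of that normalization and linking step, not Teradata's implementation, the sketch below reshapes semi-structured web log lines to match structured CRM records and lines both up under one customer ID; every field and value is invented.

    # A toy sketch of the step Sri describes: reshape semi-structured web log
    # events to match structured CRM records, then order them under one customer ID.
    import json

    crm_records = [  # structured source, e.g. from the warehouse
        {"customer_id": "C1", "channel": "store", "event": "billing_question", "ts": 1001},
    ]
    web_log_lines = [  # semi-structured source, e.g. JSON web server logs
        '{"uid": "C1", "page": "/billing", "ts": 1000}',
        '{"uid": "C1", "page": "/cancel-service", "ts": 1002}',
    ]

    def normalize_web(line):
        raw = json.loads(line)
        # map the web log's field names onto the common record shape
        return {"customer_id": raw["uid"], "channel": "web",
                "event": "visit:" + raw["page"], "ts": raw["ts"]}

    journey = sorted(crm_records + [normalize_web(l) for l in web_log_lines],
                     key=lambda r: r["ts"])
    for step in journey:
        print(step["ts"], step["customer_id"], step["channel"], step["event"])
    # One ordered, cross-channel view of customer C1's journey.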
In other words when- >> Sri: Absolutely. >> someone comes to- >> Sri: Absolutely. >> you know, a channel where you need to anticipate and influence. >> Very good question. In fact, I think we will be doing a big disservice to our customers if we did not have realtime decisioning in place. I mean, the whole idea is for us to be able to provide certain treatments based on what we anticipate your reactions are going to be to certain, let's say if it's a retail store, let's say to certain product coupons we've placed, which says, you know, come online, and basically, behaviorally, we think there's a 90% chance that tomorrow morning you're going to come back, you know, through our online portal and buy the products. And because of the fact that our analytics allows us to be able to predict your behavior tomorrow morning, as soon as you land on the online portal, we will be able to provide certain treatment to you that takes advantage of that. Absolutely. >> Techy question: because you're anticipating, does that mean you've done the prediction runs, batch, >> Sri: Absolutely. >> And so you're just serving up the answer. >> Yeah, the business level answer is absolutely. In fact, we have, as part of our advanced analytics solution, we have pre-built algorithms that take all this information that I've talked to you about, where we've connected all that information across the different sources, and we apply algorithms on top of that to be able to deliver predictive models. Now, these models, once they are actually applied as and when the data comes in, you know, you can operationalize them. So the thing to be very clear here, a key part of the Teradata story, is that not only are we in a position to be able to provide the infrastructure which allows you to be able to collect all the information, but we provide the analytic capabilities to be able to connect all of the data across the various sources and at scale, to do the analytics on top of all that disparate data, to deliver the model, and, as an important point, to operationalize that model, and then to connect it back in the feedback loop. We do the whole thing. >> That's, there's a lot to unpack in there, and I called our last guest dense. What I was actually trying to say, we had to unpack a dense answer, so it didn't come out quite that, quite right. So I won't make that mistake. >> Sri: That's a very backhanded compliment there. (George laughing) >> So, explain to me though, the, I know from all the folks who are trying to embed predictive analytics in their solutions, the operationalizing of the model is very difficult, you know, to integrate it with the system of record. >> Yeah, yeah, yeah. >> How do, you know, how do you guys do that? >> So a good point. There are two ways by which we do it. One is we have something called the AppCenter. It's called Teradata AppCenter. The AppCenter is a core capability of some of the work we've done so far, in fact we've had it for the last, I don't know, four years or so. We've actually expanded it across, uh, to include a lot of the apps. So the idea behind the AppCenter is that it's a framework for us to be able to develop very specific apps for us to be able to deliver the model so that next time, as and when realtime data comes in, when you connect to a database for instance. So the way the app works is that you set up the app. There's a code that we've created, it's all prebuilt code that we put behind that app, and it runs, the app runs.
Every time the data is refreshed, you can run the app, and it automatically comes up with visualizations which allow you to be able to see what's happening with your customers in realtime. So that's one way to operationalize. In fact, you know, if you come by to our booth, we can show you a demo as to how the AppCenter works. The other way by which we've done it is to develop a software development kit where we actually have created an operationalization capability. So, I'll give you an example, right? We developed an app, a realtime operationalization app where the folks in the call center are assessing whether you should be given a loan to buy a certain kind of car, a used car, brand new car, whatever the case might be. So what happens is the call center person takes information from you, gets information about, you know, what your income level is, you know, how long you've been working in your existing job, what have you. And those are parameters that are passed into the screen- >> By the way, I should just say, on the income level, it's way too low for my taste. >> Those are, um, those are comments I'll take, uh, later. >> Off slide. >> But, uh, so what happens is, you know, as and when the data goes into the parameters, right, the call center person just clicks on the button, and the model which sits behind the app picks up all the parameters, runs it, and spews out a likelihood score saying that this person is 88% likely- >> So an AppCenter is not just a full end to end app, it also can be a model. >> AppCenter can include the model which can be used to operationalize as and when the data comes in. >> George: Okay. >> It's a very core part of our offering. In fact, I can't stress enough how important AppCenter is to our ability to operationalize our various analytic models. >> Okay, one more techy question in terms of how that's supported. Is the AppCenter running on Aster or the models, are they running on Aster, uh, the old Aster database or Teradata? >> Well, just to be clear, right, so the Aster solution is called Aster Analytics, of which one form factor contains a database, but you have Aster which is in Hadoop, you have Aster in the Cloud, you have Aster software only, so there's a lot of difference between these two, right? So AppCenter sits on Aster, but right now, it's not just the Aster AppCenter. It's called the Teradata AppCenter, with the idea that it will sit on Teradata products as well. >> George: Okay. >> So again, it's a really core part of our evolution that we've come up with. We're very proud of it. >> On that note, we have to wrap it up for today, but to be continued. >> Sri: Time flies when you're having fun. >> Yes. So this is George Gilbert. I am with Sri Raghavan from Teradata. We are at DataWorks 2017 in San Jose, and we will be back tomorrow with a whole lineup of exciting new guests. Tune in tomorrow morning. Thanks. (electronic music)
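As a rough illustration of the call-center scoring flow Sri describes, and not the actual Teradata AppCenter code, the sketch below applies standard logistic scoring to the applicant's parameters; the coefficients are invented stand-ins for whatever a real trained model would contain.

    # A rough sketch of the scoring step behind such an app: the parameters the
    # call center person enters are fed to a previously trained model, which
    # returns a likelihood score. All weights here are hypothetical.
    import math

    COEFFS = {"income_k": 0.02, "years_in_job": 0.30}  # hypothetical model weights
    INTERCEPT = -2.0

    def loan_likelihood(income_k, years_in_job):
        # standard logistic scoring: sigmoid(w . x + b)
        z = (INTERCEPT
             + COEFFS["income_k"] * income_k
             + COEFFS["years_in_job"] * years_in_job)
        return 1.0 / (1.0 + math.exp(-z))

    score = loan_likelihood(income_k=85, years_in_job=6)
    print("likelihood score: %.0f%%" % (100 * score))  # -> likelihood score: 82%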

Published Date : Jun 13 2017


Scott Gnau, Hortonworks & Tendü Yogurtçu, Syncsort - DataWorks Summit 2017


 

>> Man's Voiceover: Live, from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE, we are live at Day One of the DataWorks Summit, we've had a great day here, I'm surprised that we still have our voices left. I'm Lisa Martin, with my co-host George Gilbert. We have been talking with great innovators today across this great community, folks from Hortonworks, of course, IBM, partners, now I'd like to welcome back to theCUBE, who was here this morning in the green shoes, the CTO of Hortonworks, Scott Gnau, welcome back Scott! >> Great to be here yet again. >> Yet again! And we have another CTO, we've got CTO corner over here, with CUBE Alumni and the CTO of Syncsort, Tendu Yogurtcu. Welcome back to theCUBE, both of you >> Pleasure to be here, thank you. >> So, guys, what's new with the partnership? I know that Syncsort, you have 87%, or 87, of the Fortune 100 companies are customers. Scott, 60 of the Fortune 100 companies are customers of Hortonworks. Talk to us about the partnership that you have with Syncsort, what's new, what's going on there? >> You know there's always something new in our partnership. We launched our partnership, what, a year and a half ago or so? >> Yes. And it was really built on the foundation of helping our customers get time to value very quickly, right, and leveraging our mutual strengths. And we've been back on theCUBE a couple of times and we continue to have new things to talk about whether it be new customer successes or new feature functionalities or new integration of our technology. And so it's not just something that's static and sitting still, but it's a partnership that has had a great foundation in value and continues to grow. And, ya know, with some of the latest moves that I'm sure Tendu will bring us up to speed on that Syncsort has made, customers who have jumped on the bandwagon with us together are able to get much more benefit than originally they even intended. >> Let me talk about some of the things actually happening with Syncsort and with the partnership. Thank you Scott. And the Trillium acquisition has been transformative for us, really. We have achieved quite a lot within the last six months. Delivering joint solutions between our data integration, DMX-h, and Trillium data quality and profiling portfolio, and that was kind of our first step, very much focused on data governance. We are going to have a data quality for Data Lake product available later this year, and this week actually we will be announcing our partnership with the Collibra data governance platform, basically making business rules and technical meta data available through the Collibra dashboards for data scientists. And in terms of our joint solution and joint offering for data warehouse optimization and the bundle that we launched early February of this year, that's in production, a large complex production deployment's already happened. Our customers access all their data, all enterprise data, including legacy data warehouse, new data sources, as well as legacy mainframe, in the data lake. So we will be announcing, again in a week or so, change data capture capabilities from legacy data storage into Hadoop, keeping that data fresh and giving more choices to our customers in terms of populating the data lake, as well as use cases like archiving data into the cloud. >> Tendu, let me try and unpack what was a very dense, in a good way, lot of content.
Sticking my foot in my mouth every 30 seconds (laughter) >> Scott Voiceover: I think he called you dense. (laughter) >> So help us visualize a scenario where you have maybe DMX-h bringing data in, you might have change data capture coming from a live database >> Tendu Voiceover: Yes. and you've got the data quality at work as well. Help us picture how much faster and higher fidelity the data flow might be relative to >> Sure, absolutely. So, our bundle and our joint solution with Hortonworks really focuses on business use cases. And one of those use cases is enterprise data warehouse optimization where we make all data, all enterprise data accessible in the data lake. Now, if you are an insurance company managing claims or you are building a data as a service, Hadoop as a service architecture, there are multiple ways that you can keep that data fresh in the data lake. And you can have change data capture by basically taking snapshots of the data and comparing them in the data lake, which is a viable method of doing it. But, as the data volumes are growing and the real time analytics requirements of the business are growing, we recognize our customers are also looking for alternative ways that they can actually capture the change in real time when the change is just like less than 10% of the data, the original data set, and keep the data fresh in the data lake. So that enables faster analytics, real time analytics, as well as in the case that if you are doing something from on-premise to the cloud or archiving data, it also saves on the resources like the network bandwidth and overall resource efficiency. Now, while we are doing this, obviously we are accessing the data and the data goes through our processing engines. What Trillium brings to the table is the unmatched capabilities around profiling that data, getting better understanding of that data. So we will be focused on delivering products around that because as we understand data we can also help our customers to create the business rules, to cleanse that data, and preserve the fidelity of the data and integrity of the data. >> So, with the change data capture it sounds like near real time, you're capturing changes in near real time, could that serve as a streaming solution that then is also populating the history as well? >> Absolutely. We can go through streaming or message queues. We also offer more efficient proprietary ways of streaming the data to Hadoop. >> So the, I assume the message queues refer to, probably Kafka and then your own optimized solution for sort of maximum performance, lowest latency. >> Yes, we can do either true Kafka queues, which is very efficient as well. We can also go through proprietary methods. >> So, Scott, help us understand then now the governance capabilities that, um I'm having a senior moment (laughter) I'm getting too many of these! (laughter) Help us understand the governance capabilities that Syncsort's adding to the, sort of mix with the data warehouse optimization package and how it relates to what you're doing. >> Yeah, right. So what we talked about even again this morning, right, the whole notion of the value of open squared, right, open source and open ecosystem. And I think this is clearly an open ecosystem kind of play.
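The snapshot-comparison method Tendu contrasts with true real-time change data capture can be pictured with a toy diff keyed by primary key, as below; this is not Syncsort's implementation, and the table contents are invented.

    # A toy sketch of CDC by snapshot comparison: diff yesterday's and today's
    # extracts by primary key and emit only the inserts, updates, and deletes.
    yesterday = {1: ("alice", "gold"), 2: ("bob", "silver"), 3: ("carol", "bronze")}
    today     = {1: ("alice", "gold"), 2: ("bob", "gold"),   4: ("dave", "silver")}

    def snapshot_diff(old, new):
        changes = []
        for key, row in new.items():
            if key not in old:
                changes.append(("insert", key, row))
            elif old[key] != row:
                changes.append(("update", key, row))
        for key in old:
            if key not in new:
                changes.append(("delete", key, old[key]))
        return changes

    for op, key, row in snapshot_diff(yesterday, today):
        print(op, key, row)   # only the small changed fraction moves downstream

Real-time CDC gets the same delta without materializing and scanning full snapshots, which is the bandwidth and efficiency argument made above when the change is under 10% of the data.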
So we've done a lot of work since we initially launched the partnership and through the different product releases where our engineering teams and the Syncsort teams have done some very good low-level integration of our mutual technologies so that the Syncsort tool can exploit those horizontal core services like YARN for multi-tenancy and workload management and of course Atlas for data governance. So as then the Syncsort team adds feature functionality on the outside of that tool, that simply accretes to the benefit of what we've built together. And so that's why I say customers who started down this journey with us together are now going to get the benefit of additional options from that ecosystem that they can plug in additional feature functionality. And at the same time we're really thrilled because, and we've talked about this many times, right, the whole notion of governance and meta data management in the big data space is a big deal. And so the fact that we're able to come to the table with an open source solution to create common meta data tagging that then gets utilized by multiple different applications I think creates extreme value for the industry and frankly for our customers because now, regardless of the application they choose, or the applications that they choose, they can at least have that common trusted infrastructure where all of that information is tagged and it stays with the data through the data's life cycle. >> So your partnership sounds very, very symbiotic, that there's changes made on one side that reflect the other. Give us an example of where is your common customer, and this might not be, well, they're all over the place, who has got an enterprise data warehouse, are you finding more customers that are looking to modernize this? That have multi-cloud, core, edge, IoT devices, that's a pretty distributed environment, versus customers that might be still more on prem? What's kind of the mix there? >> Can I start and then I will let you build on. I want to add something to what Scott said earlier. Atlas is a very important integration point for us and in terms of the partnership that you mentioned the relation, I think one of the strengths of our partnership is at many different levels it's not just executive level, it's cross functional and also from very close field teams, marketing teams and engineering field teams working together. And in terms of our customers, it's really organizations are trying to move toward modern data architecture. And as they are trying to build the modern data architecture there is the data in motion piece, which I will let Scott talk about, and the data at rest piece, and as we have so much data coming from cloud, originating through mobile and web in the enterprise, especially the Fortune 500, that we talk, Fortune 100 we talked about -- insurance, healthcare, telco, financial services and banking have a lot of legacy data stores. So our joint solution, really, and the first couple of use cases, business use cases, we targeted were around that. How do we enable these data stores and data in the modern data architecture? I will let Scott-- >> Yeah, I agree. And so certainly we have a lot of customers already who are joint customers and so they can get the value of the partnership kind of cuz they've already made the right decision, right. I also think, though, there's a lot of green field opportunity for us because there are hundreds if not thousands of customers out there who have legacy data systems where their data is kind of locked away.
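To picture what "tag once, use everywhere" can look like in practice, here is a rough sketch of registering a dataset with Apache Atlas over its v2 REST entity API. The endpoint path and the built-in hdfs_path type follow the Atlas documentation as best understood here, but the host, credentials, qualified name, and attribute values are all invented; treat this as a sketch to verify against the Atlas docs, not Hortonworks' or Syncsort's actual integration code.

    # A rough sketch, assuming the Apache Atlas v2 REST entity API and the
    # python-requests library. Host, credentials, and values are hypothetical;
    # verify the payload shape against the Atlas documentation.
    import requests

    ATLAS = "http://atlas.example.com:21000"       # hypothetical Atlas host

    entity = {
        "entity": {
            "typeName": "hdfs_path",               # an Atlas built-in type
            "attributes": {
                "qualifiedName": "hdfs://lake/claims/2017-06-13@prodcluster",
                "name": "claims/2017-06-13",
                "path": "/lake/claims/2017-06-13",
                "description": "Daily claims extract landed by the ingest job",
            },
        }
    }

    resp = requests.post(ATLAS + "/api/atlas/v2/entity",
                         json=entity,
                         auth=("admin", "admin"))   # placeholder credentials
    resp.raise_for_status()
    print("registered:", resp.json().get("guidAssignments"))

Because the tag lives in Atlas rather than in any one tool, any application that reads the catalog afterward sees the same trusted metadata, which is the life-cycle point Scott makes above.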
And by the way, it's not to say the systems aren't functioning and doing a good job, they are. They're running business facing applications and all of that's really great, but that is a source of raw material that belongs also in the data lake, right, and can be, can certainly enhance the value of all the other data that's being built there. And so the value, frankly, of our partnership is really creating that easy bridge to kind of unlock that data from those legacy systems and get it in the data lake and then from there, the sky's the limit, right. Is it reference data that can then be used for consistency of response when you're joining it to social data and web data? Frankly, is it an online archive, and optimization of the overall data fabric and offloading some of the historical data that may not even be used in legacy systems and having a place to put it where it actually can be accessed. And so, there are a lot of great use cases. You're right, it's a very symbiotic relationship. I think there's only upside because we really do complement each other and there is a distinct value proposition not just for our existing customers but frankly for a large set of customers out there that have, kind of, the data locked away. >> So, how do you see the data warehouse optimization sort of solution set continuing to expand its functional footprint? What are some things to keep pushing out the edge conditions, the realm of possibilities? >> Some of the areas that we are jointly focused on is we are liberating that data from the enterprise data warehouse or legacy architectures. Through Syncsort DMX-h we actually understand the path that data travels; the meta data is something that we can now integrate into Atlas and publish into Atlas, and have Atlas as the open data governance solution. So that's an area that definitely we see an opportunity to grow and also strengthen that joint solution. >> Sure, I mean extended provenance is kind of what you're describing and that's a big deal when you think about some of these legacy systems where frankly 90% of the costs of implementing them originally was actually building out those business rules and that meta data. And so being able to preserve that and bring it over into a common or an open platform is a really big deal. I'd say inside of the platform of course as we continue to create new performance advantages in, ya know, the latest releases of Hive as an example where we can get low latency query response times, there's a whole new class of workloads that now are appropriate to move into this platform and you'll see us continue to move along those lines as we advance the technology from the open community. >> Well, congratulations on continuing this great, symbiotic as we said, partnership. It sounds like it's incredibly strong on the technology side, on the strategic side, on the GTM side. I loved how you said liberating data so that companies can really unlock its transformational value. We want to thank both of you for Scott coming back on theCUBE >> Thank you. twice in one day. >> Twice in one day. Tendu, thank you as well >> Thank you. for coming back to theCUBE. >> Always a pleasure. For both of our CTOs that have joined us from Hortonworks and Syncsort and my co-host George Gilbert, I am Lisa Martin, you've been watching theCUBE live from day one of the DataWorks summit. Stick around, we've got great guests coming up (upbeat music)

Published Date : Jun 13 2017


David Lyle, Informatica - DataWorks Summit 2017


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's the Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to the Cube, I'm Lisa Martin with my co-host, Peter Burris. We are live on day one of the DataWorks Summit in Silicon Valley. We've had a great day so far, talking about innovation across different, different companies, different use cases, it's been really exciting. And now, please welcome our next guest, David Lyle from Informatica. You are driving business transformation services. >> Yes. >> Lisa: Welcome to the Cube. >> Well thank you, it's good to be here. >> It's great to have you here. So, tell us a little about Informatica World, Peter you were there with the Cube. Just recently some of the big announcements that came out of there, Informatica getting more aggressive with cloud movement, extending your master data management strategy, and you also introduced a set of AI capabilities around meta-data. >> David: Exactly. >> So, looking at those three things, and your customer landscape, what's going on with Informatica customers, where are you seeing these great new capabilities come to fruition? >> Absolutely, well one of the areas that is really wonderful that we're using in every other aspect of our life is using the computer to do the logical things it should, and could, be doing to help us out. So, in this announcement at Informatica World, we talked about the central aspect of meta-data finally being the true center of Informatica's universe. So bringing in meta-data-- >> And customer's universes. >> Well, and customer's universes, so the, not seeing it as something that sits over here that's not central, but truly the thing that is where you should be focusing your attention. And so Informatica has some card-carrying PhD artificial intelligence and machine learning engineers, scientists, that we have hired, that have been working for several years, that have built this new capability called CLAIRE. That's the marketing term for it, but really what it is, it's helping to apply artificial intelligence against that meta-data, to use the computer to do things for the developer, for the analyst, for the architect, for the business people, whatever, that are dealing with these complex data transformation initiatives that they're doing. Where in the past what's been happening is whatever product you're using, the product is basically keeping track of all the things that the scientist or analyst does, but isn't really looking at that meta-data to help suggest things that maybe have already been done before. Or domains of data. Why, how come you have to tell the system that this is an address? Can't the system identify that when data looks like this, it's an address already? We think about Shazam and all these other apps that we have on our phones that can do these fantastic things with music. How come we can't do those same things with data? Well, that's really what CLAIRE can actually do now is discover these things and help. >> Well, I want to push now a little bit. >> David: Sure, sure. >> So, historically meta-data was the thing that you created in the modeling activity. >> David: Right. >> And it wasn't something that you wanted to change, or was expected to change frequently. >> In fact, in the world of transaction processing, you didn't want to change. >> Oh, yeah. And especially you get into finance apps, and things like that, you want to keep that slow. >> Exactly. >> Yeah.
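As a toy illustration of the "can't the system identify an address?" idea, the sketch below infers a column's domain from the shape of its sample values. CLAIRE reportedly does this with machine learning over metadata at scale; this hand-rolled regex version is only a stand-in to make the concept concrete, and the patterns are deliberately simplistic.

    # A toy domain detector: guess what a column holds from its values,
    # instead of asking a human to label it. Not CLAIRE; purely illustrative.
    import re

    PATTERNS = {
        "email":   re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.I),
        "phone":   re.compile(r"^\+?[\d\-\s\(\)]{7,15}$"),
        "address": re.compile(r"^\d+\s+\w+(\s\w+)*\s(st|ave|rd|blvd|dr)\.?$", re.I),
    }

    def infer_domain(values):
        # score each candidate domain by how many sample values match it
        best, best_hits = "unknown", 0
        for domain, pattern in PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v.strip()))
            if hits > best_hits:
                best, best_hits = domain, hits
        return best

    sample = ["120 Main St", "4510 El Camino Ave", "7 Market Blvd"]
    print(infer_domain(sample))   # -> address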
>> And meta-data became one of those things that often had to be secured in a different way, and was one of those reasons why IT was always so slow. >> Yeah. >> Because of all these concerns about what's the impact on meta-data. >> Yeah. >> We move into this big data world, and we're bringing forward many of the same perspectives on how we should treat meta-data, and what you guys are doing is saying "that's fine, keep the meta-data of that data, but do a better job of revealing it, and how it connects-- >> David: Exactly. >> and how it could be connected." And we talked about this with Bill Schmarzo just recently-- >> Good friend of mine. >> Yeah, the data that's in that system can also be applied to that system. >> Yeah. >> It doesn't have to be a silo. And what CLAIRE is trying to do is remove some of the artificial barriers-- >> Exactly. >> Of how we get access to data that are bounded by organization, or application, or system. >> David: Right. >> And make it easier to find that data, use that data, and trust the data. >> Exactly. >> Peter: I got that right? >> You've totally got that right. So, if we think about all these systems in an organization as this giant complex hairball, that in the past we may have had pockets of meta-data here and there that weren't really exposed, or controlled in the right way in the first place. But now bringing it together. >> But also valuable in the context of the particular database or system-- >> Yep. >> that was running. It wasn't the meta-data that was guarded as valuable-- >> Right. that just provided documentation for what was in the data. >> Exactly, exactly. So, but now with this ability to see it, really for the first time, and understand how it connects and impacts with other systems, that are exchanging data with this, or viewing data with this. We can understand then if I need, occasionally, to make a change to the general ledger, or something, I can now understand what the impact is on different KPIs, and the calculation streams of Tableau, Business Objects, Cognos, MicroStrategy, Qlik, whatever. That, what else do I need to change? What else do I need to test? That's something computers are good at. Something that humans have had to do manually up to this point. And that, that's what computers are for. >> Right.
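The impact analysis David sketches, change one general ledger field and learn every KPI and report downstream of it, reduces to a graph walk over lineage metadata. The sketch below is a toy version with an invented lineage graph, not CLAIRE.

    # A small sketch of lineage-based impact analysis: given edges recording
    # which field feeds which downstream calculation or report, walk the graph
    # to list everything that must be retested when a source field changes.
    from collections import deque

    feeds = {  # edge: source -> things computed from it (invented contents)
        "gl.account_code":      ["kpi.operating_margin", "etl.gl_daily_load"],
        "etl.gl_daily_load":    ["report.cognos_pnl"],
        "kpi.operating_margin": ["dashboard.tableau_exec"],
    }

    def downstream_impact(changed):
        impacted, queue = set(), deque([changed])
        while queue:
            node = queue.popleft()
            for child in feeds.get(node, []):
                if child not in impacted:   # avoid revisiting shared dependencies
                    impacted.add(child)
                    queue.append(child)
        return sorted(impacted)

    print(downstream_impact("gl.account_code"))
    # -> everything to re-verify before touching the general ledger field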
It can also then lead, when we get into data lakes, to ensuring that those data lakes are understood better, trusted better, that people are able to see what other people are actually using. In other words, we kind of bring, somewhat, the Amazon.com website model to the data lake, so that people know, okay, if I'm looking for a product, or a data set, that looks like this for the processing or data science I want to do, then these are the data sets that are out there that may be useful. This is how many people have used them, or who those other people are, and are those people kind of trusted, valid people that have done similar stuff to what I want to do before? Anyway, all that information we're used to when we buy products from Amazon, we bring that now to the data lake that you're putting together, so that you can actually prevent it, kind of, from being a swamp and actually get value out of it. Once again, it's the meta-data that's the key to getting the value out of that data. >> Have you seen historically that-- you're working with customers that have, or are already using, Hadoop. >> David: That's right. >> They've got data lakes. >> Oh yeah. >> Have you seen that historically they haven't really thought about meta-data as driving this much value before? Is this sort of a, not a new problem, but are you seeing that it's not been part of their-- >> It's a new. >> strategic approach. >> That's right, it's a new solution. I think you talk to anybody, and they knew this problem was coming. That with a data lake, and the speed that we're talking about, if you don't back that up with the corresponding information that you need to really digest it, you can create a new mess, a new hairball, faster than you ever created the original hairball you're trying to fix in the first place. >> Lisa: Nobody likes a hairball. >> Nobody likes a hairball, exactly. >> Well it also seems as though, for example at the executive level: do I have a question? Can I get this question answered? How do I get this question answered? How can I trust the answer that I get? In many respects that's what you guys are trying to solve. >> David: Exactly, exactly. >> So, it's not, hey, what you need to do is invest a whole bunch in the actual data, or copying data, or moving a bunch of data around. You're just starting with the prob-- with the observation, with the proposition: yes, you can answer this question, here's how you're going to do it, and you can trust it because of this trail-- >> David: Exactly. >> Of activities based on the meta-data. >> Exactly, exactly. So, it's about helping to, hate to use the phrase again, but "detangle" that hairball, or at least manage it a bit, so that we can begin to move faster and solve these problems with a hell of a lot more confidence. So we have-- >> Can we switch gears? >> Absolutely. >> Certainly. >> Let's switch gears and talk about transformations. >> Yeah. >> I know that's something that is near and dear to your heart, and something you're spending a lot of time with clients on. >> Yeah. >> How do you approach, when a customer comes to you, how are they approaching the transformation, and what's the conversation that you're having with them?
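Before David's answer on transformations, one more aside, this time on the "Amazon.com model for the data lake" he just described. Stripped down, it is a popularity-and-trust score computed from usage meta-data. The log format, the trusted-user set, and the weights below are all made up for the example:

```python
from collections import Counter

# Invented usage log: (data set, user) pairs harvested from the catalog.
usage_log = [
    ("clickstream_2017", "alice"), ("clickstream_2017", "bob"),
    ("clickstream_2017", "carol"), ("orders_gold", "alice"),
]
trusted_users = {"alice", "carol"}  # e.g., certified practitioners

def rank_datasets(log):
    uses = Counter(ds for ds, _ in log)
    trusted = Counter(ds for ds, user in log if user in trusted_users)
    # Popularity plus a bonus when trusted practitioners used the set.
    scores = {ds: uses[ds] + 2 * trusted[ds] for ds in uses}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank_datasets(usage_log))
# -> [('clickstream_2017', 7), ('orders_gold', 3)]
```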
>> Well, it's interesting how that phrase has caught on, and I'm even thinking of changing our group's title to digital transformation services, not just because it's hot, but because, frankly, the fluid, the glue, that really makes that happen is data in these different environments. But the way that we approach it is by first understanding what the business capabilities are that are affected by the transformation that is being discussed. Looking at and prioritizing those capabilities based upon the strategic relevance of each capability, along with the opportunity to improve, and multiplying those together, we can then take those and rank those capabilities, and look at it in conjunction with what we call a business view of the company. And from that we can understand what the effects are on the different parts of the organization, and create the corresponding plans, or roadmaps, that are necessary to do this digital transformation. We actually did a little stealth acquisition of a company two years ago that's kind of the underpinnings of what my team does, and that is extremely helpful in being able to drive these kinds of complex transformations. In fact, a lot of big companies, several in this room in a way, are going through the transformation of moving from a traditional software license sale transaction with the customer to a subscription, monthly transaction. That changes marketing. That changes sales. That changes customer support. That changes R&D. Everything changes. >> Everything, yeah. >> How do you coordinate that? What is the data that you need in order to calculate a new KPI for how I judge how well I'm doing in my company? Annual recurring revenue, or something. These all get into data governance. You get into all these different aspects, and that's where our team's tool and approach is actually able to credibly go in and lay out this roadmap for folks, and it's shocking, kind of, how it's making complex problems manageable. Not necessarily simple. Actually it was Bill Schmarzo who told me this 15 years ago: our problem is not to make simple problems mundane; our problem, or what we're trying to do, is to make complex problems manageable. I love that. >> Sounds like something-- >> I love that. >> Bill would say. >> That's an important point though about not saying "we're going to make it simple-- >> No. >> we're going to make it manageable." >> David: Exactly. >> Because that's much more realistic. >> David: Right. >> Don't you think? >> David: Exactly, exactly. The fact-- >> I dunno, if we can make them simple, that's good too. >> That would be nice. >> Oh, we'd love that. >> Yeah. >> Oh yeah. >> When it happens, it's beautiful. >> That's art. >> Right, right. >> Well, your passion and your excitement for what you guys have just announced is palpable. So, obviously just coming off that announcement, what's next? We look out the rest of the calendar year, what's next for Informatica and transforming digital businesses? >> I think you could say the first 20 years, almost, of Informatica's existence was building that meta-data center of gravity, and allowing people to put stuff in, I guess you could say. So going forward, the future is getting value out. It's continually finding new ways to use it, in the same way, for instance, Apple is trying to improve Siri, right? And each release they come out with more capabilities. Obviously Google and Amazon seem to be working a little better, but nevertheless, it's all about continuous improvement.
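The prioritization method David sketches above reduces to a small scoring exercise: rate each capability's strategic relevance and its opportunity to improve, multiply, and rank. The capabilities and the 1-to-5 scales below are invented purely to show the arithmetic; his team's actual tool presumably weighs many more dimensions.

```python
# Invented capabilities with 1-5 ratings for the two factors David names.
capabilities = [
    {"name": "subscription_billing", "relevance": 5, "opportunity": 4},
    {"name": "customer_support",     "relevance": 3, "opportunity": 5},
    {"name": "partner_onboarding",   "relevance": 2, "opportunity": 2},
]

# Score = strategic relevance x opportunity to improve, then rank.
for cap in capabilities:
    cap["score"] = cap["relevance"] * cap["opportunity"]

for cap in sorted(capabilities, key=lambda c: c["score"], reverse=True):
    print(cap["name"], cap["score"])
# subscription_billing 20
# customer_support 15
# partner_onboarding 4
```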
Now, I think, the things that Informatica is doing are moving that power of using meta-data also toward helping our customers more directly with the business aspect of data in a digital transformation. >> Excellent. Well, David, thank you so much for joining us on the Cube. We wish you continued success; I'm sure the Cube will be back with Informatica in the next round. >> Excellent. >> Thanks for sharing your passion and your excitement for what you guys are doing. Like I said, it was very palpable, and it's always exciting to have that on the show. So, thank you for watching. I'm Lisa Martin; for my co-host Peter Burris, we thank you for watching the Cube again. And we are live on day one of the DataWorks Summit from San Jose. Stick around, we'll be right back.

Published Date : Jun 13 2017


David Hsieh, Qubole - DataWorks Summit 2017


 

>> Announcer: Live from San Jose in the heart of Silicon Valley, it's theCube. Covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hey, welcome back to theCUBE. We are live on day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with my co-host Peter Burris. Just chatting with our next guest about the Warriors win yesterday, we're also pretty excited about that. David Hsieh, the SVP of Marketing from Qubole. Hi, David. >> David: Hey, thanks for having me. >> Welcome to theCUBE, we're glad you still have a voice after no doubt cheering on the home team last night. >> It was a close call 'cause I was yelling pretty loud yesterday. >> So talk to us, you're the SVP of Marketing for Qubole, a big data platform in the cloud. You guys just had a big announcement a few weeks ago. >> David: Right. >> What are your thoughts, what's going on with Qubole? What's going on with big data? What are you seeing in the market? >> So you know, we're a cloud-native data platform, and when we talk to customers, they're really complaining about how they're just struggling with complexity and the barriers to entry, and you know, they're really crying out for help. And the good news, I suppose, is we're in an industry that has a very high pace of innovation. That's great, right. Spark has had eight versions now in two years, but that pace of innovation is, you know, making the complexity even harder. I was watching Cloudera bragging about how their new product is a combination of 24 open source projects. You know, that's tough stuff, right. So if you're a practitioner trying to get big data operationalized in your company, and trying to scale the use of data and analytics across the company-- the nature of open source is that it's designed for flexibility. Right, the source code is public, you have all these options, configuration settings, et cetera. But moving those into production and then scaling them in a reliable way is just crushing practitioners. And so data teams are suffering, and I think frankly it's bad for our industry, because, you know, Gartner's talking about a, you know, 80% failure rate of big data projects by 2018. Think about that; what industry can survive when 70 or 80% of the projects fail? >> Well, let me push on that a little bit. Because I think that the concern is not that 70 to 80% of the efforts to reach an answer in a complex big data problem are going to fail. We can probably accommodate that, but what we can't accommodate is failure in the underlying infrastructure. >> David: Absolutely. >> So the research we've done suggests something as well: that we are seeing an enormous amount of time spent on the underlying infrastructure. And there's a lot of failures there. People would say, I have a question, I want to know if there's an answer, and try to get to that answer, and not getting the answer they want, >> David: Yep. >> or getting a different answer. That kind of failure is still okay. >> David: Right. >> Because that's experience, you get more and more and more. >> David: Absolutely. >> So it's not the failure in the data science side or the application side. >> Actually I would say getting to an answer you don't like is a form of success. Like you have an idea, you try it out, that's all great. >> So what Gartner is really saying is that it's failure in the implementation of the infrastructure. >> That's exactly right. >> So it's the administrative and operational sides.
>> Correct, it's a project that didn't deliver the end result. If the end result isn't what you hoped, that's still fine. >> You couldn't even answer your question. >> Exactly, couldn't even answer the question. >> So let me test something on you, David. We've been carrying a thesis at Wikibon for a while that it looks like open source is proving that it's very good at mimicking, and not quite as good at inventing. >> David: Right. >> So by that I mean if you drop an operating system in front of Linus Torvalds, he can look at that and say, I can do that. >> David: Right. >> And do a great job of it. If you put a development tool, same kind of thing. But big data is very complex, a lot of it, an enormous number of use cases. >> David: Correct. >> And open source has done a good job at a tool level, and it looks as though the tools are being built to make other tools more valuable, >> David: Ha, right. >> As opposed to making it easy for a business to operationalize data science and the use of big data in their business. Would you agree or disagree with that? >> Yeah, I think that's sort of, like, fundamentally the philosophy of open source. You know, I'm going to do my work, something I need for me, but I'm going to share it with everybody else. And they can contribute. But at the end of the day, you know, unlike commercial software, there's sort of no one throat to choke. Right, and there's nobody who is going to guarantee the interoperability and the success of the piece of software that you're trying to deploy. >> There's not even a real coherent vision, in many respects. >> David: No, absolutely not. >> Of what the final product's going to end up looking like. >> So what you have is a lot of really great cutting edge technology that a lot of really smart people sort of poured their hearts and souls into. But that's a little different than trying to get to an end result. And, you know, like it or not, commercial software packages are designed to deliver the result you pay for. Open source being sort of philosophically very different, I think, breeds, you know, inherent complexity. And that complexity right now is, I think, the root of the problem in our industry. >> So give us an example, David; you know, you're a marketing guy, I'm a marketing gal. >> Sure. >> Give us an example of a customer, maybe one of your favorite examples. Where are you helping them? They're struggling here, they've made significant investments from an infrastructure perspective. They know there's value in the data, >> David: Yup. >> varying degrees, as we've talked about before. How does Qubole get in there and start helping this kind of customer start to optimize, and really start making this big data project successful? >> That's a great question. So there's really two things. Number one is that we are a SaaS-based platform in the cloud, and what we do basically is make big data into more of a turnkey service. So actually the other day, I was sort of surfing the internet, and we have a customer from Sonic Drive-In. You know, they do hamburgers and stuff. >> Lisa: Oh yeah. >> And they're doing a bunch of big data, and this guy was at a data science meetup, talking about it. We didn't put him up to this, he just volunteered. He was talking about how we made his life so much easier. Why? Because all of the configuration stuff, the settings, and you know, how to manage costs, was basically filling out a form and setting policy and parameters. And not having to write scripts and figure out all these configuration settings.
If I set this one this way and that one that way, what happens? You know, we have a sort of more curated environment that makes that easy. But the thing that I'm really excited about is we think this is the time to really look at having data platforms that can, you know, run autonomously. Today companies have to hire really expensive, really highly skilled, super smart data engineers and data ops people to run their infrastructure. And you know, if you look at studies, we're about 180,000 people short of the number of data engineers and data ops people this industry needs. So trying to scale by adding more smart people is super hard. But instead, if you could start to get machines to do what people are doing, just faster, cheaper, more reliably, then you can scale your data platform. So we basically made an announcement a couple weeks ago about kind of the industry's first autonomous data platform. And what we're building are software agents that can take over certain types of data management tasks so that data engineers don't have to do them. Or don't have to be up at three in the morning making sure everything is going right. >> And from a market segmentation perspective, where's your sweet spot for that? Enterprise, SMB, somewhere in the middle? >> The bigger you have to scale. It's not about company size, it's really about sort of the scope and scale of your big data efforts. So you know, the more people you have using it, the more data you have, the more you want automation to make things easier. It's sort of true of any industry, and it's certainly going to be true of the big data industry. >> Peter: Yeah, more complexity in the question set, >> Correct. >> The more complexity-- >> Or the more users you have, the more use it gets, the more data sources it adds. >> Which presumably is going to be correlated. >> Absolutely correct. >> Which means we can use a big data project to ascertain that. >> Well, in fact that's sort of what we're doing. Because we're a SaaS platform, we take in the metadata from what our customers are doing. What users, what clusters, what queries, which tables, all that stuff. We basically use machine learning and artificial intelligence to analyze how you're using your data platform, and tell you what you could do better, or automate stuff so that you don't have to do it anymore. >> So we've presumed that the industry at some point in time, the big data industry at some point in time, is going to start moving its attention to things like machine learning and A.I., you know, up into applications. >> David: Yep. >> Are we going to see the big data industry basically move pretty rapidly into more of a service or application conversation, or are we going to see a rebirth, as folks try to bring a more coherent approach to the many existing tools that are here right now? >> David: Right. >> What do you think? >> Well, I think we're going to see some degree of industry consolidation, and you're going to see vendors, you know, and you're seeing it today, try to simplify and consolidate. Right, so some of that is moving the stack towards applications, some of that is about repackaging their offerings and adding simplicity. It's about using artificial intelligence to make the operational platform itself easier. I think you'll see a variety of those things, because you know, companies have too many places where they can stumble in their deployment.
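The "software agent" idea David describes can be pictured as a policy loop: observe cluster metrics, compare them against a declared policy, act. The sketch below is a deliberately tiny caricature; the policy fields, metric names, and doubling/halving rule are invented and bear no relation to Qubole's actual agents.

```python
# Invented policy: bounds plus a latency target the agent steers toward.
policy = {"max_nodes": 20, "min_nodes": 2, "target_queue_seconds": 120}

def scaling_action(current_nodes, avg_queue_seconds):
    """Decide the next cluster size from one observed metric."""
    if avg_queue_seconds > policy["target_queue_seconds"]:
        return min(current_nodes * 2, policy["max_nodes"])   # scale out
    if avg_queue_seconds < policy["target_queue_seconds"] / 4:
        return max(current_nodes // 2, policy["min_nodes"])  # scale in
    return current_nodes

print(scaling_action(current_nodes=4, avg_queue_seconds=300))  # -> 8
```

The engineer's judgment moves into the policy, which is exactly the "fill out a form and set parameters" experience the Sonic customer described.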
And you know, it's going to be, you know, the vendor community has to step in and simplify those things to basically gain greater adoption. >> So if you think about it, what is-- I mean, I have my own idea, but what do you think is the metric that businesses should be using as they conceive of how to source different tools and invest in different tools, put things together? I think, increasingly, we're going to talk about time to value. What do you think? >> I think time to value is one. I think another one you could look at is the number of people who have access to the data to create insights. Right, so you know, you can say 100% of my company has access to the data and analytics that they need to help their function run better. Whatever it is, that's a pretty awesome accomplishment. And you know, there's a bunch of people who may or may not have 100% but they're pretty close, right. And they've really become a data driven enterprise. And then you have lots of companies that are sort of stuck with, okay, we have this use case running, thank goodness. Took us two years and a couple million bucks, and now they're trying to figure out how to get to the next step. And so they have five users who are able to use their data platform successfully. That's, you know, I think that's a big measure of success. >> So I want to talk quickly, if I may, about the cloud. >> David: Yeah. >> Because it's pretty clear that there are some very, very large shops >> David: Yep. >> that are starting to conceive of important parts of their overall approach to data >> David: Right. >> and putting things into the cloud. There's a lot of advantages of doing it that way. At the same time they're also thinking about how I'm going to integrate the models that I generate out of big data back into applications that might be running in a lot of different places. >> Right. >> That suggests there's going to be a new challenge on the horizon, of how do we think about, end to end, bringing applications together with predictable data movement and control and other types of activities. >> David: Yeah. >> Do you agree that's on the horizon, of how we think about end to end performance across multiple different clouds? >> I think that's coming, you know. I think I'm still surprised at how many people have not figured out that the economic and agility advantages of cloud are so great that you'd be honestly foolish not to, you know, consider cloud and have that proactive way to migrate there. And so there is just, you know, a shocking number of companies that are still plodding away, you know, building their own on-prem infrastructures, et cetera. And they still have hesitancy and questions about the cloud. I do think that you're right, but I think what you're talking about is, you know, three to five years out for the mainstream in the industry. Certainly there are early adopters, you know, who have sort of gotten there. They're talking about that now. But as sort of a mainstream phenomenon, I think that's a couple years out. >> Excuse me, Peter; one of the things that just kind of made me think was, you know, these companies, as you're saying, that still have hesitancy regarding cloud. >> Right. >> And kind of vendor lock-in popped into my head. And that kind of brought me back to one of the things that you were mentioning in the beginning: open source, complexity there. >> David: Yep.
>> Are you seeing, or are you helping, companies go back to more of that commercialized proprietary software? Are you seeing a shift in enterprises being less concerned about lock-in because they want simplicity? >> You know, that's a great question. I think in the big data space it's hard to avoid, you know, sort of going down the open source path. I think what people are getting concerned about is getting locked into a single cloud vendor. So more and more of the conversations we have are about, what are your multi-cloud and eventually cross-cloud capabilities? >> Peter: That's the question I just asked, right. >> Exactly, so I think more and more of that's coming to the front. I was with a large, very large healthcare company a week ago, and I said, what's your cloud strategy? And they said, we have a no-vendor-left-behind policy. So, you know, we're standardized on Azure, we've got a bunch of pilots on AWS, and we're planning to move from a data warehousing vendor to Oracle in the cloud. Ha! So, I think for large companies, a lot of them can't control the fact that different divisions, departments, whatever, will use different clouds. So architecturally, they're going to have to start to think about using these multi-cloud, cross-cloud, you know, scenarios. And you know, most large companies, given a choice, will not bet the farm on a single cloud provider. And you know, we're great partners and we love Amazon, but every time they have, you know, an S3 outage like they had a few months ago, you know, it really makes people think carefully about what their infrastructure is and how they're dealing with reliability. >> Well, in fairness they don't have that many. >> They don't, it only takes one. >> That's right, that's right, and there are reasons to suspect that there will be increased specialization of services in the cloud. >> David: Correct. >> So I mean it's going to get more complex as we go, as well. >> David: Oh, absolutely correct. >> Not less. >> Well, David Hsieh, SVP of Marketing at Qubole, thank you so much for joining, >> Thank you. >> and sharing your insights with Peter and myself. It's been very insightful. >> Right. >> So this is another great example of how we've been talking about the Warriors and food; Sonic was brought into play here. >> David: Exactly, go Sonic. >> Very exciting, you never know what's going to happen on theCUBE. So for David and Peter, I am Lisa Martin. You're watching Day One of the DataWorks Summit, in the heart of Silicon Valley. But stick around, because we've got more great content coming your way.

Published Date : Jun 13 2017


Bill Schmarzo, Dell EMC | DataWorks Summit 2017


 

>> Voiceover: Live from San Jose in the heart of Silicon Valley, it's The Cube covering DataWorks Summit 2017. Brought to you by: Hortonworks. >> Hey, welcome back to The Cube. We are live on day one of the DataWorks Summit in the heart of Silicon Valley. I'm Lisa Martin with my co-host Peter Burris. Not only is this day one of the DataWorks Summit, this is the day after the Golden State Warriors won the NBA Championship. Please welcome our next guest, the CTO of Dell EMC, Bill Schmarzo. And Cube alumni, clearly sporting the pride. >> Did they win? I don't even remember. I just was-- >> Are we breaking news? (laughter) Bill, it's great to have you back on The Cube. >> The Division III All-American from-- >> Coe College. >> 1947? >> Oh, yeah, yeah, about then. They still had the peach baskets. You make a basket, you have to climb up this ladder and pull it out. >> They're going rogue on me. >> It really slowed the game down a lot. (laughter) >> All right, so-- And before we started they were analyzing the game, it was actually really interesting. But, to kick things off, Bill: as the volume and the variety and the velocity of data are changing, organizations know there's a tremendous amount of transformational value in this data. How is Dell EMC helping enterprises extract and maximize it as the economic value of data's changing? >> So, the thing that we find is most relevant is most of our customers don't give a hoot about the three V's of big data. Especially on the business side. We like to jokingly say they care about the four M's of big data: make me more money. So, when you think about digital transformation and how it might take an organization from where they are today to sort of embed digital capabilities around data and analytics, it's really about, "How do I make more money?" What processes can I eliminate or reduce? How do I improve my ability to market and reach customers? How do I, ya know-- All the things that are designed to drive value from a value perspective. Let's go back to, ya know, Tom Peters kind of thinking, right? I guess Michael Porter, right? His value creation processes. So, we find that when we have a conversation around the business and what the business is trying to accomplish, that provides the framework around which to have this digital transformation conversation. >> So, well, Bill, it's interesting. The volume, velocity, variety; the three V's really say something about the value of the infrastructure. So, you have to have infrastructure in place where you can get more volume, it can move faster, and you can handle more variety. But, fundamentally, it is still a statement about the underlying value of the infrastructure and the tooling associated with the data. >> True, but one of the things that changes is not all data is of equal value. >> Peter: Absolutely. >> Right? So, what data, what technologies-- Do I need to have Spark? Well, I don't know, what are you trying to do, right? Do I need to have Kafka or Ioda, right? Do I need to have these things? Well, if I don't know what I'm trying to do, then I don't have a way to value the data and I don't have a way to figure out and prioritize my investment in infrastructure. >> But, that's what I want to come to. So, increasingly, what business executives, at least the ones we're talking to all the time, are saying is: make me more money. >> Right. >> But, it really is, what is the value of my data? And, how do I start pricing data and how do I start thinking about investing so that today's data can be valuable tomorrow?
Or the data that's not going to be valuable tomorrow, I can find some other way to not spend money on it, etc. >> Right. >> That's different from the variety, velocity, volume statement, which is all about the infrastructure-- >> Amen. >> --and what an IT guy might be worried about. So, I've done a lot of work on data value, you've done a lot of work on data value. We've coincided a couple times. Let's pick that notion up of, ya know, digital transformation is all about what you do with your data. So, what are you seeing in your clients as they start thinking this through? >> Well, I think one of the first times it was sort of an "aha" moment to me was when I had a conversation with you about Adam Smith. The difference between value in exchange versus value in use. A lot of people, when they think about monetization, how do I monetize my data, are thinking about value in exchange. What is my data worth to somebody else? Well, most people's data isn't worth anything to anybody else. And the way that you can really drive value is not data in exchange or value in exchange, but it's value in use. How am I using that data to make better decisions regarding customer acquisition and customer retention and predictive maintenance and quality of care and all the other oodles of decisions organizations are making? The valuation of that data comes from putting it into use to make better decisions. If I know then what decision I'm trying to make, now I have a process not only for deciding what data's most valuable but also, as you said earlier, what data is not important but may have liability issues with it, right? Do I keep a data set around that might be valuable but, if it falls into the wrong hands through cyber security sort of things, do I actually open myself up to all kinds of liabilities? And so, organizations are rushing to this EVD conversation, not only from a data valuation perspective but also from a risk perspective. 'Cause you've got to balance those two aspects. >> But, this is not a pure-- This is not really doing accounting in a traditional accounting sense. We're not doing double-entry bookkeeping with data. What we're really talking about is understanding how your business uses its data. Number one, today, understand how you think you want your business to be able to use data to become a more digital corporation, and understand how you go from point "a" to point "b". >> Correct, yes. And, in fact, the underlying premise behind driving economic value of data-- you know, people say data is the new oil. Well, that's a BS statement because it really misses the point. The point is, imagine if you had a barrel of oil; a single barrel of oil that can be used across an infinite number of vehicles and it never depleted. That's what data is, right? >> Explain that. You're right, but explain it. >> So, what it means is that data-- You can use data across an endless number of use cases. If you go out and get-- >> Peter: At the same time. >> At the same time. You pay for it once, you put it in the data lake once, and then I can use it for customer acquisition and retention and upsell and cross-sell and fraud and all these other use cases, right? So, it never wears out. It never depletes. So, I can use it. And what organizations struggle with, if you look at data from an accounting perspective, is that accounting tends to value assets based on what you paid for them. >> Peter: And how you can apply them uniquely to a particular activity.
A machine can be applied to this activity, and it's either that activity or that activity. A building can be applied to that activity or that activity. A person's time to that activity or that activity. >> It has a transactional limitation. >> Peter: Exactly, it's an "or". >> Yeah, so what happens now is, instead of looking at it from an accounting perspective, let's look at it from an economics and a data science perspective. That is, what can I do with the data? What can I do as far as using the data to predict what's likely to happen? To prescribe actions and to uncover new monetization opportunities. So, on the entire approach of looking at it from an accounting perspective-- we just completed that research at the University of San Francisco, where we looked at, how do you determine the economic value of data? And we realized that using an accounting approach grossly undervalued the data's worth. So, instead of using accounting, we started with an economics perspective. The multiplier effect, marginal propensity to consume, all that kind of stuff that we all forgot about once we got out of college really applies here, because now I can use that same data over and over again. And if I apply data science to it to really try to predict, prescribe, and monetize, all of a sudden the economic value of your data just explodes. >> Precisely because you're connecting a source of data, which has a particular utilization, to another source of data that has a particular utilization, and you can combine them, create new utilizations that might in and of themselves be even more valuable than either of the original cases. >> They genetically mutate. >> That's exactly right. So, think about-- I think it's right. So, congratulations, we agree. Thank you very much. >> Which is rare. >> So, now let's talk about this notion of, as we move forward with data value, how does an organization have to start translating some of these new ways of thinking about the value of data into investments in data, so that you have the data where you want it, when you want it, and in the form that you need it? >> That's the heart of why you do this, right? If I know what the value of my data is, then I can make decisions regarding what data am I going to try to protect, enhance? What data am I going to get rid of and put on cold storage, for example? And so we came up with a methodology for how we tie the value of data back to use cases. Everything we do is use case based, so if you're trying to increase same-store sales at a Chipotle, one of my favorite places; if you're trying to increase it by 7.1 percent, that's worth about 191 million dollars. And the use cases that support that, like increasing local event marketing, or increasing new product introduction effectiveness, increasing customer cross-sell or upsell. If you start breaking those use cases down, you can start tying financial value to those use cases. And if I know what data sets, what three, five, seven data sets are required to help solve that problem, I now have a basis against which I can start attaching value to data. And as I look across a number of use cases, now the value of the data starts to increment. It grows; not exponentially, but it does increment, right? And it gets more and more-- >> It's non-linear, it's super linear. >> Yeah, and what's also interesting-- >> Increasing returns. >> From an ROI perspective, what you're going to find is that as you go down these use cases, the financial value of a use case may not be really high.
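Bill's use-case-based method lends itself to a very small calculation: give each use case a financial value, list the data sets it needs, and let each data set accumulate a share across use cases. The figures and the naive equal-split rule below are invented; his published methodology presumably attributes value with more care.

```python
# Invented use cases: value (in millions) and supporting data sets.
use_cases = {
    "local_event_marketing": {"value": 60.0, "datasets": ["pos", "weather", "social"]},
    "new_product_intro":     {"value": 80.0, "datasets": ["pos", "social"]},
    "cross_sell":            {"value": 51.0, "datasets": ["pos", "loyalty"]},
}

def dataset_values(cases):
    """Attribute each use case's value equally across its data sets."""
    values = {}
    for case in cases.values():
        share = case["value"] / len(case["datasets"])
        for ds in case["datasets"]:
            values[ds] = values.get(ds, 0.0) + share
    return values

print(dataset_values(use_cases))
# -> {'pos': 85.5, 'weather': 20.0, 'social': 60.0, 'loyalty': 25.5}
```

Notice how the shared data set ("pos") accumulates value with every new use case, which is exactly the incrementing effect Bill describes.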
But, when the denominator of your ROI calculation starts approaching zero, because I'm reusing data at zero cost-- I can reuse data at zero cost. When the denominator starts going to zero, ya know what happens to your ROI? It goes to infinity, it explodes. >> Last question, Bill. You mentioned the University of San Francisco, and you've been there a while teaching business students how to embrace analytics. One of the things that was talked about this morning in the keynote was Hortonworks' dedication to the open-source community from the beginning. And they kind of talked about there, with kids in college these days, they have access to this open-source software that's free. I'd just love to get, kind of as the last word, your take on what you're seeing in university life today, where these business students are understanding more about analytics. Do you see them as kind of helping to build the next generation of data scientists, since that's really kind of the next leg of the digital transformation? >> So, the premise we have in our class is we probably can't turn business people into data scientists. In fact, we don't think that's valuable. What we want to do is teach them how to think like a data scientist. What happens, if we can get the business stakeholders to understand what's possible with data and analytics and then you couple them with a data scientist that knows how to do it, is we see exponential impact. We just did a client project around customer attrition. The industry benchmark in customer attrition-- it was published, I won't name the company, but they had a 24 percent identification rate. We had a 59 percent. We 2x'd the number. Not because our data scientists are smarter or our tools are smarter, but because our approach was to leverage and teach the business people how to think like a data scientist, and they were able to identify variables and metrics they wanted to test. And when our data scientists tested them, they said, "Oh my gosh, that's a very highly predictive variable." >> And trust what they said. >> And trust what they said, right. So, how do you build trust? On the data science side, you fail. You test, you fail, you test, you fail; you're never going to get 100 percent accuracy. But have you failed enough times that you feel comfortable and confident that the model is good enough? >> Well, what a great spirit of innovation that you're helping to bring there. Your keynote, we should mention, is tomorrow. >> That's right. >> So, if you're watching the livestream or you're in person, you can see Bill's keynote. Bill Schmarzo, CTO of Dell EMC, thank you for joining Peter and me. Great to have you on the show. A show where you can talk about the Warriors and Chipotle in one show. I've never seen it done, this is groundbreaking. Fantastic. >> Psycho donuts too.
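Bill's denominator argument above can be written out as a back-of-the-envelope formula. This is a hedged formalization, not a quote from his research: treat acquisition and curation as a one-time cost C, let use case i return value V_i, and let each reuse carry marginal cost epsilon.

```latex
\[
\mathrm{ROI}(n) \;=\; \frac{\sum_{i=1}^{n} V_i \;-\; C}{C},
\qquad
\mathrm{ROI}_{\mathrm{marginal}}
\;=\; \frac{V_{n+1} - \varepsilon}{\varepsilon}
\;\longrightarrow\; \infty
\quad \text{as } \varepsilon \to 0 .
\]
```

Nothing here is special to big data; it is just the economics of a non-depleting, shareable asset, the same contrast Bill draws with the barrel of oil.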

Published Date : Jun 13 2017


Joe Goldberg, BMC Software - DataWorks Summit 2017


 

>> Announcer: Live from San Jose in the heart of Silicon Valley, it's The Cube covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Hi. Welcome back to The Cube. We are live at day one of the DataWorks Summit in San Jose, in the heart of Silicon Valley, hosted by Hortonworks. We've had a great day so far. Lots of innovation. Lots of great announcements. We're very excited to be joined by one of this week's keynote speakers and a Cube alumni, Joe Goldberg, Innovation Evangelist at BMC Software. Welcome back to The Cube. >> Thank you very much. Always a pleasure to be here. >> Exactly, and we're happy to have you back. So, talk to us, what's happening with BMC? What are you guys doing there? What are people going to learn in your keynote on Thursday? >> So BMC has been really working with all of our customers to modernize, not only our tool chain, but the way automation is used and deployed throughout the organization. We actually did a survey recently, The State of Automation. We got pretty much the kind of results we would've expected, but this let us really sort of make tangible what we have sort of always felt, you know, about how critical automation is in the enterprise. We had a response from leaders and CXOs that 93% thought that automation was key to helping them make that digital transformation that everyone is involved in today. So, that's been one of the key elements that has really kind of driven everything that we've been doing at BMC today. >> Now, BMC's known especially for handling workflows that operate as more than just batch work. >> Joe Goldberg: Yes. >> So, high certainty, very much predictability in terms of when things are going to happen, how long it's going to take, what actions are going to take place. Very, very complex types of processing take place. I'm always fascinated, and I've talked to other customers that are wondering about this: when you come back to the State of Automation, everybody wants to move to interactive. >> Joe Goldberg: Yes. >> But often the jump to interactive takes place well in advance of predictability of how the data's actually being constructed and put together and aggregated in the back end. Talk a little bit about the priorities. How does one...? 'Cause it's really not a chicken and egg kind of a problem. How does one anticipate excellence in the other? >> So, what we've been hearing, and actually I think at the previous Hortonworks, now DataWorks, Summit, we had one of our customers talk about their approach to what was a fundamental data architecture for them, which was the separation between the speed and batch layers. And I think you hear an awful lot of that kind of conversation. And they run in parallel, and from our perspective managing the batch layer really underpins the kind of real actionable insights that you can extract from the speed layer, which is focusing on capturing that very small percentage of what is really the signal in the data, but then being able to take that and enrich it with what you've been collecting and managing using the batch layer.
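For readers who have not met this speed/batch (lambda-style) split, the core move Joe describes is small: a view computed offline in the batch layer enriches each event as it arrives on the speed layer. The device table and field names below are invented for the example.

```python
# Invented batch view: context computed offline, keyed by device.
batch_view = {"device-42": {"site": "plant-7", "install_year": 2014}}

def enrich(event):
    """Join a realtime event with context from the batch layer."""
    context = batch_view.get(event["device_id"], {})
    return {**event, **context}

print(enrich({"device_id": "device-42", "temp_c": 88.5}))
# -> {'device_id': 'device-42', 'temp_c': 88.5, 'site': 'plant-7', 'install_year': 2014}
```

The signal lives in the stream; the meaning, and the predictive power, comes from the history the batch layer curates, which is Joe's point about batch underpinning the speed layer.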
The only way to be able to do that is by making sure that you are basing yourself on history that is, well, sort of collected, curated; making sure that you have actually captured it, that you've enriched it from a variety of different sources. And that's where we come in. What we have been focusing on is providing a set of facilities for managing batch that is... I talk about hyper heterogeneity; I know that's a mouthful, but that's really what the new enterprise environment is like. So you add, you know, a layer on top of your conventional applications and your conventional data, all of these new data formats and data sources now arriving in real time in high volume. I think that taking that kind of an approach is really the only way that you can ensure that you are capturing all of your... Ingesting all of the data that's coming in from all of your endpoints, including, you know, IoT applications, and really being able to combine it with all of the corporate sort of knowledge that you've accumulated through your traditional sources. >> So, batch has historically meant, again, a lot of precise code that had to be written to handle complex jobs, and it scared a lot of folks off into thinking about interactive. In the last 10 years, there have been some pretty significant advances in how we think about putting together batch workflows; they've become much more programmable. How does Control-M and some of the other tool set that BMC provides fit into this? How does it look more like the types of application development tasks and methods that are becoming increasingly popular, as you think about delivering the outcomes of big data processing to other applications or to other segments? >> So, you know, that's a great question. It's almost like, thanks for the setup. So, you can see. >> Well, let's not ask it then. (laughs) >> You can see the shirt that I'm wearing, and of course this is very intentional, but our history has been that we've come from the data center, operations focus. And the transition in the marketplace today has been that really the focus has shifted, whether you talk about shift left or everything as code, where the new methods of building and delivering applications really look at everything manual that is done, coding to create an application, as being done upfront. And then the rigor for enterprise operations is built in through this automated delivery pipeline. And so, obviously you have to invert this kind of approach that we've had in terms of layering management tools on at the very end, and instead you have to be able to inject them into your application early. So, we feel that it's certainly true for all applications, and it's I think doubly true in data applications, that the automation and the operational instrumentation is an equal partner to your business logic and the code that you write, and so it needs to be created right upfront and then moved together with all of the rest of your application components through that delivery pipeline in a CI/CD fashion. And so that is what we have done. And that is what the concept of Jobs-as-Code is. >> So, as you think about what the next step is, is batch going to-- presumably batch will be sustained as a mode of operation. How is it going to become even more comfortable to a lot of the development methodologies as we move forward? How do you think it's going to evolve as a tool for increasing the amount of predictability in that back end?
>> So, I think that the key to continuing to evolve this Jobs-as-Code approach is to enable developers to be able to build and work with that operational plumbing in the same way they work with their business logic. >> Or any other resource? >> Exactly. So, you know, you think about what are the tools that developers have today when they build; whether you're writing in Java or C or R or Scala, there are development environments, there are these tools that let you test, that let you step through your logic to be able to identify and find any flaws, you know, sort of bugs in your code. And in order for Jobs-as-Code to really meet the test of being code, we are working on providing the same kind of capabilities to work with our objects that developers expect to have for programming languages. >> So Joe, I'm going to shift us for one last question here. Kind of looking at it at more of a business and industry level: to do big data right, to bring Hadoop to an enterprise successfully, what are some of the mission critical elements that the C-suite really needs to embrace in order to be successful across big industries, like healthcare, financial services, Telco? >> So, I think they have to be able to apply the same requirements and the same tests for how a big data application moves into their enterprise in terms of, not only how it's operated, but how it's made accessible to all of the constituents that need to use it. One of the key elements we hear frequently is that, and I think it's a danger, that when technicians solely create what is the end deliverable tool, it frequently is very technical, and it has to be consumable by the people that actually need to use it. And so you have to strike this balance between providing sufficient technical sophistication and business usability, and I think that that's kind of a goal for being successful in implementing any kind of technology, and certainly big data.
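To make the Jobs-as-Code idea concrete: the job definition lives in version control and gets validated like any other code artifact on its way through the pipeline. The schema and the check below are a made-up illustration, not Control-M's actual job format or API.

```python
# Hypothetical job definition, versioned alongside the application code.
ingest_job = {
    "name": "ingest_clickstream",
    "run_as": "etl_svc",
    "schedule": "0 2 * * *",          # daily at 02:00
    "command": "spark-submit ingest.py",
    "on_failure": ["notify:data-ops", "retry:2"],
}

def validate(job):
    """CI-stage check: fail the build if operational fields are missing."""
    required = {"name", "run_as", "schedule", "command", "on_failure"}
    missing = required - job.keys()
    if missing:
        raise ValueError(f"job {job.get('name')!r} missing {sorted(missing)}")
    return True

assert validate(ingest_job)
```

The validate step is the "rigor built in" Joe mentions: operational requirements fail the build early instead of surfacing in production.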

Published Date : Jun 13 2017


Linton Ward, IBM & Asad Mahmood, IBM - DataWorks Summit 2017



>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE! Covering Data Works Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE. I'm Lisa Martin with my co-host George Gilbert. We are live on day one of the Data Works Summit in San Jose in the heart of Silicon Valley. Great buzz in the event, I'm sure you can see and hear behind us. We're very excited to be joined by a couple of fellows from IBM. A very longstanding Hortonworks partner that announced a phenomenal suite of four new levels of that partnership today. Please welcome Asad Mahmood, Analytics Cloud Solutions Specialist at IBM, and medical doctor, and Linton Ward, Distinguished Engineer, Power Systems OpenPOWER Solutions from IBM. Welcome guys, great to have you both on the queue for the first time. So, Linton, software has been changing, companies, enterprises all around are really looking for more open solutions, really moving away from proprietary. Talk to us about the OpenPOWER Foundation before we get into the announcements today, what was the genesis of that? >> Okay sure, we recognized the need for innovation beyond a single chip, to build out an ecosystem, an innovation collaboration with our system partners. So, ranging from Google to Mellanox for networking, to Hortonworks for software, we believe that system-level optimization and innovation is what's going to bring the price performance advantage in the future. That traditional seamless scaling doesn't really bring us there by itself but that partnership does. >> So, from today's announcements, a number of announcements that Hortonworks is adopting IBM's data science platforms, so really the theme this morning of the keynote was data science, right, it's the next leg in really transforming an enterprise to be very much data driven and digitalized. We also saw the announcement about Atlas for data governance, what does that mean from your perspective on the engineering side? >> Very exciting you know, in terms of building out solutions of hardware and software the ability to really harden the Hortonworks data platform with servers, and storage and networking I think is going to bring simplification to on-premises, like people are seeing with the Cloud, I think the ability to create the analyst workbench, or the cognitive workbench, using the data science experience to create a pipeline of data flow and analytic flow, I think it's going to be very strong for innovation. Around that, most notable for me is the fact that they're all built on open technologies leveraging communities that universities can pick up, contribute to, I think we're going to see the pace of innovation really pick up. >> And on that front, on pace of innovation, you talked about universities, one of the things I thought was really a great highlight in the customer panel this morning that Raj Verma hosted was you had health care, insurance companies, financial services, there was Duke Energy there, and they all talked about one of the great benefits of open source is that kids in universities have access to the software for free. So from a talent attraction perspective, they're really kind of fostering that next generation who will be able to take this to the next level, which I think is a really important point as we look at data science being kind of the next big driver or transformer and also going, you know, there's not a lot of really skilled data scientists, how can that change over time? 
And this is one, the open source community that Hortonworks has been very dedicated to since the beginning, it's really a great outcome of that. >> Definitely, I think the ability to take the risk out of a new analytical project is one benefit, and the other benefit is there's a tremendous, not just from young people, a tremendous amount of interest among programmers, developers of all types, to create data science skills, data engineering and data science skills. >> If we leave aside the skills for a moment and focus on the, sort of, the operationalization of the models once they're built, how should we think about a trained model, or, I should break it into two pieces. How should we think about training the models, where the data comes from and who does it? And then, the orchestration and deployment of them, Cloud, Edge Gateway, Edge device, that sort of thing. >> I think it all comes down to exactly what your use case is. You have to identify what use case you're trying to tackle, whether that's applicable to clinical medicine, whether that's applicable to finance, to banking, to retail or transportation, first you have to have that use case in mind, then you can go about training that model, developing that model, and for that you need to have a good, potent, robust data set to allow you to carry out that analysis, and whether you want to do exploratory analysis or you want to do predictive analysis, that needs to be very well defined in your training stage. Once you have that model developed, then we have certain services, such as Watson Machine Learning, within Data Science Experience, that will allow you to take that model that you just developed, just moments ago, and just deploy that as a RESTful API that you can then embed into an application and into your solution, and that solution you can basically use across industries. >> Are there some use cases where you have almost like a tiering of models where, you know, there're some that are right at the edge like, you know, a big device like a car and then, you know, there's sort of the fog level which is the, say, cell towers or other buildings nearby and then there's something in the Cloud that's sort of like, a master model or an ensemble of models, I don't assume that's like, Evel Knievel would say you know, "Don't try that at home," but sort of, is the tooling being built to enable that? >> So the tooling is already in existence right now. You can actually go ahead right now and be able to build out prototypes, even full-level, full-range applications right on the Cloud, and you can do that thanks to Data Science Experience, you can do that thanks to IBM Bluemix, you can go ahead and do that type of analysis right there, and not only that, you can allow that analysis to actually guide you along the path from building a model to building a full-range application, and this is all happening on the Cloud level. We can talk more about it happening on an on-premise level, but on the Cloud level specifically, you can have those applications built on the fly, on the Cloud, and have them deployed for web apps, for mobile apps, et cetera.
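As a hedged illustration of the deployment pattern Asad describes, scoring a model that has been published as a RESTful endpoint usually reduces to an authenticated HTTP call. The URL, token, and payload shape below are placeholder assumptions, not the actual Watson Machine Learning API.

```python
# Minimal sketch of embedding a deployed model into an application by
# calling its REST scoring endpoint. URL, token, and payload schema are
# hypothetical stand-ins for whatever the hosting service actually expects.
import requests

SCORING_URL = "https://example-ml-host/v1/deployments/triage-model/score"  # hypothetical
API_TOKEN = "replace-with-real-token"                                      # hypothetical

payload = {
    "fields": ["age", "bmi", "systolic_bp"],
    "values": [[54, 31.2, 142]],
}

resp = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())  # e.g. {"predictions": [...]}; shape depends on the service
```

Because the model sits behind a plain REST interface, the embedding application needs no machine learning libraries at all, which is what makes it practical to drop a prediction into a web or mobile app.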
>> One of the things that you talked about is use cases in certain verticals, IBM has been very strong and vertically focused for a very long time, but you kind of almost answered the question that I'd like to maybe explore a little bit more about building these models, training the models, in, say, health care or telco and being able to deploy them, where are the horizontal benefits there that IBM would be able to deliver faster to other industries? >> Definitely, I think the main thing is that IBM, first of all, gives you that opportunity, that platform to say that, hey, you have a data set, you have a use case, let's give you the tooling, let's give you the methodology to take you from data, to a model, to ultimately that full range application, and specifically, I've built some applications specific to federal health care, specifically to address clinical medicine and behavioral medicine, and that's allowed me to actually use IBM tools and some open source technologies as well to actually go out and build these applications on the fly as a prototype to show, not only the realm, the art of the possible when it comes to these technologies, but also to solve problems, because ultimately, that's what we're trying to accomplish here. We're trying to find real-world solutions to real-world problems. >> Linton, let me re-direct something towards you about, a lot of people are talking about how Moore's law is slowing down or even ending, well, at least in terms of speed of processors, but if you look at, not just the CPU but FPGAs or ASICs or the tensor processing unit, which, I assume, is an ASIC, and you have the high speed interconnects, if we don't look at just, you know, what can you fit on one chip, but you look at, you know, in 3D, what's the density of transistors in a rack or in a data center, is that still growing as fast or faster, and what does it mean for the types of models that we can build? >> That's a great question. One of the key things that we did with the OpenPOWER Foundation is to open up the interfaces to the chip, so with NVIDIA we have NVLink, which gives us a substantial increase in bandwidth, we have created something called OpenCAPI, which is a coherent protocol, to get to other types of accelerators, so we believe that hybrid computing in that form, you saw NVIDIA on-stage this morning, and we believe, especially for deep learning, the acceleration provided by GPUs is going to continue to drive substantial growth, it's a very exciting time. >> Would it be fair to say that we're on the same curve, if we look at it, not from the point of view of, you know, what can we fit on a little square, but if we look at what can we fit in a data center or the power available to model things, you know, Jeff Dean at Google said, "If Android users talk into their phones for two to three minutes a day, we need two to three times the data centers we have." Can we grow that price performance faster and enable sort of things that we did not expect? >> I think the innovation that you're describing will, in fact, put pressure on data centers. The ability to collect data from autonomous vehicles or other endpoints is really going up. So, we're okay for the near-term, but at some point we will have to start looking at other technologies to continue that growth.
Right now we're in the throes of what I call fast data versus slow data, so keeping the slow data cheap and getting the fast data closer to the compute is a very big deal for us, so NAND flash and other non-volatile technologies for the fast data are where the innovation is happening right now, but you're right, over time we will continue to collect more and more data and it will put pressure on the overall technologies. >> Last question as we get ready to wrap here, Asad, your background is fascinating to me. Having a medical degree and working in federal healthcare for IBM, you talked about some of the clinical work that you're doing and the models that you're helping to build. What are some of the mission critical needs that you're seeing in health care today that are really kind of driving, not just health care organizations to do big data right, but to do data science right? >> Exactly, so I think one of the biggest questions that we get and one of the biggest needs that we get from the healthcare arena is patient-centric solutions. There are a lot of solutions that are hoping to address problems that are being faced by physicians on a day-to-day level, but there are not enough applications that are addressing the concerns, the pain points, that patients are facing on a daily basis. So the applications that I've started building out at IBM are all patient-centric applications that basically put the level of their data, their symptoms, their diagnosis, in their hands alone and allow them to actually find out more or less what's going wrong with my body at any particular time during the day and then find the right healthcare professional or the right doctor that is best suited to treating that condition, treating that diagnosis. So I think that's the big thing that we've seen from the healthcare market right now. The big need that we have, that we're currently addressing with our Cloud analytics technology, which is just becoming more and more advanced and sophisticated and is trending towards some of the other health trends or technology trends that we have currently right now on the market, including the Blockchain, which is tending towards more of a de-centralized focus on these applications. So it's actually putting more of the data in the hands of the consumer, in the hands of the patient, and even in the hands of the doctor.

Published Date : Jun 13 2017


Scott Gnau, Hortonworks - DataWorks Summit 2017



>> Announcer: Live, from San Jose, in the heart of Silicon Valley, it's The Cube, covering DataWorks Summit 2017. Brought to you by Hortonworks. >> Welcome back to The Cube. We are live at DataWorks Summit 2017. I'm Lisa Martin with my cohost, George Gilbert. We've just come from this energetic, laser light show infused keynote, and we're very excited to be joined by one of today's keynote speakers, the CTO of Hortonworks, Scott Gnau. Scott, welcome back to The Cube. >> Great to be here, thanks for having me. >> Great to have you back here. One of the things that you talked about in your keynote today was collaboration. You talked about the modern data architecture, and one of the things that I thought was really interesting is that now, where Hortonworks is, you are empowering cross-functional teams, operations managers, business analysts, data scientists, really helping enterprises drive the next generation of value creation. Tell us a little bit about that. >> Right, great. Thanks for noticing, by the way. I think the next, the important thing, kind of as a natural evolution for us as a company and as a community is, and I've seen this time and again in the tech industry, we've kind of moved from really cool breakthrough tech, more into a solutions base. So I think this whole notion is really about how we're making that natural transition. And when you think about all the cool technology and all the breakthrough algorithms and all that, that's really great, but how do we then take that and turn it to value really quickly and in a repeatable fashion. So, the notion that I launched today is really making these three personas really successful. If you can focus, combining all of the technology, usability and even some services around it, to make each of those folks more successful in their job. So I've broken it down really into three categories. We know the traditional business analyst, right? They've got SQL and they've been doing predictive modeling of structured data for a very long time, and there's a lot of value generated from that. Making the business analyst successful in a Hadoop-inspired world is extremely valuable. And why is that? Well, it's because Hadoop actually now brings a lot more breadth of data and frankly a lot more depth of data than they've ever had access to before. But being able to communicate with that business analyst in a language they understand, SQL, being able to make all those tools work seamlessly, is the next extension of success for the business analyst. We spent a lot of time this morning talking about data scientists, the next great frontier where you bring together lots and lots and lots and lots of data, for instance, with math and heavy compute, with the data scientists, and really enable them to go build out that next generation of high definition kind of analytics, all right, and we're all, certainly I am, captured by the notion of self-driving cars, and you think about a self-driving car, and the success of that is purely based on the successful data science. In those cameras and those machines being able to infer images more accurately than a human being, and then make decisions about what those images mean. That's all data science, and it's all about raw processing power and lots and lots and lots of data to make those models train and be more accurate than what would otherwise happen. So enabling the data scientist to be successful, obviously, that's a use case.
You know, certainly voice activated, voice response kinds of systems, for better customer service; better fraud detection, you know, the cost of a false positive is a hundred times the cost of missing a fraudulent behavior, right? That's because you've irritated a really good customer. So being able to really train those models in high definition is extremely valuable. So bringing together the data, but the tool set, so that data scientists can actually act as a team and collaborate and spend less of their time finding the data, and more of their time providing the models. And I said this morning, last but not least, the operations manager. This is really, really, really important. And a lot of times, especially geeks like myself, are just, ah, operations guys are just a pain in the neck. Really, really, really important. We've got data that we've never thought of. Making sure that it's secured properly, making sure that we're managing within the regulations of privacy requirements, making sure that we're governing it and making sure how that data is used, alongside our corporate mission, is really important. So creating that tool set so that the operations manager can be confident in turning these massive files of data over to the business analyst and to the data scientist, and be confident that the company's mission and the regulations that they're working within in those jurisdictions are all in compliance. And so that's what we're building on, and that stack, of course, is built on open source Apache Atlas and open source Apache Ranger, and it really makes for an enterprise grade experience. >> And a couple things to follow on to that, we've heard of this notion for years, that there is a shortage of data scientists, and now, it's such a core strategic enabler of business transformation. Is this collaboration, this team support that was talked about earlier, is this helping to spread data science across these personas to enable more of them to be data scientists? >> Yeah, I think there are two aspects to it, right? One is certainly really great data scientists are hard to find; they're scarce. They're unique creatures. And so, to the extent that we're able to combine the tool set to make the data scientists that we have more productive, I think the numbers are astronomical, right? You could argue that, with the wrong tool set, a data scientist might spend 80% or 90% of his or her time just finding the data and only 10% working on the problem. If we can flip that around and make it 10% finding the data and 90% working on the problem, that's, like, an order of magnitude more breadth of data science coverage that we get from the same pool of data scientists, so I think that from an efficiency perspective, that's really huge. The second thing, though, is that by looking at these personas and the tools that we're rolling out, can we start to package up things that the data scientists are learning and move those models onto the business analyst's desktop. So, now, not only is there more breadth and depth of data, but frankly, there's more depth and breadth of models that can be run, integrated with traditional business process, which means turning that into better decision making, turning that into better value for the business just kind of happens automatically. So, you're leveraging the value of data scientists. >> Let me follow that up, Scott. So, right now the biggest time sink for the data scientist or the data engineer is data cleansing and transformation.
Where do the cloud vendors fit in, in terms of having trained some very broad horizontal models in terms of vision, natural language understanding, text to speech, so where they have accumulated a lot of data assets, and then they created models that were trained and could be customized. Do you see a role for, not just the UI-related models coming from the cloud vendors, but for other vendors who have data assets to provide more fully baked models so that you don't have to start from scratch? >> Absolutely. So, one of the things that I talked about also this morning is this notion of open, and I said it this morning, where open community, open source, and open ecosystem, I think it's now open to the third power, right, and it's talking about open models and algorithms. And I think all of those same things are really creating a tremendous opportunity, the likes of which we've not seen before, and I think it's really driving the velocity in the market, right, so, because we're collaborating in the open, things just get done faster and more efficiently, whether it be in the core open source stuff or whether it be in the open ecosystem, being able to pull tools in. Of course, the announcement earlier today, with IBM's Data Science Experience software as a framework for the data scientists to work as a team, but that thing in and of itself is also very open. You can plug in Python, you can plug in open source models and libraries, some of which were developed in the cloud and published externally. So, it's all about continued availability of open collaboration that is the hallmark of this wave of technology. >> Okay, so we have this issue of how much can we improve the productivity with better tools or with some amount of data. But then, the part that everyone also points out, besides the cloud experience, is also the ability to operationalize the models and get them into production, either in bespoke apps or packaged apps. How's that going to sort of play out over time? >> Well, I think two things you'll see. One, certainly in the near term, again, with our collaboration with IBM and the Data Science Experience. One of the key things there is not only, not just making the data scientists be able to be more collaborative, but also the ease with which they can publish their models out into the wild. And so, kind of closing that loop to action is really important. I think, longer term, what you're going to see, and I gave a hint of this a little bit in my keynote this morning, is, I believe in five years, we'll be talking about scalability, but scalability won't be the way we think of it today, right? Oh, I have this many petabytes under management. That's upkeep. But truly, scalability is going to be how many connected devices do you have interacting, and how many analytics can you actually push, from a model perspective, actually out to the sensor or out to the device to run locally. Why is that important? Think about it as a consumer with a mobile device. The time of interaction, your attention span, do you get an offer at the right time, and is that offer relevant? It can't be rules based, it has to be models based. There's no time for the electrons to move from your device across a power grid, run an analytic and have it come back. It's going to happen locally. So scalability, I believe, is going to be determined in terms of the CPU cycles and the total interconnected IOT network that you're working in. What does that mean from your original question?
That means applications have to be portable, models have to be portable so that they can execute out to the edge where it's required. And so that's, obviously, part of the key technology that we're working with in Hortonworks DataFlow and the combination of Apache NiFi and Apache Kafka and Storm, to really combine that, "How do I manage, not only data in motion, but ultimately, how do I move applications and analytics to the data and not be required to move the data to the analytics?" >> So, question for you. You talked about real time offers, for example. We talk a lot about predictive analytics, advanced analytics, data wrangling. What are your thoughts on preemptive analytics? >> Well, I think that, while that sounds a little bit spooky, because we're kind of mind reading, I think those things can start to exist. Certainly because we now have access to all of the data and we have very sophisticated data science models that allow us to understand and predict behavior, yeah, the timing of real time analytics or real time offer delivery could actually, from our human perception, arrive before I thought about it. And isn't that really cool, in a way? I'm thinking about, I need to go do X, Y, Z. Here's a relevant offer, boom. So it's no longer, I clicked here, I clicked here, I clicked here, and in five seconds I get a relevant offer, but before I even thought to click, I got a relevant offer. And again, to the extent that it's relevant, it's not spooky.

Published Date : Jun 13 2017


Rob Bearden, Hortonworks & Rob Thomas, IBM Analytics - #DataWorks - #theCUBE



>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2017, brought to you by Hortonworks. >> Hi, welcome to theCUBE. We are live in San Jose, in the heart of Silicon Valley at the DataWorks Summit, day one. I'm Lisa Martin, with my co-host, George Gilbert. And we're very excited to be talking to two Robs. With Rob squared on the program this morning. Rob Bearden, the CEO of Hortonworks. Welcome, Rob. >> Thank you for having us. >> And Rob Thomas, the VP, GM rather, of IBM Analytics. So, guys, we just came from this really exciting, high energy keynote. The laser show was fantastic, but one of the great things, Rob, that you kicked off with was really showing the journey that Hortonworks has been on, and in a really pretty short period of time. Tremendous momentum, and you talked about the four mega-trends that are really driving enterprises to modernize their data architecture. Cloud, IOT, streaming data, and the fourth, next leg of this is data science. Data science, you said, will be the transformational next leg in the journey. Tell our viewers a little bit more about that. What does that mean for Hortonworks and your partnership with IBM? >> Well, I think what IBM and Hortonworks now have the ability to do is to bring all the data together across a connected data platform. The data in motion and the data at rest are now in one common platform, irrespective of the deployment architecture, whether it's on-prem across multiple data centers or whether deployed in the cloud. And now that we have the large volume of data and access to it, we can start to drive the analytics as that data moves through each phase of its life cycle. And what really happens now is, now that we have visibility and access to the full life cycle of the data, we can put a data science framework over that to really understand and learn those patterns, and what's the data telling us, what's the pattern behind that. And we can bring simplification to the data science and turn data science actually into a team sport. Allow them to collaborate, allow them to have access to it. And sort of take the black magic out of doing data science with the framework of the tool and the power of DSX on top of the connected data platform. Now we can rapidly advance the insights on the data, and what that really does is drive value really quickly back to the customer. And then we can begin to bring smart applications, via the data science, back into the enterprise. So we can now do things like connected car in real time, and have the connected car learn as it's moving through all the patterns; we can now, from a retail standpoint, really get smart and accurate about inventory placement and inventory management. From an industrial standpoint, we know in real time, down to the component, what's happening with the machine, and any failures that may happen, and be able to eliminate downtime. Agriculture, same kind of... Healthcare, every industry, financial services, fraud detection, anti-money laundering, advances that we have, but it's all going to be attributable to how machine learning is applied, and the DSX platform is the best platform in the world to do that with. >> And one of the things that I thought was really interesting was that, as we saw enterprises start to embrace Hadoop and Big Data and saying, okay, this needs to co-exist and inter-operate with our traditional applications, our traditional technologies.
Now you're saying and seeing data science is going to be a strategic business differentiator. You mentioned a number of industries, and there were several of them on stage today. Give us some, maybe some, one of your favorite examples of one of your customers leveraging data science and driving a pretty significant advantage for their business. >> Sure. Yeah, well, to step back a little bit, just a little context, only ten companies have outperformed the S&P 500 in each of the last five years. We start looking at what are they doing. Those are companies that have decided data science and machine learning is critical. They've made a big bet on it, and every company needs to be doing that. So a big part of our message today was, kind of, I'd say, open the eyes of everybody to say there is something happening in the market right now. And it can make a huge difference in how you're applying data analytics to improve your business. We announced our first focus on this back in February, and one of our clients that spoke at that event is a company called Argus Healthcare. And Argus has massive amounts of data sitting on a mainframe, and they were looking for how can we unleash that to provide better care for patients, better care for our hospital networks, and they did that with data they had in their mainframe. So they brought data science experience and machine learning to their mainframe, that's what they talked about. What Rob and I have announced today is there's another great trove of data in every organization, which is the data inside Hadoop. HDP, the leading distribution for that, is a great place to start. So the use case that I just shared, which is on the mainframe, that's going to apply anywhere there's large amounts of data. And right now there's not a great answer for data science on Hadoop, until today, where Data Science Experience plus HDP brings really, I'd say, an elegant approach to it. It makes it a team sport. You can collaborate, you can interact, you can get education right in the platform. So we have the opportunity to create a next generation of data scientists working with data and HDP. That's why we're excited. >> Let me follow up with this question in your intro that, in terms of, sort of, the Data Science Experience as this next major building block, to extract, or to build on the value from the data lake, your two companies have different, sort of, go-to-markets, especially at IBM, where with industry solutions and global business services you guys can actually build semi-custom solutions around this platform, both the data and the Data Science Experience. With Hortonworks, what's your go to market motion going to look like, and what are the offerings going to look like to the customer? >> There'll be several. You just described a great example: with IBM professional services, they have the ability to take those industry templates and take these data science models and instantly be able to bring those to the data, and so as part of our joint go to market motion, we'll be able to now partner, bring those templates, bring those models to not only our customer base, but also, as part of the new sales go to market motion, in the white space, in new customer opportunities, and the whole point is, now we can use the enterprise data platforms to bring the data under management in a mission critical way, and then bring value to it through these kinds of use cases and templates that drive the smart applications into quick time to value.
And just accelerate that time to value for the customers. >> So, how would you look at the mix changing over time in terms of data scientists working with the data to experiment on the model development and the two hard parts that you talked about, data prep and operationalization. So in other words, custom models, the issue of deploying it 11 months later because there's no real process for that that's packaged, and then packaged enterprise apps that are going to bake these models in as part of their functionality, you know, the way Salesforce is starting to do and Workday is starting to do. How does that change over time? >> It'll be a layering effect. So today, we now have the ability to bring, through the connected data platforms, all the data under management in a mission critical manner, from point of origination through the entire stream till it comes at rest. Now with the data science, through DSX, we can then have that data science framework to where, you know, the analogy I would say is, instead of it being a black science of how you do data access and go through and build the models and determine what the algorithms are and how that yields a result, the analogy is you don't have to be a mechanic to drive a car anymore. The common person can drive a car. So, now we really open up the community of business analysts who can now participate and enable data science through collaboration, and then we can take those models and build the smart apps and evolve the smart apps very rapidly, and we can accelerate that process also now through the partnership with IBM, bringing the core domain value drivers that they've already built and dropping those into the DSX environments, and so I think we can accelerate the time to value now much faster and more efficiently than we've ever been able to do before. >> You mentioned teamwork a number of times, and I'm curious about, you also talked about the business analyst, what's the governance like to facilitate business analysts and different lines of business that have particular access? And what is that team composed of? >> Yeah, well, so let's look at what's happening in the big enterprises in the world right now. There's two major things going on. One is everybody's recognizing this is a multi-cloud world. There's multiple public cloud options, most clients are building a private cloud. They need a way to manage data as a strategic asset across all those multiple cloud environments. The second piece is, we are moving towards, what I would call, the next generation data fabric, which is your warehousing capabilities, your database capabilities, married with Hadoop, married with other open source data repositories, and doing that in a seamless fashion. So you need a governance strategy for all of that. And the way I describe governance, simple analogy, we do for data what libraries do for books. Libraries create a catalog of books, they know they have different copies of books, some they archive, but they can access all of the intelligence in the library. That's what we do for data. So when we talk about governance and working together, we're both big supporters of the Atlas project, that will continue, but the other piece, kind of this point around enterprise data fabric, is what we're doing with Big SQL. Big SQL is the only 100% ANSI-SQL compliant SQL engine for data across Hadoop and other repositories.
So we'll be working closely together to help enterprises evolve in a multi-cloud world to this enterprise data fabric, and Big SQL's a big capability for that. >> And an immediate example of that is in our EDW optimization suite that we have today: we'll be loading Big SQL as the platform to do the complex query sector of that. That will go to market almost immediately. >> Follow-up question on the governance: to what extent is end-to-end governance, meaning from the point of origin through the last mile, you know, if the last mile might be some specialized analytic engine, versus having all the data management capabilities in that fabric, you mentioned operational and analytic, so, like, are customers going to be looking for a provider who can give them sort of end-to-end capabilities on both the governance side and on all the data management capabilities? Is that sort of a critical decision? >> I believe so. I think there's really two use cases for governance. It's either insights or it's compliance. And if your focus is on compliance, something like GDPR, as an example, that's really about the life cycle of data from when it starts to when it can be disposed of. So for a compliance use case, absolutely. When I say insights as a governance use case, that's really about self-service. The ideal world is you can make your data available to anybody in your organization, knowing that they have the right permissions, that they can access it, that they can do it in a protected way, and most companies don't have that advantage today. Part of the idea around data science on HDP is, if you've got the right governance framework in place, suddenly you can enable self-service, which is any data scientist or any business analyst can go find and access the data they need. So it's a really key part of delivering on data science, is this governance piece. Now, when I talk to clients, I try to understand where they're going: is this about compliance or is this about insights? Because there's probably a different starting point, but the end game is similar. >> Curious about your target markets, Tyler talked about the go to market model a minute ago, are you targeting customers that are on mainframes? And you said, I think, in your keynote, 90% of transactional data is in a mainframe. Is that one of the targets, or is it the target, like you mention, Rob, with the EDW optimization solution, are you working with customers who have an existing enterprise data warehouse that needs to be modernized, is it both?
>> On the value front, you know, we talk about, and Hortonworks talks about, the fact that this technology can really help a business unlock transformational value across their organization, across lines of business. This conversation, we just talked about a couple of the customer segments, is this a conversation that you're having at the C-suite initially? Where are the business leaders in terms of understanding? We know there's more value here, we probably can open up new business opportunities or are you talking more the data science level? >> Look, it's at different levels. So, data science, machined learning, that is a C-suite topic. A lot of times I'm not sure the audience knows what they're asking for, but they know it's important and they know they need to be doing something. When you go to things like a data architecture, the C-suite discussion there is, I just want to become more productive in how I'm deploying and using technology because my IT budget's probably not going up, if anything it may be going down, so I've got to become a lot more productive and efficient to do that. So it depends on who you're talking to, there's different levels of dialogue. But there's no question in my mind, I've seen, you know, just look at major press Financial Times, Wallstreet Journal last year. CEOs are talking about AI, machine learning, using data as a competitive weapon. It is happening and it's happening right now. What we're doing together, saying how do we make data simple and accessible? How do we make getting there really easy? Because right now it's pretty hard. But we think with the combination of what we're bringing, we make it pretty darn easy. >> So one quick question following up on that, and then I think we're getting close to the end. Which is when the data lakes started out, it was sort of, it seemed like, for many customers a mandate from on high, we need a big data strategy, and that translated into standing up a Hadoop cluster, and that resulted in people realizing that there's a lot to manage there. It sounds like, right now people know machine learning is hot so they need to get data science tools in place, but is there a business capability sort of like the ETL offload was for the initial Hadoop use cases, where you would go to a customer and recommend do this, bite this off as something concrete? >> I'll start and then Rob can comment. Look, the issue's not Hadoop, a lot of clients have started with it. The reason there hasn't been, in some cases, the outcomes they wanted is because just putting data into Hadoop doesn't drive an outcome. What drives an outcome is what do you do with it. How do you change your business process, how do you change what the company's doing with the data, and that's what this is about, it's kind of that next step in the evolution of Hadoop. And that's starting to happen now. It's not happening everywhere, but we think this will start to propel that discussion. Any thoughts you had, Rob? >> Spot on. Data lake was about releasing the constraints of all the silos and being able to bring those together and aggregate that data. And it was the first basis for being able to have a 360 degree or wholistic centralized insight about something and, or pattern, but what then data science does is it actually accelerates those patterns and those lessons learned and the ability to have a much more detailed and higher velocity insight that you can react to much faster, and actually accelerate the business models around this aggregate. 
So it's a foundational approach with Hadoop. And it's then, as I mentioned in the keynote, the data science platforms, machine learning, and AI actually is what is the thing that transformationally opens up and accelerates those insights, so then new models and patterns and applications get built to accelerate value. >> Well, speaking of transformation, thank you both so much for taking time to share your transformation and the big news and the announcements with Hortonworks and IBM this morning. Thank you Rob Bearden, CEO of Hortonworks, Rob Thomas, General Manager of IBM Analytics. I'm Lisa Martin with my co-host, George Gilbert. Stick around. We are live from day one at DataWorks Summit in the heart of Silicon Valley. We'll be right back. (tech music)

Published Date : Jun 13 2017


Mike Merritt-Holmes, Think Big - DataWorks Summit Europe 2017 - #DW17 - #theCUBE



>> Narrator: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. (uptempo, energetic music) >> Okay, welcome back everyone. We're here live in Germany at Munich for DataWorks Summit 2017, formerly Hadoop Summit. I'm John Furrier, my co-host Dave Vellante. Our next guest is Mike Merritt-Holmes, Senior Vice President of Global Services Strategy at Think Big, a Teradata company, and formerly the co-founder of Big Data Partnership, which merged in with Think Big and Teradata. Mike, welcome to The Cube. >> Mike: Thanks for having me. >> Great having an entrepreneur on, you're the co-founder, which means you've got that entrepreneurial blood, and I got to ask you, you know, you're in the big data space, you got to be pretty pumped by all the hype right now around AI, because that certainly gives a lot of that extra, extra steroid of recognition. People love AI, it gives a face to it, and certainly IOT is booming as well, Internet of Things, but big data's cruising along. >> I mean it's a great place to be. The train is certainly going very, very quickly right now. But the thing for us is, we've been doing data science and AI and trying to build business outcomes and value for businesses for a long time. It's just great now to see the data science and AI both really starting to take effect, and so companies are starting to understand it and really starting to want to embrace it, which is amazing. >> It's inspirational too, I mean I have a bunch of kids in my family, some are in college and some are in high school, even the younger generation are getting jazzed up on just software, right, but the big data stuff's been cruising along now. It's been a good decade now of really solid DevOps culture, cloud now accelerating, but now the customers are forcing the vendors to be very deliberate in delivering great product, because the demand (chuckling) for real time, the demand for more stuff, is at an all-time high. Can you elaborate your thoughts on, your reaction to, what customers are doing, because they're the ones driving everyone, not to create friction, to create simplicity. >> Yeah, and you know, our customers are global organizations trying to leverage this kind of technology, and they are, you know, doing an awesome amount of stuff right now to try to make, effectively, a step change in their business, whether it's, kind of, shipping companies doing preventive asset maintenance, or whether it's retailers looking to target customers in a more personalized way, or really understand who their customers are, where they come from; they're leveraging all those technologies, and really what they're doing is pushing the boundaries of all of them, and putting more demands on all of the vendors in the space to say, we want to do this quicker, faster, but more easily as well. >> And then the things that you're talking about, I want to get your thoughts on, because this is the conversation that you're having with customers; what I want to extract is, have those kind of data-driven mindset questions come out of the hype of Hadoop? So, I mean, we've been on a hype cycle for a while, but now it's back to reality. Where are we with the customer conversations, and, from your standpoint, what are they working on? I mean, is it mostly an IT conversation? Is it a front-office conversation? Is it a blend of both? Because, you know, data science kind of threads both sides of the fence there.
Yeah, I mean certainly you can't do big data without IT being involved, but since the start, I mean, we've always been engaged with the business, it's always been about business outcome, because you bring data into a platform, you provide all this data science capability, but unless you actually find ROI from that, then there's no point, because you want to be moving the business forward, so it's always been about business engagement, but part of that has always been also about helping them to change their mindset. I don't want a report, I want to understand why you look at that report and what's the thing you're looking for, so we can start to identify that for you quicker. >> What's the coolest conversation you've been in, over the past year? >> Uh, I mean, I can't go into too much detail, but I've had some amazing conversations with companies like Lego, for instance, they're an awesome company to work with. But when you start to see some of the things we're doing, we're doing some amazing object recognition with deep-learning in Japan. We're doing some fraud analytics in the Nordics with deep-learning, we're doing some amazing stuff that's really pushing the boundaries, and when you start to put those deep-learning aspects into real world applications, and you start to see customers clamoring to be part of that, it's a really exciting place to be. >> Let me just double-click on that for a second, because a lot of, the question I get a lot on The Cube, and certainly off-camera, is: I want to do deep-learning, I want to do AI, I love machine learning, I hear, oh, it's finally coming to reality, so people see it forming. How do they get started, what are some of the best practices of getting involved in deep-learning? Using open-source, obviously, is one avenue, but what advice would you give customers? >> From a deep-learning perspective, so I think, first of all, I mean, a lot of the greatest deep-learning technologies run open-source, as you rightly said, and I think actually there's a lot of tutorials and stuff out there, but really what you need is someone who has done it before, who knows where the pitfalls are, but also knows when to use the right technology at the right time, and also knows some of the aspects about whether using a deep-learning methodology is going to be the right approach for your business problem. Because a lot of companies are, like, we want to use this deep-learning thing, it's amazing, but actually it's not appropriate, necessarily, for the use case you're trying to draw from. >> It's the classic holy grail, where is it: if you don't know what you're looking for, it's hard to know when to apply it. >> And also, you've got to have enough data to utilize those methods as well, so. >> You hear a lot about the technical complexity associated with Hadoop specifically, but just ol' big data generally. I wonder if you could address that, in terms of what you're seeing, how people are dealing with that technical complexity, but what other headwinds are there, in terms of adopting these new capabilities? >> Yeah, absolutely, so one of the challenges that we still see is that customers are struggling to leverage value from their platform, and normally that's because of the technical complexities. So we introduced to the open-source world last month Kylo, something you can download free of charge.
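Stepping back to the getting-started question above, here is a minimal sketch of an open-source deep-learning starting point, using the Keras API that ships with TensorFlow on its built-in digits dataset. A production object-recognition system of the kind described would need far more data, layers, and tuning; this only shows the shape of the workflow.

```python
# Tiny image-classification example with the open-source Keras API:
# load data, define a small convolutional network, train, evaluate.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0  # add a channel axis, scale to [0, 1]
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1, batch_size=128)
print(model.evaluate(x_test, y_test))  # [loss, accuracy] on held-out digits
```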
It's completely open source under the Apache license, and that really was about making it easier for customers to start to leverage the data on the platform, to self-serve ingestion onto that, and for data scientists to wrangle the data better. So, I think there's a real push right now about that next level up, if you like, in the technology stack, to start to enable non-technical users to do interesting things on the platform directly, rather than asking someone to do it for them. And, you know, we've had technologies in the BI space like Tableau, and, obviously, the (mumbling) data-warehouse solutions on Teradata, that have been giving customers something previously, but actually now they're asking for more, not just that, but more as well. And that's where we're starting to see the increases. >> So that's sort of operationalizing analytics. As an example, what are some of the business complexities and challenges of actually doing that? >> That's a very good question, because, I think, when you find out great insight, and you go, wow, you've built this algorithm, I've seen things I've never seen before, then the business wants to have that always on. They want to know that insight all the time: is it changing, is it going up, is it going down, do I need to change my business decisions? And doing that and making that operational means not only just deploying it, but also monitoring those models, being able to keep them up to date regularly, understanding whether those things are still accurate or not, because you don't want to be making business decisions on algorithms that are now a bit stale. So, actually operationalizing it is about building out an entire capability that's keeping these things accurate and online, and, therefore, there's still a bit of work to do, I think, in the marketplace, around building out an operational capability. >> So you've kind of got bottom-up, top-down. Bottom-up is, you know, the Hadoop experiments, and then top-down is CXOs saying we need to do big data. Have those two constituencies come together now? Who's driving the bus? Are they aligned, or is it still, sort of, a mess organizationally? >> Yeah, I mean, generally, in the organization, there's someone playing the Chief Data Officer, whether they have that as a title or a role; ultimately someone is in charge of generating value from the data they have in the organization. But they can't do that without IT, and I think where we've seen companies struggle is where they've driven it from the bottom up, and where they succeed is where they drive it from the top down, because by driving it from the top down, you really align what you're doing with the business and the strategy that you have. So, the company strategy and what you're trying to achieve, but ultimately, they both need to meet in the middle, and you can't do one without the other. >> And one of our practitioner friends was describing this situation in our office in Palo Alto a couple of weeks ago. He said, you know, the challenge we have as an organization is, you've got top people saying, all right, we're moving. And they start moving, the train goes, and then you've got kind of middle management sort of behind them, and then you've got the doers that are far behind, and aligning those is a huge challenge for this particular organization.
How do you recommend organizations address that alignment challenge? Does Think Big have capabilities to help them through that, or is that, sort of, you've got to call Accenture? >> In essence, our reason for being is to help with those kinds of things, and, you know, whether it's right from the start, so, oh my God, my Chief Data Officer or my CEO is saying we need to be doing this thing right now, come on, let's get on with it, and we help them to understand what does that mean, what are the use cases, where's the value going to come from, what's that architecture going to look like; or whether it's helping them to build out capability in terms of data science, or building out the cluster itself, and then managing that and providing training for staff. Our whole reason for being is supporting that transformation as a business, from, oh my God, what do I do about this thing, to, I'm fully embracing it, I know what's going on, I'm enabling my business, and I'm completely comfortable with that world. >> There was a lot of talk three or four or five years ago about the ROI of so-called big data initiatives not really being there; there were edge cases with huge ROI, but there was a lot of talk about not a lot of return. My question is, first question, has that changed? Are you starting to see much bigger numbers coming back, where the executives are saying, yeah, let's double down on this? >> Definitely, I'm definitely seeing that. I mean, I think it's fair to say that companies are a bit nervous about reporting their ROI around this stuff, in some cases, so there's more ROI out there than you necessarily see out in the public place, but-- >> Why is that? Because they don't want to expose it to the competition, or they don't want to front-run their earnings, or whatever it is? >> They're trying to keep a competitive edge. The minute you start saying, we're doing this, your competitors have an opportunity to catch up. >> John: Very secretive. >> Yeah, and I think it's not necessarily about what they're doing, it's about keeping the edge over their competitors, really. So, what we're seeing is that many customers are getting a lot of ROI more recently because they're able to execute better, rather than struggling with the IT problems. And even just recently, for instance, we had a customer of ours, the CEO phones us up and says, you know what, we've got this problem with our sales. We don't really know why this is going down. You know, in this country, in this part of the world, it's going up; in this country, it's going down. We don't know why, and that's making us very nervous. Could you come in and just get the data together, work out why it's happening, so that we can understand what it is? And we came in, and within weeks we were able to give them a very good insight into exactly why that was, and they changed their strategy moving forward, for the next year, to focus on addressing that problem, and that's really amazing ROI for a company, to be able to get that insight. Now, we're working with them to operationalize that, so that particular insight is always available to them, and that's an example of how companies are now starting to see that ROI come through. And a lot of it is about being able to articulate the right business question, rather than trying to worry about reports. What is the business question I'm trying to solve or answer? That's when you can start to see the ROI come through.
>> Can you talk about the customer orientation when they get to that insight? Because you mentioned earlier that they got used to the reports, and you mentioned visualization, Tableau; those become table stakes. Once you get addicted to the visualization, you want to extract more insights, so the pressure seems to be getting more insight. So, two questions: the process gap around what they need to do process-wise, and then just organizational behavior. Are they there mentally? What are some of the criteria, in your mind, in your experience with customers, around the processes that they go through, and then the organizational mindset? >> Yeah, so what I would say is, first of all, from an organizational mindset perspective, it's very important to start educating not just the analytics team, but the entire business, on what this whole machine learning, big data thing is all about, and how to ask the right questions. So, really starting to think about the opportunities you have to move your business forward, rather than what you already know, and to think forward rather than retrospectively. The other thing we often have to teach people, as well, is that this isn't about what you can get from the data warehouse, or replacing your data warehouse or anything like that. It's about answering the right questions with the right tools, and here is a whole set of tools that allow you to answer different questions that you couldn't before, so leverage them. So, that's very important, and that mindset requires time, actually, to transform a business into that mindset, and a lot of commitment from the business to make it happen. >> So, mindset first, and then you look at the process, then you get to the product. >> Yep, and basically, once you have that mindset, you need to set up an engine that's going to run and start to drive the ROI out, and the engine includes, you know, your technical folk, but also your business users, and that engine will then start to build up momentum. The momentum builds more interest, and, over time, you start to get your entire business using these tools. >> It kind of makes sense, just kind of riffing in real time here, so the product-gap conversation should probably come after you lay that out first, right? >> Totally, yeah, I mean, you don't choose a product before you know what you need to do with it. But actually, often companies don't know what they need to do with it, because they've got the wrong mindset in the first place. And so part of the road map stuff that we do, we have a road map offering, is about changing that mindset and helping them to get through that first stage, where we start to articulate the right use cases, and that really is driving a lot of value for our customers. Because they start from the right place-- >> Sometimes we hear stories, like the product kind of gives them a blind spot, because they tend to go in with a product mindset first, and that kind of gives them some baggage, if you will. >> Well, yeah, because you end up with a situation where you get a product in, and then you say, what can we do with it? Or, in fact, what happens is the vendor will say, these are the things you could do, and they give you use cases. >> It constrains things, forecloses tons of opportunities, because you're stuck within a product mindset. >> Yeah, exactly that, and you don't want to be constrained.
And that's why open source, and the kind of ecosystem that we have within the big data space, is so powerful, because there are so many different tools for different things, but don't choose your tool until you know what you're trying to achieve. >> I have a market question; maybe you can just give us an opinion, caveat it if you like. It's sort of a global, macro view. When we started first looking at the big data market, we noticed right away the dominant portion of revenue was coming from services. Hardware was commodity, so, you know, maybe sort of less than you would see, obviously, in a mainframe world, and open-source software has a smaller contribution, so services dominated and, frankly, has continued to dominate since the early days. Do you see that changing, or do you think those percentages, if you will, will stay relatively constant? >> Well, I think it will change over time, but not in the near future, for sure; there's too much advancement in the technology landscape for that to stop. If you had a set of tools that weren't really evolving, that were becoming very mature, and that's what tools you had, ultimately the skill sets around them start to grow, and it becomes much easier to develop stuff, and then companies start to build out industry- or solution-specific stuff on top, and it makes it very easy to build products. When you have an ecosystem that's evolving, growing at the speed this one is, you're constantly trying to keep up with that technology, and, therefore, services have to play an awfully big part in making sure that you are using the right technology at the right time. And so, for the near future, for certain, that won't change. >> Complexity is your friend. >> Yeah, absolutely. Well, you know, we live in a complex world, but we live and breathe this stuff, so what's complex to some is not to us, and that's why we add value, I guess. >> Mike Merritt-Holmes here inside The Cube with Teradata Think Big. Thanks for spending the time sharing your insights. >> Thank you for having me. >> Understand the organizational mindset, identify the process, then figure out the products. That's the insight here on The Cube. More coverage of DataWorks Summit 2017, here in Germany, after this short break. (upbeat electronic music)
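The operational capability Merritt-Holmes describes, keeping deployed models accurate, online, and flagged for retraining before they go stale, is easy to picture in miniature. A hedged sketch of that monitoring loop; the model, data, and tolerance here are illustrative stand-ins, not Think Big's actual tooling:

```python
# Minimal sketch of a model staleness check: score freshly labeled data,
# compare against the accuracy measured at deployment time, and flag the
# model for retraining once it drifts past a tolerance.

class ThresholdModel:
    """Toy stand-in for a deployed model: predicts True above a cutoff."""
    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, x):
        return x >= self.cutoff

def is_still_healthy(model, recent, baseline_accuracy, tolerance=0.05):
    """True while accuracy on recent labeled data stays near the baseline."""
    correct = sum(model.predict(x) == y for x, y in recent)
    current = correct / max(len(recent), 1)
    return (baseline_accuracy - current) <= tolerance

model = ThresholdModel(cutoff=10)
recent = [(8, False), (12, True), (15, True), (9, True)]  # the world shifted
if not is_still_healthy(model, recent, baseline_accuracy=0.95):
    print("model is stale: schedule retraining")
```

In practice the same check runs on a schedule against every deployed model, which is the always-on capability the interview points at.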

Published Date : Apr 5 2017


Scott Gnau | DataWorks Summit Europe 2017


 

(soothing technological music) >> Announcer: Live from Munich, Germany, it's theCUBE. Covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. (soft technological music) >> Okay, welcome back everyone. We're here in Munich, Germany for DataWorks Summit 2017, formerly Hadoop Summit, powered by Hortonworks. It's their event, but now called DataWorks because data is at the center of the value proposition, Hadoop plus all data and storage. I'm John, my cohost David. Our next guest is Scott Gnau, he's the CTO of Hortonworks, joining us again from the keynote stage. Good to see you again. >> Thanks for having me back, great to be here. >> Good having you back. Get down and dirty and get technical. I'm super excited about the conversations that are happening in the industry right now, for a variety of reasons. One is you can't get more excited about what's happening in the data business. Machine learning, AI, has really brought up the hype, and to me it's humanized it: people can visualize AI and see the self-driving cars and understand how software's powering all this. But still it's data driven, and Hadoop is extending into data, seeing that natural extension, and Cloudera has filed their S1 to go public. So it brings back the conversations of this open source community that's been doing all this work in the big data industry, originally riding in on the horse of Hadoop. You guys have an update to your Hortonworks Data Platform, which we'll get to in a second, but I want to ask you about a lot of stories around Hadoop. I say Hadoop was the first horse that everyone rode in on in the big data industry... When I say big data, I mean like DevOps, cloud, the whole open source ethos, but it's evolving, it's not being replaced. So I want you to clarify your position on this, because we were just talking about some of the false premises, a lot of stories being written about the demise of Hadoop. Long live Hadoop. >> Yeah, well, how long do we have? (laughing) I think you hit it first: we're at DataWorks Summit 2017, and we rebranded; it was previously Hadoop Summit. We rebranded it to really recognize that there's this bigger thing going on, and it's not just Hadoop. Hadoop is a big contributor, a big driver, a very important part of the ecosystem, but it's more than that. It's really about being able to manage and deliver analytic content on all data, across that data's lifecycle, from when it gets created at the edge, to when it's moving through networks, to when it's landed and stored in a cluster, to when analytics run and decisions go back out. It's that entire lifecycle, and you mentioned some of the megatrends, and I talked about this this morning in the opening keynote. With AI and streaming and IoT, all of these things kind of converging are creating a much larger problem set, and frankly, opportunity, for us as an industry to go solve. >> And there's real demand there. This is not like, I mean there's certainly a hype factor on AI, but IoT is real. You have data now, not just as a back-office concept; you have a front-facing, business-centric... I mean there's real customer demand here. >> There's real customer demand, and it really creates the ability to dramatically change a business. A simple example that I used onstage this morning is, think about the electric utility business. I live in Southern California.
25 years ago, and by the way, I studied to be an electrical engineer 20, 30 years ago; that business, not entirely simple, was about building a big power plant and distributing electrons out to all the consumers of electrons. One direction, and optimization of that grid, that network, and that business was very hard, and there were billions of dollars at stake. Fast forward to today: now you've still got those generating plants online, but you've also got folks like me generating their own power and putting it back into the grid. So now you've got bidirectional electrons. The optimization is totally different. Then how do you figure out how most effectively to create capacity and distribute that capacity? Because created capacity that's not consumed is 100% spoiled. So it's a huge data problem, but it's a huge data problem meeting IoT, right? Devices, smart meter devices out at the edge, creating data, doing it in realtime. A cloud blew over, my generating capacity on my roof went down, so I've got to pull from the grid. Combining all of that data to make realtime decisions, we're talking hundreds of billions of dollars, and it's being done today in an industry that's not a high-tech Silicon Valley kind of industry; electric utilities are taking advantage of this technology today. >> So we were talking off-camera about, you know, some commentary about how Hadoop has failed, and obviously you take exception to that, and you also made the point that it's not just about Hadoop, but in a way it is, because Hadoop was the catalyst of all this. Why has Hadoop not failed, in your view? >> Well, because we have customers, and you know, the great thing about conferences like this is we're actually able to get a lot of folks to come in and talk about what they're doing with the technology, and how they're driving business benefit, and share that business benefit with their colleagues. So we see that business benefit coming along. You know, in any hype cycle, people can go down a path; maybe they had false expectations early on. You know, six years ago we were talking about, hey, is open source Hadoop going to come along and replace the EDW? Complete fallacy, right? What I talked about, that opportunity of being able to store all kinds of disparate data, being able to manage and maneuver analytics in real time, that's the value proposition, and it's very different from some of the legacy tech. So if you view it as, hey, this thing is going to replace that thing, okay, maybe not, but the point is it's very successful for what it does-- >> Just to clarify what you just said there: you guys never took that position. Cloudera did, with their Impala, that was their initial position, or do you disagree with that? >> Publicly they would say, oh, it's not a replacement, but you're right, I mean, the actions were maybe designed to do that. >> And set in the marketplace that that might be one of the outcomes. >> Yeah, but they pivoted quickly when they realized that was a failed strategy. But, I mean, that became a premise that people locked in on.
>> If that becomes your yardstick for measuring, then, so-- >> Oh, but wouldn't you agree that Hadoop in many respects was designed to solve some of the problems that the EDW never could? >> Exactly. So, you know, again, when you think about the variety of data, when you think about the analytic content, doing time-series analysis is very hard to do in a relational model, so it's a new tool in the workbench to go solve analytic problems. And so when you look at it from that perspective, and I use the utility example, the manufacturing example, financial, consumer finance, telco, all of these companies are using this technology, leveraging this technology, to solve problems they couldn't solve before, and frankly to build new businesses that they couldn't build before, because they didn't have access to that real time-- >> And so money did shift from pouring money into the EDW, with limited returns because you were at the steep part or the flat part of the S-curve, to, hey, let's put it over here in this so-called big data thing, and that's why the market, I think, was conditioned to sort of come to that simple conclusion. But the spending did shift, did it not? >> Yeah, I mean, if you subscribe to that herd mentality, you know, the net new expenditure in the new technology is always going to outpace the growth of the existing, kind of plateaued, technologies. That's just math. >> The growth, yes, but not the size, not the absolute dollars, and so you have a lot of companies right now struggling in the traditional legacy space, and you've got this rocket ship going in-- >> And again, I think if you think about kind of the converging forces that are out there, in addition to, you know, IoT and streaming: the ability, frankly, Hadoop is an enabler of AI. When you think about the success of AI and machine learning, it's about having massive, massive amounts of data, right? And I think back 25 years ago, my first data mart was 30 gigabytes, and we thought that was all the data in the world. Now that fits on your phone. So when you think about just having the utter capacity, and the ability to actually process that capacity of data, these are technology breakthroughs that have been driven in the core open source Hadoop community. When you combine that with the ability to execute in clouds, in ephemeral kinds of workloads, you combine all that stuff together, and now, instead of going to capital committee for 20 million dollars for a bunch of hardware to do an exabyte kind of study where you may not get an answer that means anything, you can spin that up in the cloud, and for a couple of thousand dollars get the answer, take that answer and go build a new system of insight that's going to drive your business. This is a whole new area of opportunity driven by the convergence of all that.
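Gnau's capital-committee contrast reduces to simple arithmetic, worth making explicit. A sketch with purely illustrative numbers; the instance price, cluster size, and job length are assumptions, not quotes from any provider:

```python
# Ephemeral-workload arithmetic: rent a large cluster only for the life
# of one exploratory study instead of buying hardware up front.

def ephemeral_study_cost(nodes, price_per_node_hour, hours):
    """Cost of a cluster that exists only while the study runs."""
    return nodes * price_per_node_hour * hours

cloud_cost = ephemeral_study_cost(nodes=100, price_per_node_hour=0.50, hours=40)
print(f"one-off cloud study: ${cloud_cost:,.0f}")    # $2,000
print(f"up-front hardware buy: ${20_000_000:,.0f}")  # the capital-committee route
```

If the study produces nothing useful, the ephemeral route wasted a couple of thousand dollars rather than a committed capital budget, which is exactly why exploratory analytics moved to the cloud.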
Okay but despite the growth i called profitless prosperity can the industry fund itself I mean you've got to make big bets yarn tezz different clouds how does the industry turn into one that is profitable and growing well I mean obviously it creates new business models and new ways of monetizing software in deploying software you know one of the key things that is core to our belief system is really leveraging and working with and nurturing the community is going to be a key success factor for our business right nurturing that innovation in collaboration across the community to keep up with the rate of pace of change is one of the aspects of being relevant as a business and then obviously creating a great service experience for our customers so that they they know that they can depend on enterprise class support enterprise-class security and governance and operational management in the cloud and on-prem in creating that value propisition along with the the advanced and accelerated delivery of innovation is where I think you know we kind of intersect uniquely in in the in the industry. >> and one of the things that I think that people point out and I have this conversation all the time of people who try to squint through the you know the wall street implications of the value proposition of the industry and this and that and I want to get your thoughts on because open source at this era that we're living in today bringing so much value outside of just important works in your your company Dave would made a comment on the intro package we're doing is that the practitioners are getting a lot of value people out in the field so these are the white space as a value and they're actually transformative can you give some examples where things are getting done that are real of real value as use cases that are that are highlighted you guys can i light I think that's the unwritten story that no one thought about it that rising tide floating all boat happening? >> Yeah yes I mean what is the most use cases the white so you have some of those use cases again it really involves kind of integrating legacy traditional transactional information right very valuable information about a company its operations its customers its products and all this kind of thing about being able to combine that with the ability to do real-time sensor management and ultimately have a technology stack that enables kind of the connection of all of those sources of data for an analytic and that's an important differentiation you know for the first 25 years of my career right it was all about what school all this data into a place and then let's do something with it and then we can push analytics back not an entirely bad model but a model that breaks in the world of IOT connected devices it's just frankly isn't enough money to spend on bandwidth to make that happen and as fast as the speed of light is it creates latency so those decisions aren't going to be able to be made in time so we're seeing even in traditional i mentioned utility business think about manufacturing oil and gas right sensors everywhere being able to take advantage not not of collecting all the central data and all of that but being able to actually create analytics based on sensor data and put those analytics outs of the sensors to make real-time decisions that can affect hundreds of millions of dollars of production or equipment are the use cases that we're seeing be deployed today and that's complete white space that was unavailable before. 
>> Yeah, and customer demand too. I mean, Dave and I were also debating about this not being a new trend; this is just big data happening. The customers are demanding production workloads, so you've seen a lot more forcing function driven by the customer. And you guys have some news I want to get to; give your thoughts on HDP, the Hortonworks Data Platform, 2.6. What's the key news there around real time? You're talking about real time. >> Yeah, it's about real time, real-time flexibility and choice, you know, motherhood and apple pie. >> And the major highlights of that are? >> So the upgrade's really inside of Hive. We now have operational analytic query capabilities, where you get tactical response times, second, sub-second kind of response times. You know, Hadoop and Hive weren't previously known for that kind of tactical response. We've now been able to add, inside of that technology, the ability to do that workload. We have customers who are building these white-space applications, who have hundreds or thousands of users or applications that depend on consistency of very quick analytic response time. We now deliver that inside the platform. What's really cool about it, in addition to the fact that it works, is that we did it inside of Hive, so we didn't create yet another project or yet another thing that a customer has to integrate to or rewrite their application for. So any Hive-based application can now take advantage of this performance enhancement, and that's part of our thinking of it as a platform. The second thing inside of that that we've done, and that really speaks to those kinds of workloads, is we've really enhanced the ability to do incremental data acquisition, right, whether it be streaming, whether it be batch upserts, right, on the SQL side doing upserts: being able to do that data maintenance in an ACID-compliant fashion, completely automatically and behind the scenes, so that those applications, again, can just kind of run without any heavy lifting. >> Just data staying in motion, kind of thing going on. >> Right, it's anywhere from data in motion, even to batch, to mini-batch, and anywhere kind of in between, but we're doing those incremental data loads. You know, it's easy to get the same file twice by mistake. You don't want to double count; you want to have sanctity of the transactions. We now handle that inside of Hive with ACID compliance. >> So a layperson question for the CTO, if I may. You mentioned Hadoop was not known for sort of real-time response. You just mentioned ACID; it was never, in the early days, known for sort of ACID, you know, compliance. Others would say, you know, Hadoop, the original big data platform, is not designed for the matrix math of AI, for example. Are these misconceptions? And like Tim Berners-Lee, when we met Tim Berners-Lee, web 2.0, this is what the web was designed for; would you say the same thing about Hadoop? >> Yeah.
Ultimately, from my perspective, and kind of netting it out, Hadoop was designed for the easy acquisition of data, the easy onboarding of data, and then, once you've onboarded that data, it also was known for enabling new kinds of analytics that could be plugged in, certainly starting out with MapReduce on HDFS. But the whole idea is, I now have a flexible way to easily acquire data in its native form, without having to apply schema, without having to do any formatting to store it; I can get it exactly as it was and store it, and then I can apply whatever schema, whatever rules, whatever analytics on top of that that I want. So the center of gravity, to my mind, has really moved up to YARN, which enables a multi-tenancy approach to having pluggable, multiple different kinds of file formats, and pluggable different kinds of analytics and data access methods, whether it be SQL, whether it be machine learning, whether it be HBase lookup and indexing, and anywhere kind of in between. It's that Swiss Army knife, as it were, for handling all of this new stuff that is changing; every second we sit here, data has changed. >> And just a quick follow-up, if I can, just a clarification: so you said new types of analytics can be plugged in, by design, because of its openness, is that right? >> By design, because of its openness and the flexibility that the platform was built for. In addition, on the performance side, we've also got a new update to Spark, and usability, consumability, and collaboration for data scientists using the latest versions of Spark inside the platform. We've got a whole lot of other features and functions that our customers have asked for. And then, on the flexibility and choice side, it's available on public cloud infrastructure as a service, public cloud platform as a service, on-prem on x86, and net new on-prem with Power. >> Just one final question for you. As the industry evolves, what are some of the key areas that open source can pivot to that really take advantage of the machine learning and AI trends going on? Because you start to see that really increase the narrative around the importance of data, and a lot of people are scratching their heads going, okay, I need to do the back office, to set up my IT, to have all those great open source projects, all that, the Hadoop data platform, but then I've got to get down and dirty: I might do multiple clouds, I've got hybrid cloud going on, I might want to leverage all those cool new containers and Kubernetes and microservices, and almost DevOps. Where's that transition happening? As a CTO, how do you talk to customers about this transition, this evolution of how the data business is getting more and more mainstream?
>> Yeah, I mean, I think the big thing that people had to get over is that we've reversed polarity from, again, 30 years of: I want a stack vendor to have an integrated stack of everything, plug and play, it's integrated end to end, and it might not be a hundred percent what I want, but look at the cost leverage I get out of the stack versus going to build something that's perfect. This world is the opposite. It's about enabling the ecosystem, and by the way, it's a combination of open source and proprietary software; you know, some of our partners have proprietary software, and that's okay. But it's really about enabling the ecosystem, and I think the biggest service that we as an open source community can do is to continue to keep that standard kernel for the platform, and make it very usable and very easy for many apps and software providers and other folks. >> A thousand flowers bloom kind of concept, and that's what you've done with the white spaces, as these use cases are evolving very rapidly, and then the bigger apps are kind of settling into workloads with realtime. >> Yeah, all the time. You know, think about the next generation of IT professionals, the next generation of business professionals. They grew up with iPhones, and here they come. They grew up in a mini-app world. I mean, they download an app, I'm going to try it, it's a widget, boom, and it's going to help me get something done, but it's not a big stack that I'm going to spend 30 years to implement. And then I want to take those widgets and connect them together to do things that I haven't been able to do before, and that's how this ecosystem is really-- >> Great DevOps culture, very agile, that's their mindset. So Scott, congratulations on your 2.6 upgrade. >> Scott: We're thrilled about it. >> Great stuff. ACID compliance, a really big deal; again, these compliance things, little things, are important in the enterprise. All right, thanks for coming on The Cube at DataWorks in Munich, Germany. I'm John, thanks for watching. More coverage live here in Germany after this short break.
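The "same file twice" problem from the HDP 2.6 discussion comes down to making incremental loads idempotent: key every record and upsert instead of appending. A generic sketch of the idea, with a dictionary standing in for a table; Hive does this with its ACID merge and upsert machinery, not with code like this:

```python
# Miniature of idempotent incremental loading: replaying a duplicate batch
# overwrites rows by key instead of appending, so totals never double count.

table = {}  # record id -> row, a stand-in for an ACID table

def upsert_batch(batch):
    """Apply an incremental load; re-running the same batch is harmless."""
    for row in batch:
        table[row["id"]] = row  # insert or overwrite by key

batch = [{"id": "txn-1", "amount": 100}, {"id": "txn-2", "amount": 250}]
upsert_batch(batch)
upsert_batch(batch)  # the same file arrives twice by mistake

print(sum(r["amount"] for r in table.values()))  # 350, not 700
```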

Published Date : Apr 5 2017


Day One Kickoff– DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Narrator: Covering DataWorks Summit Europe 2017. Brought to you by Hortonworks. >> Hello everyone, welcome to The Cube's special presentation here in Munich, Germany for DataWorks Summit 2017. This is the Hadoop Summit, powered by Hortonworks. This is their event, and again it shows the transition from the Hadoop world to the big data world. I'm John Furrier. My co-host Dave Vellante, good to see you, Dave. We're back in the seats together, usually on different events, but now here together in Munich. Great beer, great scene here. A smaller European event for Hortonworks and the ecosystem, but it's called DataWorks 2017. Strata Hadoop is calling themselves Strata and Data. They're starting to see the word Hadoop being sunsetted from these events, which is a big theme of this year: the transition from Hadoop being the branded category to data. >> Well, you're certainly seeing that in a number of ways. The titles of these events. Well, first of all, I love being in Europe. These venues are great, right? They're so Euro, very clean and magnificent. But back to your point. You're seeing the Hadoop Summit now called the DataWorks Summit. You're seeing Strata Plus Hadoop is now Strata Plus, I don't even know what it is. Right, it's not Hadoop-driven anymore. You see it also in Cloudera's IPO. They barely talk about Hadoop and the Hadoop distro. They're a Hadoop distro vendor, but they talked about being a data management company, and John, I think we are entering the era, or well deep into the era, of what I have been calling for the last couple of years profitless prosperity. Really, where you see the Cloudera IPO: as you know, they raised money from Intel, over $600 million, at a $4.1 billion valuation. The Wall Street Journal says they'll have a tough time getting a billion-dollar valuation. For every dollar each of these companies spends, Hortonworks and Cloudera, they lose between $1.70 and $2.50. So we've always said at SiliconANGLE, Wikibon, and The Cube that the people who are going to make money in big data are the practitioners of big data, and it's hard to find those guys, it's hard to see them, but that's really what's happening: the industries are transforming, and those are the guys putting money into their bottom line. Not so much the technology vendors. >> Great to unpack that, but first of all, I want to just say congratulations to Wikibon for getting it right again. As usual, Wikibon, ahead of the curve, being out there and getting it right, because I think you nailed it, and I think Wikibon saw this first of all the research firms; kind of, you know, pat ourselves on the back here, but the truth is that practitioners are making the money, and I think you're going to see more of that. In fact, last night, as I'm having a nice beer here in Germany, I just liked listening to the conversations in the bar area, and a lot of conversations around, real conversations around, you know, doing deals and, you know, deployments. You know, you're hearing about HBase, you're hearing about clusters, you're hearing about service revenue, and I think this is the focus. Cloudera, I think, in a classic Silicon Valley way, had their hubris tempered by their lack of scale. I mean, they didn't really blow it out. I mean, now they do 200 million in revenue. Nothing to shake a stick at, they did a great job, but they're buying revenue, and Hortonworks is as well. But the ecosystem is the factor, and this is the wildcard. I'm making a prediction.
Profitless prosperity, as you point out, is right, but I think that it has longevity with these companies, like Hortonworks and Cloudera and others, like MapR, because the ecosystem's robust. If you factor in the ecosystem revenue, that is enough rising tide, in my opinion. The question is how they become sustainable as standalone ventures; the Red Hat for Hadoop play never worked, as Pat Gilson, you know, predicted. So, I think you're going to see a quick shift and pivot by Hortonworks, and certainly Cloudera's going to be under the microscope once they go public. I'm expecting that valuation to plummet like a rock. They're going to go public, Silicon Valley people are going to get their exits, but. >> Accel will be happy. >> Everyone, yeah, they'll be happy. They already sold in 2013. They did a big sale, I mean, all of them cashed out two years ago when that liquidation event happened with Intel, but that's fine. But now it's back to business building, and Hortonworks has been doing it for years. So when you see your valuation is less than a billion... I'm expecting Cloudera to plummet like a rock. I would not buy the IPO at all, because I think it's going to go well under a billion dollars. >> And I think it's the right call, and as we know, at the end of last year, Fidelity and other mutual funds devalued their holdings in Cloudera, and so, you know, you've got this situation where, as you say, a couple hundred, maybe, you know, on the way to 300 million in revenue, Hortonworks on the way to 200 million in revenue. Add up the ecosystem, yeah, maybe you get to a billion; throw in all of what IBM and Oracle call big data, and it's kind of a more interesting business. But, you've called it same wine, new bottle. Is it a new bottle? Now, what I mean by that is the shift from Hadoop, and then again, you read Cloudera's S1, it's all about AI, machine learning, you know, the cloud. Interesting, we'll talk about the cloud a little later, but is it same wine, new bottle, or is this really a shift toward a new era of innovation? >> It's not a new shift. It's the same innovation that Hortonworks was founded on. Big data is the category, and Hadoop was the horse they rode in on. But I think what's changing is the fact that customers are now putting real projects on the table, and the scrutiny around those projects has to produce value, and the value comes down to total cost of ownership and business value. And that's becoming a data-specific thing, and you look at all the successes in the big data world, Spark and others: you're seeing a focus on cloud integration and real-time workloads. These are real projects. This isn't fantasy. This isn't hype. This isn't early adopter. These are real companies saying we are moving to a new paradigm of digitally transforming our companies, and we need cost efficiencies but also revenue-producing applications and workloads that are going to be running in the cloud, with data at the heart of it. So, this is a customer forcing function, where the customers are genuinely excited about machine learning, moving to real-time classification of workloads. This is the deal, and no hubris, no technology posturing, no open-standards jockeying can right the situation. Customers have demands and they want them filled, and we're going to have a lot of guests on here, and I'm going to ask them those direct questions: what are you looking for?
>> Well, I totally agree with what you're saying, and when we first met, it was right around, you know, the midpoint of the web 2.0 era, and I remember Tim Berners-Lee commenting on all this excitement, everybody's doing it, and he said this is what the web was invented to do. And this is what big data was invented to do. It was to produce deep analytics, deep learning, machine learning, you know, cognitive, as IBM likes to brand that, and so it really is the next era, even though people don't like to use the term big data anymore. We were talking to, you know, some of the folks in our community earlier, John, you and I, about some of the challenges. Why is it profitless, you know? Why is there so much growth but no profit? And you know, we have to point out here that people like Hortonworks and Cloudera, they've made some big bets; take HDFS, for example. And now you have the cloud guys, particularly Amazon, coming in, you know, with S3. Look at YARN, big open source project, but you've got Docker and Kubernetes seeming to mop that up. Tez was supposed to replace MapReduce, and now you've got. >> I mean, I wouldn't say mopping up, I mean. >> You've got Spark. >> At the end of the day, the ecosystem's going to revolve around what the customers want, and portability of workloads, Kubernetes and microservices, these are areas that just absolutely make a lot of sense, and I think, you know, people will move to where the frictionless action is, and that's going to happen with Kubernetes and containers and microservices. But that just speaks to the DevOps culture, and I think the Hadoop ecosystem, again, was grounded in the DevOps culture. So, yeah, there are some projects that are going to maybe go out of favor, but there's other stuff coming up through the ranks in open source, and I think it's compelling. >> But where I disagree with what you're saying, well, the point I'm trying to make, is you have to, if you're Cloudera and Hortonworks, you have to support those multiple projects, and it's expensive as hell. Whereas the cloud guys put all their wood behind one arrow, to use an old Scott McNealy phrase, and you know, Amazon, I would argue, is mopping up in big data. I think the cloud guys, you know, it's ironic to me that Cloudera, in the cloud era, picked that name, you know, but really never had. >> John: They missed the cloud. >> They've never really had a strong cloud play, and I would say the same thing with Hortonworks and MapR. They have to play in the cloud, and they talk about cloud, but they've got to support hybrid, they've got to support on-prem, and they've got to pick the clouds that they're going to support: AWS, Azure, maybe IBM's cloud.
They have to do business out in the open and they're not afraid to, obviously they're open source. So, we're going to start to see that transition from a private venture backed, scale up, buy revenue. In the playbook of Silicon Valley venture capital's Excel partners and Greylock. Now they go public and get liquid and then now next phase of their journey is going to be build a public company and I think that they will do a good job doing it and I'm not down on them at all for that and I think it's just going to be a transition. >> Well, they're going to raise what? A couple 100 million dollars? But this industry, yeah, this industry's cashflow negative, so I agree with you. Open source is great, let's ra-ra for open source and it drives innovation, but how does this industry pay for itself? That's what I want to know. How you respond to that? >> Well, I think they have sustainable issues around services and I think partnering with the big companies like Intel that have professional services might help them on that front, but Michael Olson said in his founder's letter in his S1, kind of AI washing, he said AI and cognitive. But that's okay because Cloudera could easily pivot with their brain power, and same with Hortonworks to AI. Machine learning is very open source driven. Open source culture is growing, it's not going away, so I think Cloudera's in a very good position. >> I think the cloud guys are going to kill them in that game, and cloud guys and IBM are going to cream these profitless startups in that AI and machine learning game. >> We'll see. >> You disagree? >> I disagree, I think. Well, I mean, it depends. I mean, you know, I'm not going to, you know, forecast what the managements might do, but I mean, if I'm cloud looking at what Cloudera's done. >> What would you do? >> I would do exactly what Mike Olson's doing is I'd basically pivot immediately to machine learning. Look at Google. TensorFlow it's go so much traction with their cloud because it's got machine learning built into it. Open source is where the action is, and that's where you could do a lot of good work and use it as an advantage in that they know that game. I would not count out the open source game. >> So, we know how IBM makes money at that, you know, in theory anyway it wants. We know how Amazon's going to make money at that with their priority approach, Microsoft will do the same thing. How to Cloudera and Hortonworks make money? >> I think it's a product transition around getting to the open source with cloud technologies. Amazon is not out to kill open source, so I think there's an opportunity to wedge in a position there, and so they just got to move quickly. If they don't make these decisions then that's a failed execution on the management team at Cloudera and Hortonworks and I think they're on it. So, we'll keep an eye on that. >> No, Amazon's not trying to kill open source, I would agree, but they are bogarting open source in a big way and profiting amazingly from it. >> Well, they just do what Amy Jessie would say, they're customer driven. So, if a customer doesn't want to do five things to do one thing this is back to my point. The customers want real-time workloads. They want it with open source and they don't want all these steps in the cost of ownership. That's why this is not a new shift, it's the same wine, new bottle because now you're just seeing real projects that are demanding successful and efficient code and support and whoever delivers it builds the better mousetrap. 
In this case, the better mousetrap will win. >> And I'm arguing that the better mousetrap and the better marginal economics, I know I'm like a broken record on this, but if I take Kinesis and DynamoDB and Red Ship and wrap it into my big data play, offer it as a service with a set of APIs on the cloud, like AWS is going to do, or is doing, and Azure is doing, that's a better business model than, as you say, five different pieces that I have to cobble together. It's just not economically viable for customers to do that. >> Well, we've got some big new coming up here. We're going to have two days of wall-to-wall coverage of DataWorks 2017. Hortonworks announcing 2.6 of their Hadoop Hortonworks data platform. We're going to talk to Scott now, the CTO, coming up shortly. Stay with us for exclusive coverage of DataWorks in Munich, Germany 2017. We'll be back with more after this short break.

Published Date : Apr 5 2017
