

Cindy Maike, Hortonworks | DataWorks Summit 2018


 

>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Cindy Maike. She is the VP of Industry Solutions and GM of Insurance and Healthcare at Hortonworks. Thanks so much for coming on theCUBE, Cindy. >> Thank you, thank you, look forward to it. >> So, before the cameras were rolling we were talking about the business case for data, for data analytics. Walk our viewers through how you think about the business case and your approach to, sort of, selling it. >> So, when you think about data and analytics, I mean, as industries we've been very good sometimes at doing kind of like the operational reporting. To me that's looking in the rearview mirror, something's already happened. But when you think about data and analytics, especially big data, it's about what questions haven't I been able to answer. And a lot of companies, when they embark on it, they're like, let's do it for technology's sake, but from a business perspective, when we as industry GMs are out there working with our customers, it's like, what questions can't you answer today, and how can I look at existing data and new data sources to actually help me answer questions. I mean, we were talking a little bit about the usage of sensors and so forth around telematics in the insurance industry, connected homes, connected lives, connected cars, those are some types of concepts. In other industries we're looking at the industrial internet of things, so how do I actually make the operations more efficient? How do I actually deploy time series analysis to actually help us become more profitable? And that's really what companies are about. You know, I think in our keynote this morning we were talking about new communities, and it's, what does that mean?
How do we actually leverage data to either monetize new data sources or make us more profitable? >> You're a former insurance CFO, so let's delve into that use case a little bit and talk about the questions that haven't been answered yet. What are some of those, and how are companies putting this to work? >> Yeah so, the insurance industry, you know, it's kind of frustrating sometimes, where as an insurance company you sit there and you always monitor what your combined ratio is, especially if you're a property casualty company, and you go, yeah, but that tells me information like once a month. You know, but I was actually with a chief marketing officer recently, and she came from the retail industry and she goes, I need to understand what's going on in my business on any given day. And so, how can we leverage better real time information to say, what customers are we interacting with? You know, what customers should we not be interacting with? And then you know, the last thing insurance companies want to do is go out and say, we want you as a customer, and then decline their business because they're not risk worthy. So, that's where we're seeing the insurance industry, and I'll focus a lot on insurance here, but it's how do we leverage data to change that customer engagement process and look at connected ecosystems. It's a good time to be, fundamentally, in the insurance industry, where we're seeing a lot of use cases, but also in the retail industry, where there are new data opportunities out there. We talked a little bit before the interview started about shrinkage, and you know, in the retail industry, especially in food and any type of consumer packaged goods, we're starting to see the usage of sensors to actually help companies move fresh food around to reduce their shrinkage. You know, we've got. >> Sorry, just define shrinkage, 'cause I'm not even sure I understand, it's not that your apple is getting smaller. It refers to perishable goods, you explain it.
>> Right, so you're actually looking at, how do we make sure that my produce or items that are perishable, you know, I want to minimize the amount of inventory write-offs that I have to do, so that would be the shrinkage. And this one major retail chain, they have a lot of consumer goods, and they're actually saying, you know what, their shrinkage was pretty high, so they're now using sensors to help them monitor, do we need to move certain types of produce? Do we need to look at food before it expires, you know, to make sure that we're not doing an inventory write-off. >> You say sensors and it's kind of, are you referring to cameras taking photos of the produce, or are you referring to other types of chemical analysis or whatever it might be, I don't know. >> Yeah, so it's actually a little bit of both. It's how do I actually, you know, look at certain types of products. So we all know when you walk into a grocery store or some type of department store, there's cameras all over the place, so it's not just looking at security, but it's also looking at, you know, are those goods moving? And so, you can't move people around a store, but I can actually use the visualization, and now with deep machine learning you can actually look at that and say, you know what, those bananas are getting a little ripe. We need to move those, or we need to help turn the inventory. And then, there's also things with bar coding, you know, when you think of things that are on the shelves. So, how do I look at those bar codes, because in the past you would've taken somebody down the aisle. They would've checked that, but no, now we're actually looking up the bar codes and saying, do we need to move this? Do we need to put these things on sale? >> At this conference we're hearing just so much excitement and talk about data as the new oil, and it is an incredible strategic asset, but you were also saying that it could become a liability.
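As a side note for readers, the shelf-monitoring logic Cindy describes, flagging perishable stock for markdown or relocation before it becomes an inventory write-off, reduces to a simple rule. The sketch below is purely illustrative: the field names and the three-day markdown threshold are made-up assumptions, not anything from a retailer's actual system.

```python
# Illustrative shrinkage-reduction rule: flag perishable stock for
# markdown before it becomes an inventory write-off.
# Field names and thresholds are hypothetical.

def triage(item, markdown_days=3):
    """Return an action for a stock item based on days until expiry."""
    if item["days_to_expiry"] <= 0:
        return "write_off"            # already expired: shrinkage
    if item["days_to_expiry"] <= markdown_days:
        return "markdown"             # put it on sale before it spoils
    return "keep"                     # no action needed yet

shelf = [
    {"sku": "bananas", "days_to_expiry": 2},
    {"sku": "milk", "days_to_expiry": 9},
    {"sku": "berries", "days_to_expiry": 0},
]
print({i["sku"]: triage(i) for i in shelf})
# {'bananas': 'markdown', 'milk': 'keep', 'berries': 'write_off'}
```

In practice the `days_to_expiry` signal would come from the cameras and bar code scans discussed above rather than a hand-written list.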
Talk about the point at which it becomes a liability. >> It becomes a liability when, one, we don't know what to do with it, or we make decisions off of dated data. So you think about, you know, I'll give you an example in the healthcare industry. You know, medical procedures have changed so immensely, the advancement in technology, precision medicine, but if we're making healthcare decisions on medical procedures from 10 years ago, you really need to say, how do I leverage, you know, newer data sets. So over time, if you base your algorithms on data that's 10, 20 years old, it's good in certain things, but you know, you can make some bad business decisions if the data is not recent. So, that's when I talk about the liability aspect. >> Okay, okay, and then, thinking about how you talk with, collaborate with customers, what is your approach in the sense of how you help them think through their concerns, their anxieties? >> So, a lot of times it's really kind of understanding what's their business strategy. What are their financial, what are their operational goals? And you say, what can we look at from a data perspective, both data that we have today and data that we can acquire from new data sources, to help them actually achieve their business goals. And you know, specifically in the insurance industry we focus on top line growth, with growing your premium or decreasing your combined ratio. So, what are the types of data sources and the analytical use cases that we can actually, you know, use? We see the exact same thing in manufacturing, so. >> And, have customer attitudes evolved over time since you've been in the industry? How would you describe their mindsets right now? >> I think we still have some industries that we struggle with, but it's actually, you know, I mentioned healthcare, the way we're seeing data being used in the healthcare industry, I mean, it's about precision medicine. You look at genomics research.
It says that something like 58 percent of the world's population would actually do a genomics test if they could actually use that information. So, it's interesting to see. >> So, the struggle is with people's concern about privacy encroachment, is that the primary struggle? >> There's a little bit of that, and companies are saying, you know, I want to make sure that it's not being used against me. But there was actually a recent article in Best's Review, which is an insurance trade magazine, that says, you know, if I actually have a genomic test, can the insurance industry use that against me? So, I mean, there's still a little bit of concern. >> Which is a legitimate concern. >> It is, it is, absolutely, and then also, you know, we see globally, with the General Data Protection Regulation, the GDPR, you know, how are companies using my information and data? So you know, consumers have to be comfortable with the type of data. But outside of the consumer side there's so much data in the industry, and you made the comment about, you know, data's the new oil. The thing I have against that is, we don't use oil straight in a car, we don't put crude in a car, so it's once we do something with it, which is the analytical side, that we get the business insight. So, data for data's sake is just data. It's the business insights that are really important. >> Looking ahead at Hortonworks five, 10 years from now, I mean, how much will your business account for the total business of Hortonworks, do you think, in the sense that, as you've said, healthcare and insurance represent such huge potential possibilities and opportunities for the company? Where do you see the trajectory?
>> The trajectory, I believe, is really in those analytical apps, so we're working with a lot of partners that are like, you know, how do I accelerate that business value, because like I said, we're not just in data management, we're in the data age, and what does that mean? It's turning those things into business value, and I think from an industry perspective you've got to be working with the right partners and then also customers, because they lack some of the skillsets. So, who can actually accelerate the time to value of using data for profitability? >> Is your primary focus helping regulated industries with their data analytics challenges and using IoT, or does it also cover unregulated? >> Unregulated as well. >> Are the analytics requirements different between regulated and unregulated, in terms of the underlying capabilities they require, in terms of predictive modeling, of governance and so forth, and how does Hortonworks differentiate its response to those needs? >> Yeah, so it varies a little bit based upon the regulations. I mean, even if you look at life sciences, life sciences is very, very regulated on how long do I have to keep the data? How can I actually use the data? So, look at those industries that maybe aren't regulated as much. We'll get away from financial services, highly regulated across all different areas, but I'll also look at, say, business insurance, not as heavily regulated as insurance for you and me as consumers, because insurance companies can use any type of data to actually do the pricing and the underwriting and the actual claims. So, still regulated based upon solvency, but not regulated on how they use data to evaluate risk. Manufacturing, definitely some regulation there from a work safety perspective, but you can use the data to optimize your yields, you know, however you see fit.
So, we see a mixture of everything, but I think from a Hortonworks perspective it's being able to share data across multiple industries, 'cause we talk about connected ecosystems, and connected ecosystems are really going to change the business of the future. >> So, how so? I mean, especially bringing it back to this conference, to DataWorks, and the main stage this morning, we heard so much about these connected communities, and really it's all about the ecosystem. What do you see as the biggest change going forward? >> So, you look at, and I'll give you the context of the insurance industry, you look at companies like Arity, which is a division of Allstate. What they're doing is actually working with the car manufacturers. So at some point in time, you know, the automotive industry, General Motors tried this 20 years ago, they didn't quite get it with OnStar and GMAC Insurance. Now, you actually have the opportunity where the car manufacturer is, you know, maybe the front man for the insurance industry. So, I can now start to collect the data from the vehicle. I'm using that for driving of the vehicle, but I can also use it to help a driver drive more safely. >> And upsize their experience of actually driving, making it more pleasant as well as safer. There's many layers of what can be done now with the same data. Some of those uses impinge or relate to regulated or mandatory concerns, and some are purely for competitive differentiation on the whole issue of experience. >> Right, and you think about certain aspects, the insurance industry just has, you know, a negative connotation, and we have an image challenge on what data can and cannot be used. But a lot of people opt in to an automotive manufacturer and share that type of data, so moving forward, who's to say, with the connected ecosystem, I still have the insurance company in the background doing all the underwriting, but my distribution channel is now the car dealer. >> I love it, great. That's a great note to end on.
Thanks so much for coming on theCUBE. Thank you, Cindy. I'm Rebecca Knight for James Kobielus. We will have more from theCUBE's live coverage of DataWorks in just a little bit. (upbeat music)

Published Date : Jun 19 2018



Pankaj Sodhi, Accenture | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what DataWorks Summit is all about. We are at DataWorks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with big data. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be here. >> Great, great, so what are you seeing in terms of customer adoption of Hadoop and so forth, big data platforms, and for what kind of use cases? GDPR is coming down very quickly, and we saw this poll this morning that John Chrysler of Hortonworks did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, any enterprise, period, because it sounds like a sizeable portion of this audience is not entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing, in terms of customer readiness, for this new regulation? >> So Jim, I'll answer the question in two ways. One is just in terms of, you know, the adoption of Hadoop, and then, you know, we'll get into GDPR. So in regards to Hadoop adoption, I think I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use case driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach. Deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often sort of, on premise.
Hybrid being, for example, Hadoop deployed alongside an EDW. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now is to leverage the best of cloud computing and open source technology, while also getting the investment protection that they have from the on premise deployments as well. So they're in a fairly good position. Another set of customers have done successful pilots, looking at either optimization use cases. >> Jim: How so, Hadoop? >> Yes, leveraging Hadoop. Either again from a cost optimization play, or potentially advanced analytics capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on prem, potentially with Hadoop, or on cloud with Hadoop. >> When you say the pipeline in this context, what are you referring to? When I think of pipeline, in fact in our coverage of pipeline, it refers to an end to end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely. >> And analytics, so that's what you're saying. >> So all the way from ingestion to curation to consuming the data, through multiple different access points, so that's the full pipeline. And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. So I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of, let's ingest all data in a data lake, it's been driven by a use case, mapped to a set of valuable data sets that can be ingested. But what's interesting then is the delivery approach has been to bring together diverse skill sets.
For example, data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge the architecture and delivery approach. I think this is where the key ingredient for success comes in, which is, for me, that modern Hadoop pipelines need to be iteratively built and deployed, rather than linear and monolithic. So this notion of, I have raw data, let me come up with a minimally curated data set. And then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance it, get additional data attributes, I then enhance the pipeline. So this is already starting to challenge organizations' architecture approaches, and how you also deploy into production. And I think that's been one of the key differences between organizations that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world, or are they, challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' architecture for Data Steward Studio. Are you seeing the data governors, the data stewards of the world, coming, sitting around the virtual table, challenging this architecture further to evolve? >> I think. >> To enable privacy by default and so forth? >> I think again, you know, the organizations that have been successful have already been looking at privacy by design before GDPR came along. Now, one of the reasons a lot of the data lake implementations haven't been as successful is the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So therefore, what we see with business glossaries and sort of data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. So to actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace.
So in the data marketplace, what you need to have is well curated data sets. The proper definitions as well, for a business glossary or a data catalog, underpinned by the right user access model, and available, for example, through search or APIs. So, GDPR actually is. >> There's not a public marketplace, this is an architectural concept. >> Yes. >> It could be inside, completely inside, the private data center, but it's reusable data, it's both through APIs, and standard glossaries and metadata and so forth, is that correct? >> Correct, so the data marketplace is reusable, both internally, for example, to unlock access for data scientists who might want to use a data set and then put that into a data lab. It can also be extended, from an API perspective, for a third party data marketplace, for exchanging data with consumers or third parties, as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit. Rather than looking at it from a compliance perspective, it's about how can we make data usable to the analysts and the data scientists. So actually focusing on getting the right definitions upfront, and as we curate and publish data, and as we enrich it, what's the next definition that comes off that? And actually having that available before we publish the data. >> That's a fascinating concept. So, the notion of a data steward or a data curator. It sort of sounds like you're blending them. Where the data curator, their job, part of it, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses, and possibly being a player in the ongoing identification of the monetizability of data elements, both internally and externally in the (mumbles). Am I describing it correctly? >> Pankaj: I think you are, yes. >> Jim: Okay.
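To make the "data marketplace" idea concrete for readers, here is a minimal, purely illustrative sketch: curated data sets registered with a business-glossary definition and an access policy, exposed through search. The function names, data set names, and role names are all hypothetical; nothing here reflects a specific vendor's API.

```python
# Illustrative "data marketplace": each curated data set carries a
# business-glossary definition, a curation level, and an access policy,
# and is discoverable through a search interface.

CATALOG = {}

def register(name, definition, curation_level, allowed_roles):
    """Publish a data set into the marketplace catalog."""
    CATALOG[name] = {
        "definition": definition,
        "curation_level": curation_level,
        "allowed_roles": set(allowed_roles),
    }

def search(term, role):
    """Return data sets matching the term that this role may access."""
    return sorted(
        name
        for name, meta in CATALOG.items()
        if term.lower() in (name + " " + meta["definition"]).lower()
        and role in meta["allowed_roles"]
    )

register("claims_2018", "Curated P&C claims, PII removed", "gold",
         ["data_scientist", "actuary"])
register("policy_raw", "Raw policy feed, uncurated", "bronze",
         ["data_engineer"])

print(search("claims", "data_scientist"))   # ['claims_2018']
print(search("policy", "data_scientist"))   # [] - access model blocks it
```

The same catalog could back an external third-party marketplace by exposing `search` behind an API, which is the extension Pankaj describes.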
>> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy. >> Jim: The chief data officer. >> Yes, chief data officer functions. So rather than the imposition of policies and standards, it's about actually trying to unlock business value. So rather than look at it from a compliance perspective, which is very important, actually flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So for example, if you're able to tag and classify data, and then apply the right kind of protection against it, it actually helps the data scientists to use that data for their models, while actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance, which is to discover, inventory, and essentially tag all of your data at a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value of the data to your business. So it needn't be an overhead burden, it can be the foundation for a new business model. >> Absolutely, because I think if you talk about organizations becoming data driven, you have to look at what the data asset actually means. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authority of privacy and ability to use the data. So I think GDPR is going to be a very good enabler. So again, the small minority of organizations that have been successful have done this. They've had business glossaries, data catalogs, but now with GDPR, that's almost, I think, going to force the issue. Which I think is a very positive outcome.
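The tag-classify-protect pattern discussed in this exchange can also be sketched in a few lines. This is an assumption-laden illustration, not GDPR legal guidance or any product's behavior: the tag vocabulary, field names, and masking scheme below are invented for the example.

```python
# Illustrative GDPR-style data protection: columns classified as
# personal data are masked before the data set is released to a
# data-science workspace. Tags and field names are hypothetical.

TAGS = {"name": "personal", "email": "personal", "premium": "non_personal"}

def mask_value(value):
    """Crude masking stand-in; real systems might hash or tokenize."""
    return "***"

def release_for_analytics(records, tags=TAGS):
    """Return a copy of the records with personal-data fields masked."""
    released = []
    for record in records:
        released.append({
            key: (mask_value(value) if tags.get(key) == "personal" else value)
            for key, value in record.items()
        })
    return released

customers = [{"name": "Ada", "email": "ada@example.com", "premium": 480.0}]
print(release_for_analytics(customers))
# [{'name': '***', 'email': '***', 'premium': 480.0}]
```

Because the classification lives in the tag map rather than in the pipeline code, the same release function serves every data set, which is the "win-win" Pankaj points to: data scientists get usable data, compliance gets a single enforcement point.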
>> Now Pankaj, do you see any of your customers taking this concept of curation and so forth the next step, in terms of, there's data assets, but then there's data-derived assets, like machine learning models and so forth. Data scientists build and train and deploy these models and algorithms, that's the core of their job. >> Man: Mhmm. >> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers, so that they'll also increasingly be required, or want, to essentially catalog the models, curate them for reusability, possibly monetization opportunities? Is that something that any of your customers are doing or exploring? >> Some of our customers are looking at that as well. So again, it's exactly an extension of the marketplace. So while one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and subscribe to. >> Jim: Yeah, like pre-trained models. >> Correct. >> Can be golden if they're pre-trained, and if the core domain for which they're trained doesn't change all that often, they can have a great after market value, conceivably, if you want to resell that. >> Absolutely, and I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models. And to be able to share that with other folks in the team, who can then maybe tweak it for a different algorithm, is a huge, I think, productivity enabler, and we've seen. >> Jim: Yes. >> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop, to actually then deploying the (mumbles).
>> How can you do that very quickly, and reduce the time from an idea or hypothesis to production. >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion. And we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.

Published Date : Apr 19 2018



Nathan Trueblood, DataTorrent | CUBEConversations


 

(techno music) >> Hey welcome back everybody, Jeff Frick here with theCUBE. We're having a CUBE conversation in the Palo Alto studio. It's a different kind of format of CUBE, not in the context of a big show. Got a great guest here lined up who we just had on at a show recently. He's Nathan Trueblood, he's the vice president of product management for DataTorrent. Nathan great to see you. >> Thanks for having me. >> We just had you on theCUBE at Hadoop Summit, or DataWorks now, >> That's right. >> not Hadoop Summit anymore. So just a quick follow up on that, we were just talking before we turned the cameras on. You said that was a pretty good show for you guys. >> Yeah it was a really great show. In fact, as a software company, one of the things you really want to see at shows is a lot of customer flow and a lot of good customer discussions, and that's definitely what happened at DataWorks. It was also really good validation for us that everyone was coming and talking to us about, what can you do from a real time analytics perspective? So that was also a good, strong signal that we're onto something in this marketplace. >> It's interesting, I heard your quote from somewhere, that the streaming, the real time streaming, in the big data space is really grabbing all the attention. Obviously we do Spark Summit. We did Flink Forward. So we're seeing more and more activity around streaming, and it's so logical that now that we have the compute horsepower, the storage horsepower, the networking horsepower, to enable something that we couldn't do very effectively before, it's opening up a whole different way to look at data. >> Yeah it really is, and I think as someone who's been working in the tech world for a while, I'm always looking for simplifying ways to explain what this means. 'Cause people say streaming and real time and all of that stuff.
For us what it really comes down to is the faster I can make decisions, or the closer to when something happens I can make a decision, that gives me competitive advantage. And so if you look at the whole big data evolution, it's always been towards how quickly can we analyze this data so that we can respond to what it's telling us? And in many ways that means being more responsive to my customer. So a lot of this came out of course originally from very large scale systems at some of the big internet companies like Yahoo where Hadoop was born. But really it all comes down to if I'm more responsive to my customer, I'm more competitive and I win. And I think what a lot of customers are saying across many different verticals is real time means more responsiveness and that means competitive advantage. >> Right and even we hear all the time moving into a predictive model, and then even to a prescriptive model where you're offloading a lot of the grunt work of the decision making, letting the machine do a lot more of that, and so really it's the higher value stuff that finally gets to the human at the end of the interaction who's got to make a judgment. >> That's exactly right, that's right. And so to me all the buzz about streaming is really representative of the fact that this is now the next evolution of where big data architecture has been going, which is towards moving away from a batch oriented world into something where we're making decisions as close to the time of data creation as possible. >> So you've been involved in not only tech for a long time but Hadoop specifically and Big Data specifically. And one of the knocks, I remember the first time I ever heard about Hadoop, is actually from Bill Schmarzo at EMC, the dean of Big Data. And I was talking to a friend of his and he goes yeah but what Bill didn't tell you, there's not enough people.
You know Hadoop's got all this great promise, there just aren't enough people for all the enterprises at the individual company level to implement this stuff. Huge part of the problem. And now you're at DataTorrent and as we talked before, interesting kind of shift in strategy and going to really an application focus strategy as opposed to more of a platform focus strategy so that you can help people at companies solve problems faster. >> That's right we've definitely focused, especially recently, on more of an application strategy. But to kind of peel that back a little bit, you need a platform with all the capabilities that a platform has to be able to deliver large scale operable streaming analytics. But customers aren't looking for platforms, they're looking for please solve my business problem, give me that competitive advantage. I think it's a long standing problem in technology and particularly in Big Data where you build a tremendous platform but there's only a handful of people who know how to actually construct the applications to deliver that value. And I think increasingly in big data but also across all of tech, customers are looking for outcomes now and the way for us to deliver outcomes is to deliver applications that run on our platform. So we've built a tremendous platform and now we are working with customers and delivering applications for that platform so that it takes a lot of the complexity out of the equation for them. And we kind of think of it like if in the past it required sort of an architect level person in order to construct an application on our platform, now we're gearing towards a much larger segment of developers in the enterprise who are tremendously capable but don't have that deep Big Data experience that they need to build an application from scratch.
>> And it's pretty interesting too 'cause another theme we see over and over and over and over, especially around the innovation theme, is the democratization of the access to the data, the democratization of the tools to access the data, so that anyone in the company, or a much greater set of individuals inside the company, have the opportunity to have a hypothesis, to explore the hypothesis, to come back with solutions. And so by kind of removing this ivory tower, where either the data scientist or the super smart engineer is the only one that has the capability to play with the data and the tools, that's really how you open up innovation: democratizing access and the ability to test and try things. >> That's right, to me I look at it very simply, when you have large scale adoption of a technology, usually it comes down to simplifying abstractions of one kind or another. And the big simplifying abstraction really of Big Data is providing the ability to break up a huge amount of data and make some sense of it, using of course large scale distributed computing. The abstraction we're delivering at DataTorrent now is building on all that stuff, on all those layers, we've obscured all of that and now you can download with our software an application that produces an outcome. So for example one of the applications we're shipping shortly is an Omni-Channel credit card fraud prevention application. Now our customers in the past have already constructed applications like this on our platform. But now what we're doing like you said is democratizing access to those kinds of applications by providing an application that works out of the box. And that's a simplifying abstraction. Now truthfully there's still a lot of complexity in there but we are providing the pattern, the foundational application, that then the customer can focus on customizing to their particular situation, their integrations, their fraud rules and so forth.
And so that just means getting you closer to that outcome much more quickly. >> Watching your video from Data Works, one of the interesting topics you brought up is really speed and how faster, better, cheaper, which is innovative for a little while, becomes the new norm. And as soon as you reset the bar on speed, then they just want it, well can you go faster. So whether you went from a week to a day, a day to an hour, there's just this relentless pressure to be able to get the data, analyze the data, make a decision faster and faster and faster. And you've seen this just changing by leap years right over time. >> Right and I literally started my career in the days of ETL extracting data from tape that was data produced weeks or months ago, down to now we're analyzing data at volumes that were inconceivable and producing insight in less than a second, which is kind of mind boggling. And I think the interesting thing that's happening when we think about speed, and I've had a few discussions with other folks about this, they say well speed really only matters for some very esoteric applications. It's one of the things that people bring up. But no one has ever said well I wish my data was less fresh or my insight was not as current. And so when you start to look at the kinds of customers that want to bring real time data processing and analytics, it turns out that nearly every vertical that we look at has a whole host of applications where if you could bring real time analytics you could be more responsive to what your customer's doing. >> Right right. >> Right and that can be, certainly that's the case in retail, but we see it in industrial automation and IoT. I think of IoT as a way to sense what's going on in the world, bring that data in, get insight and take action from it. And so real time analytics is a huge part of that, which you know again, healthcare, insurance, banking, all these different places have use cases.
And so what we're aiming to do at DataTorrent is make it easy for the businesses in those different verticals to really get the outcome they're looking for, not produce a platform and say imagine what you could do, but produce an application that actually delivers on a particular problem they have. >> It's funny too, the speed equation, you saw it in Flash, shifting gears a little bit into the hardware space right, is people said well it's only super low latency, super high volume transactions, financial services, is the only benefit we're going to get from Flash. >> Right yeah we've had the same knock for real time analytics. >> Same thing right, but as soon as you put it in, there's all these second order impacts, third order impacts that nobody ever thought of, that that speed delivers, that aren't directly tied to that transactional speed, but now enable you, because of that transactional speed, to do so many other things that you couldn't even imagine to do, and so that's why I think we see this pervasiveness of Flash, why wouldn't you want Flash? I mean why wouldn't you want to go faster? 'Cause there's so much upside. >> Yeah so again all of these innovations in IT come down to how can I be more flexible and more responsive to changing conditions? More responsive to my customer, more flexible when it comes to changing business conditions and so forth. And so now as we start to instrument the world and have technologies like machine learning and artificial intelligence, that all needs to be fed by data that is delivered as quickly as possible and then it can be analyzed to make decisions in real time. >> So I wanted to shift gears a little bit, kind of back to the application strategies. So you said you had the first app that's going to be, (Jeff drowned out by Nathan)
>> Yeah so the first application, yes, it was fraud prevention. That's an important distinction there because the distinction between detection and prevention is the competitive advantage of real time. Because what we deliver in DataTorrent is the ability to process massive amounts of data in very, very low time frames. Sub-second time frames. And so that's the kind of fundamental capability you need in order to do something like respond to some kind of fraud event. And what we see in the market is that fraud is becoming a greater and greater problem. The market itself is expanding. But I think as we see fraud is also evolving in terms of the ways it can take place across e-commerce and point of sale and so forth. And so merchants and processors and everyone in the whole spectrum of that market is facing a massive problem and an evolving problem. And so that's where we're focused: in one of our first, I would say, vertically oriented business applications, it's really easy to be able to take in new sources of data with our application but also to be able to process all that data and then run it through a decision engine to decide if something is fraudulent or not in a short period of time. So you need to be able to take in all that data to be able to make a good decision. And you need to be able to decide quickly if it's going to matter. And you also need to be able to have a really strong model for making decisions so that you avoid things like false positives, which are as big a problem as preventing fraud itself if you deliver a bad customer experience. And we've all had that experience as well, which is your card gets shut down for what you think is a legitimate activity. >> It's just so ironic that false positives are the biggest problem with credit card fraud. >> Yeah, it's one of, yeah. >> You would think we would be thankful for a false positive but all you hear over and over and over is that false positive and the customer experience. The fact that we're so good at it is the thing that really irks people.
>> Well if you think about that, having an application that allows you to make better decisions more quickly and prevent those false positives and take care of fraud is a huge competitive advantage for all the different players in that industry. And it's not just for the credit card companies of course, it's for the whole spectrum of people from the merchant all the way to the bank that are trying to deal with this problem. And so that's why it's one of the applications that we think of as a key example where we see a lot of opportunity. And certainly people that are looking at credit card fraud have been thinking about this problem for a while. But there's the complexity like we were discussing earlier of finding the talent to be able to deliver these kinds of applications, and finding the technology that can actually scale to the processing volume. And so by delivering Omni-Channel fraud prevention as a Big Data application, that just puts our customers so much closer to the outcome that they want. And it makes it a lot easier to adopt. >> So as you sit, shift gears a little bit, with your VP of product hat on, and there's a huge wide world of opportunity in front of you, we talked about IoT a little bit, obviously fraud, you've talked about Omni-Channel retail. How are you guys going to figure out where you want to go next? How are you prioritizing the world, and as you build up more of these applications is it going to be vertically focused, horizontally focused, what are your thoughts as you start down the application journey? >> So a few thoughts on that. Certainly one of the key indicators for me as a product manager when I look at where to go next and what applications we should build next, it comes down to what signal are the customers giving us? As we mentioned earlier, we built a platform for real time analytics and decision making, and one of the things that we see is broad adoption across a lot of different verticals.
So I mentioned industrial IoT and financial services fraud prevention and advertising technology, and, and, and. We have a company that we're working with in GPS geofencing. So the possibilities are pretty interesting. But when it comes to prioritizing those different applications we have to also look at what are the economics involved for the customer and for us. So certainly one of the reasons we chose fraud prevention is that the economics are pretty obvious for our customers. Some of these other things are going to take a little bit longer for the economics to show up when it comes to the applications. So you'll certainly see us focusing on vertically oriented business applications because again the horizontals tend to be more like a platform and it's not close enough to delivering an outcome for a customer. But it's worth noting one of the things we see is that while we will deliver vertically oriented applications, oftentimes switching from one vertical app to another is really not a lot more than changing the kind of data we're analyzing, and changing the decision engine. But the fundamental idea of processing data in a pipeline at very high volume with fault tolerance and low latency, that remains the same in every case. So we see a lot of opportunity essentially as we solve an application in one vertical, to re-skin it into another. >> So you can say you're tweaking the dials and tweaking the UI. >> Tweaking the data and the rules that you apply to that data. So if you think about Omni-Channel fraud prevention, well it's not that big of a leap to look at healthcare fraud or to look at all the other kinds of fraud in different verticals that you might see. >> Do you ever see that you'll potentially break out the algorithm, I forget which one we're at, people are talking about algorithms as a service. Or is that too much of a bit, does there need to be a little bit more packaging?
>> No I mean I think there will be cases where we will have an algorithm out of the box that provides some basics for the decision support. But as we see a huge market springing up around AI and machine learning and machine scoring and all of that, there's a whole industry that's growing up around essentially, we provide you the best way to deliver that algorithm or that decision engine, that you train on your data and so forth. So that's certainly an area where we're looking from a partnership perspective. Where we already today partner with some of the AI vendors for what I would say is some custom applications that customers have deployed. But you'll see more of that in our applications coming up in the future. But as far as algorithms as a service, I think that's already here in the form of being able to query against some kind of AI with a question, you know, essentially a model, and then getting an answer back. >> Right well Nathan, exciting times, and your Big Data journey continues. >> It certainly does, thanks a lot Jeff. >> Thanks Nathan Trueblood from DataTorrent. I'm Jeff Frick, you're watching The CUBE, we'll see you next time, thanks for watching. (techno music)

Published Date : Jul 21 2017
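The real-time decision pattern Nathan describes in the interview above, taking in transaction events, evicting stale state, and deciding fraud or not within sub-second latency while holding down false positives, can be sketched very loosely in Python. This toy is not DataTorrent's platform or API: the sliding window, the velocity rule, and every name below are invented purely for illustration, and a production engine would distribute this state with fault tolerance and pair such rules with a trained model.

```python
# Toy sketch of a streaming fraud "decision engine" (illustrative only; not
# DataTorrent's actual, distributed, fault-tolerant engine). The pattern:
# ingest events, keep a short sliding window of state per card, decide per event.
from collections import defaultdict, deque

WINDOW_SECONDS = 60        # hypothetical look-back window per card
MAX_TXNS_IN_WINDOW = 3     # hypothetical velocity rule: >3 swipes per minute -> flag

class FraudDecisionEngine:
    def __init__(self):
        # per-card deque of recent transaction timestamps (seconds)
        self.recent = defaultdict(deque)

    def decide(self, card_id, timestamp, amount):
        """Return 'FRAUD' or 'OK' for one incoming transaction event.

        `amount` is unused by this toy rule; it's here only to suggest the
        shape of a real event."""
        window = self.recent[card_id]
        # evict timestamps that fell out of the look-back window
        while window and timestamp - window[0] > WINDOW_SECONDS:
            window.popleft()
        window.append(timestamp)
        # a real engine combines many rules plus a trained model
        # to keep false positives down
        if len(window) > MAX_TXNS_IN_WINDOW:
            return "FRAUD"
        return "OK"

engine = FraudDecisionEngine()
stream = [("card-1", t, 25.0) for t in (0, 10, 20, 30, 200)]
decisions = [engine.decide(*txn) for txn in stream]
print(decisions)  # ['OK', 'OK', 'OK', 'FRAUD', 'OK']
```

The fourth swipe arrives within the window and trips the velocity rule; by the fifth, the old state has been evicted and the card is clean again. That per-key windowed state is the part that stays the same when, as Nathan puts it, you re-skin the application for another vertical.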



Linton Ward, IBM & Asad Mahmood, IBM - DataWorks Summit 2017


 

>> Narrator: Live from San Jose, in the heart of Silicon Valley, it's theCUBE! Covering Data Works Summit 2017. Brought to you by Hortonworks. >> Welcome back to theCUBE. I'm Lisa Martin with my co-host George Gilbert. We are live on day one of the Data Works Summit in San Jose in the heart of Silicon Valley. Great buzz in the event, I'm sure you can see and hear behind us. We're very excited to be joined by a couple of fellows from IBM. A very longstanding Hortonworks partner that announced a phenomenal suite of four new levels of that partnership today. Please welcome Asad Mahmood, Analytics Cloud Solutions Specialist at IBM, and medical doctor, and Linton Ward, Distinguished Engineer, Power Systems OpenPOWER Solutions from IBM. Welcome guys, great to have you both on theCUBE for the first time. So, Linton, software has been changing, companies, enterprises all around are really looking for more open solutions, really moving away from proprietary. Talk to us about the OpenPOWER Foundation before we get into the announcements today, what was the genesis of that? >> Okay sure, we recognized the need for innovation beyond a single chip, to build out an ecosystem, an innovation collaboration with our system partners. So, ranging from Google to Mellanox for networking, to Hortonworks for software, we believe that system-level optimization and innovation is what's going to bring the price performance advantage in the future. That traditional seamless scaling doesn't really bring us there by itself but that partnership does. >> So, from today's announcements, a number of announcements that Hortonworks is adopting IBM's data science platforms, so really the theme this morning of the keynote was data science, right, it's the next leg in really transforming an enterprise to be very much data driven and digitalized. We also saw the announcement about Atlas for data governance, what does that mean from your perspective on the engineering side?
>> Very exciting you know, in terms of building out solutions of hardware and software, the ability to really harden the Hortonworks data platform with servers, and storage and networking, I think is going to bring simplification to on-premises, like people are seeing with the Cloud. I think the ability to create the analyst workbench, or the cognitive workbench, using the data science experience to create a pipeline of data flow and analytic flow, I think it's going to be very strong for innovation. Around that, most notable for me is the fact that they're all built on open technologies, leveraging communities that universities can pick up, contribute to, I think we're going to see the pace of innovation really pick up. >> And on that front, on pace of innovation, you talked about universities, one of the things I thought was really a great highlight in the customer panel this morning that Raj Verma hosted was you had health care, insurance companies, financial services, there was Duke Energy there, and they all talked about one of the great benefits of open source is that kids in universities have access to the software for free. So from a talent attraction perspective, they're really kind of fostering that next generation who will be able to take this to the next level, which I think is a really important point as we look at data science being kind of the next big driver or transformer, and also going, you know, there's not a lot of really skilled data scientists, how can that change over time? And this is one, the open source community that Hortonworks has been very dedicated to since the beginning, it's really a great outcome of that.
>> Definitely, I think the ability to take the risk out of a new analytical project is one benefit, and the other benefit is there's a tremendous amount of interest, not just from young people, among programmers, developers of all types, to create data engineering and data science skills. >> If we leave aside the skills for a moment and focus on the, sort of, the operationalization of the models once they're built, how should we think about a trained model, or, I should break it into two pieces. How should we think about training the models, where the data comes from and who does it? And then, the orchestration and deployment of them, Cloud, Edge Gateway, Edge device, that sort of thing. >> I think it all comes down to exactly what your use case is. You have to identify what use case you're trying to tackle, whether that's applicable to clinical medicine, whether that's applicable to finance, to banking, to retail or transportation, first you have to have that use case in mind, then you can go about training that model, developing that model, and for that you need to have a good, potent, robust data set to allow you to carry out that analysis, and whether you want to do exploratory analysis or you want to do predictive analysis, that needs to be very well defined in your training stage. Once you have that model developed, then we have certain services, such as Watson Machine Learning, within Data Science Experience, that will allow you to take that model that you just developed, just moments ago, and deploy that as a RESTful API that you can then embed into an application and into your solution, and that solution you can basically use across industry.
>> Are there some use cases where you have almost like a tiering of models where, you know, there're some that are right at the edge like, you know, a big device like a car and then, you know, there's sort of the fog level which is the, say, cell towers or other buildings nearby and then there's something in the Cloud that's sort of like, master model or an ensemble of models, I don't assume that's like, Evel Knievel would say you know, "Don't try that at home," but sort-of, is the tooling being built to enable that? >> So the tooling is already in existence right now. You can actually go ahead right now and be able to build out prototypes, even full-level, full-range applications right on the Cloud, and you can do that, you can do that thanks to Data Science Experience, you can do that thanks to IBM Bluemix, you can go ahead and do that type of analysis right there and not only that, you can allow that analysis to actually guide you along the path from building a model to building a full-range application, and this is all happening on the Cloud level. We can talk more about it happening on the on-premise level, but on the Cloud level specifically, you can have those applications built on the fly, on the Cloud, and have them deployed for web apps, for mobile apps, et cetera. >> One of the things that you talked about is use cases in certain verticals, IBM has been very strong and vertically focused for a very long time, but you kind of almost answered the question that I'd like to maybe explore a little bit more about building these models, training the models, in say, health care or telco and being able to deploy them, where are the horizontal benefits there that IBM would be able to deliver faster to other industries?
>> Definitely, I think the main thing is that IBM, first of all, gives you that opportunity, that platform to say that hey, you have a data set, you have a use case, let's give you the tooling, let's give you the methodology to take you from data, to a model, to ultimately that full range application, and specifically, I've built some applications specific to federal health care, specifically to address clinical medicine and behavioral medicine, and that's allowed me to actually use IBM tools and some open source technologies as well to actually go out and build these applications on the fly as a prototype to show, not only the realm, the art of the possible when it comes to these technologies, but also to solve problems, because ultimately, that's what we're trying to accomplish here. We're trying to find real-world solutions to real-world problems. >> Linton, let me re-direct something towards you about, a lot of people are talking about how Moore's law is slowing down or even ending, well at least in terms of speed of processors, but if you look at, not just the CPU but FPGA or ASIC or the tensor processing unit, which, I assume is an ASIC, and you have the high speed interconnects, if we don't look at just, you know, what can you fit on one chip, but you look at, you know, 3D, what's the density of transistors in a rack or in a data center, is that still growing as fast or faster, and what does it mean for the types of models that we can build? >> That's a great question.
One of the key things that we did with the OpenPOWER Foundation is to open up the interfaces to the chip, so with NVIDIA we have NVLink, which gives us a substantial increase in bandwidth, and we have created something called OpenCAPI, which is a coherent protocol, to get to other types of accelerators. So we believe that hybrid computing in that form, you saw NVIDIA on-stage this morning, and we believe especially for deep learning, the acceleration provided by GPUs is going to continue to drive substantial growth, it's a very exciting time. >> Would it be fair to say that we're on the same curve, if we look at it, not from the point of view of, you know, what can we fit on a little square, but if we look at what can we fit in a data center or the power available to model things, you know, Jeff Dean at Google said, "If Android users talk into their phones for two to three minutes a day, we need two to three times the data centers we have." Can we grow that price performance faster and enable sort of things that we did not expect? >> I think the innovation that you're describing will, in fact, put pressure on data centers. The ability to collect data from autonomous vehicles or other endpoints is really going up. So, we're okay for the near-term but at some point we will have to start looking at other technologies to continue that growth. Right now we're in the throes of what I call fast data versus slow data, so keeping the slow data cheaply and getting the fast data closer to the compute is a very big deal for us, so NAND flash and other non-volatile technologies for the fast data are where the innovation is happening right now, but you're right, over time we will continue to collect more and more data and it will put pressure on the overall technologies. >> Last question as we get ready to wrap here, Asad, your background is fascinating to me.
Having a medical degree and working in federal healthcare for IBM, you talked about some of the clinical work that you're doing and the models that you're helping to build. What are some of the mission critical needs that you're seeing in health care today that are really kind of driving, not just health care organizations to do big data right, but to do data science right? >> Exactly, so I think one of the biggest questions that we get and one of the biggest needs that we get from the healthcare arena is patient-centric solutions. There are a lot of solutions that are hoping to address problems that are being faced by physicians on a day-to-day level, but there are not enough applications that are addressing the concerns that are the pain points that patients are facing on a daily basis. So the applications that I've started building out at IBM are all patient-centric applications that basically put the level of their data, their symptoms, their diagnosis, in their hands alone and allow them to actually find out more or less what's going wrong with their body at any particular time during the day and then find the right healthcare professional or the right doctor that is best suited to treating that condition, treating that diagnosis. So I think that's the big thing that we've seen from the healthcare market right now. The big need that we have, that we're currently addressing with our Cloud analytics technology, which is just becoming more and more advanced and sophisticated and is trending towards some of the other health trends or technology trends that we have currently right now on the market, including the Blockchain, which is tending towards more of a de-centralized focus on these applications. So it's actually putting more of the data in the hands of the consumer, in the hands of the patient, and even in the hands of the doctor. >> Wow, fantastic. Well you guys, thank you so much for joining us on theCUBE.
Congratulations on your first time being on the show, Asad Mahmood and Linton Ward from IBM, we appreciate your time. >> Thank you very much. >> Thank you. >> And for my co-host George Gilbert, I'm Lisa Martin, you're watching theCUBE live on day one of the Data Works Summit from Silicon Valley, but stick around, we've got great guests coming up so we'll be right back.

Published Date : Jun 13 2017


Mike Merritt-Holmes, Think Big - DataWorks Summit Europe 2017 - #DW17 - #theCUBE


 

>> Narrator: Covering DataWorks Summit Europe 2017, brought to you by Hortonworks. (uptempo, energetic music) >> Okay, welcome back everyone. We're here live in Germany, in Munich, for DataWorks Summit 2017, formerly Hadoop Summit. I'm John Furrier, with my co-host Dave Vellante. Our next guest is Mike Merritt-Holmes, Senior Vice President of Global Services Strategy at Think Big, a Teradata company, and formerly the co-founder of Big Data Partnership, which merged into Think Big and Teradata. Mike, welcome to theCUBE. >> Mike: Thanks for having me. >> Great having an entrepreneur on. You're the co-founder, which means you've got that entrepreneurial blood, and I got to ask you, you know, you're in the big data space, you got to be pretty pumped by all the hype right now around AI, because that certainly gives a lot of that extra steroid of recognition. People love AI, it gives a face to it, and certainly IoT is booming as well, the Internet of Things, but big data's cruising along. >> I mean it's a great place to be. The train is certainly going very, very quickly right now. But the thing for us is, we've been doing data science and AI, and trying to build business outcomes and value for businesses, for a long time. It's just great now to see data science and AI both really starting to take effect, so companies are starting to understand it and really starting to want to embrace it, which is amazing. >> It's inspirational too. I mean I have a bunch of kids in my family, some are in college and some are in high school, and even the younger generation are getting jazzed up on just software, right, but the big data stuff's been cruising along now. It's been a good decade now of really solid DevOps culture, cloud now accelerating, but now the customers are forcing the vendors to be very deliberate in delivering great product, because the demand (chuckling) for real time, the demand for more stuff, is at an all-time high.
Can you elaborate on your thoughts, your reaction to what customers are doing? Because they're the ones driving everyone, not to create friction, but to create simplicity. >> Yeah, and you know, our customers are global organizations trying to leverage this kind of technology, and they are, you know, doing an awesome amount of stuff right now to try to make, effectively, a step change in their business, whether it's, kind of, shipping companies doing preventive asset maintenance, or whether it's retailers looking to target customers in a more personalized way, or really understand who their customers are and where they come from. They're leveraging all those technologies, and really what they're doing is pushing the boundaries of all of them, and putting more demands on all of the vendors in the space to say: we want to do this quicker, faster, but more easily as well. >> And then the things that you're talking about, I want to get your thoughts on, because this is the conversation that you're having with customers. What I want to extract is how those kinds of data-driven mindset questions have come out of the hype of Hadoop. So, I mean, we've been on a hype cycle for a while, but now it's back to reality. Where are we with the customer conversations, and, from your standpoint, what are they working on? I mean, is it mostly an IT conversation? Is it a front-office conversation? Is it a blend of both? Because, you know, data science kind of threads both sides of the fence there.
>> Yeah, I mean certainly you can't do big data without IT being involved, but since the start, I mean, we've always been engaged with the business; it's always been about business outcome, because you bring data into a platform, you provide all this data science capability, but unless you actually find ROI from that, then there's no point, because you want to be moving the business forward. So it's always been about business engagement, but part of that has always also been about helping them to change their mindset. I don't want a report, I want to understand why you look at that report and what's the thing you're looking for, so we can start to identify that for you quicker. >> What's the coolest conversation you've been in over the past year? >> Uh, I mean, I can't go into too much detail, but I've had some amazing conversations with companies like Lego, for instance; they're an awesome company to work with. But you start to see some of the things we're doing: we're doing some amazing object recognition with deep learning in Japan. We're doing some fraud analytics in the Nordics with deep learning. We're doing some amazing stuff that's really pushing the boundaries, and when you start to put those deep-learning aspects into real-world applications, and you start to see customers clambering over to want to be part of that, it's a really exciting place to be. >> Let me just double-click on that for a second, because the question I get a lot on theCUBE, and certainly off-camera, is: I want to do deep learning, I want to do AI, I love machine learning; I hear, oh, it's finally coming to reality, so people see it forming. How do they get started? What are some of the best practices for getting involved in deep learning? Using open source, obviously, is one avenue, but what advice would you give customers?
>> From a deep-learning perspective, so I think first of all, I mean, a lot of the greatest deep-learning technologies are open source, as you rightly said, and there are a lot of tutorials and stuff out there, but really what you need is someone who has done it before, who knows where the pitfalls are, who knows when to use the right technology at the right time, and who knows whether a deep-learning methodology is going to be the right approach for your business problem at all. Because a lot of companies are, like, we want to use this deep-learning thing, it's amazing, but actually it's not appropriate, necessarily, for the use case they're trying to address. >> It's the classic holy grail, where is it; if you don't know what you're looking for, it's hard to know when to apply it. >> And also, you've got to have enough data to utilize those methods as well, so. >> You hear a lot about the technical complexity associated with Hadoop specifically, but just ol' big data generally. I wonder if you could address that, in terms of what you're seeing, how people are dealing with that technical complexity, but also what other headwinds there are in terms of adopting these new capabilities. >> Yeah, absolutely, so one of the challenges that we still see is that customers are struggling to leverage value from their platform, and normally that's because of the technical complexities. So last month we introduced Kylo to the open-source world, something you can download free of charge. It's completely open source under the Apache license, and that really was about making it easier for customers to start to leverage the data on the platform, to self-serve ingestion onto it, and for data scientists to wrangle the data better.
So, I think there's a real push right now about that next level up, if you like, in the technology stack, to start to enable non-technical users to do interesting things on the platform directly, rather than asking someone to do it for them. And, you know, we've had technologies in the BI space like Tableau, and, obviously, the (mumbling) data-warehouse solutions on Teradata, that have been giving customers something previously, but actually now they're asking for more, not just that, but more as well. And that's where we are starting to see the increases. >> So that's sort of operationalizing analytics as an example; what are some of the business complexities and challenges of actually doing that? >> That's a very good question, because, I think, when you find out great insight, and you go, wow, you've built this algorithm, I've seen things I've never seen before, then the business wants to have that always on. They want to know that the insight is there all the time: is it changing, is it going up, is it going down, do I need to change my business decisions? And doing that and making that operational means not only just deploying it, but also monitoring those models, being able to keep them up to date regularly, understanding whether those things are still accurate or not, because you don't want to be making business decisions on algorithms that are now a bit stale. So, actually operationalizing it is about building out an entire capability that's keeping these things accurate and online, and, therefore, there's still a bit of work to do, I think, in the marketplace, around building out an operational capability. >> So you kind of got bottom-up, top-down. Bottom-up is, you know, the Hadoop experiments, and then top-down is the CXO saying we need to do big data. Have those two constituencies come together now, who's driving the bus? Are they aligned or is it still, sort of, a mess organizationally?
>> Yeah, I mean, generally, in the organization, there's someone playing the Chief Data Officer, whether they have that as a title or a role; ultimately someone is in charge of generating value from the data they have in the organization. But they can't do that without IT, and I think where we've seen companies struggle is where they've driven it from the bottom-up, and where they succeed is where they drive it from the top-down, because by driving it from the top-down, you really align what you're doing with the business strategy that you have, the company strategy, and what you're trying to achieve. But ultimately, they both need to meet in the middle, and you can't do one without the other. >> One of our practitioner friends was describing this situation in our office in Palo Alto a couple of weeks ago. He said, you know, the challenge we have as an organization is, you've got top people saying alright, we're moving. And they start moving, the train goes, and then you've got kind of middle management sort of behind them, and then you got the doers that are far behind, and aligning those is a huge challenge for this particular organization. How do you recommend organizations address that alignment challenge? Does Think Big have capabilities to help them through that, or is that, sort of, you got to call Accenture? >> In essence, our reason for being is to help with those kinds of things, and, you know, whether it's right from the start, so, oh, my God, my Chief Data Officer or my CEO is saying we need to be doing this thing right now, come on, let's get on with it, and we help them to understand what does that mean, what are the use cases, where's the value going to come from, what's that architecture going to look like; or whether it's helping them to build out capability, in terms of data science or building out the cluster itself, and then managing that and providing training for staff.
Our whole reason for being is supporting that transformation as a business, from, oh, my God, what do I do about this thing, to, I'm fully embracing it, I know what's going on, I'm enabling my business, and I'm completely comfortable with that world. >> There was a lot of talk three or four or five years ago about the ROI of so-called big data initiatives not really being there; you know, there were edge cases with huge ROI, but there was a lot of talk about not a lot of return. My question is, has that changed? Are you starting to see much bigger phone numbers coming back, where the executives are saying yeah, let's double down on this? >> Definitely, I'm definitely seeing that. I mean, I think it's fair to say that companies are a bit nervous about reporting their ROI around this stuff, in some cases, so there's more ROI out there than you necessarily see out in the public place, but-- >> Why is that? Because they don't want to expose it to the competition, or they don't want to front-run their earnings, or whatever it is? >> They're trying to get a competitive edge. The minute you start saying, we're doing this, their competitors have an opportunity to catch up. >> John: Very secretive. >> Yeah, and I think it's not necessarily about what they're doing, it's about keeping the edge over their competitors, really. So, what we're seeing is that many customers are getting a lot of ROI more recently because they're able to execute better, rather than struggling with the IT problems. And even just recently, for instance, we had a customer of ours, the CEO phones us up and says, you know what, we've got this problem with our sales. We don't really know why; in this part of the world, it's going up, in this country, it's going down, we don't know why, and that's making us very nervous.
Could you come in and just get the data together, work out why it's happening, so that we can understand what it is? And we came in, and within weeks we were able to give them a very good insight into exactly why that is, and they changed their strategy moving forward, for the next year, to focus on addressing that problem, and that's really amazing ROI, for a company to be able to get that insight. Now, we're working with them to operationalize that, so that particular insight is always available to them, and that's an example of how companies are now starting to see that ROI come through. A lot of it is about being able to articulate the right business question, rather than trying to worry about reports. What is the business question I'm trying to solve or answer? That's when you can start to see the ROI come through. >> Can you talk about the customer orientation when they get to that insight? Because you mentioned earlier that they got used to the reports, and you mentioned visualization, Tableau; those become table stakes. Once you get addicted to the visualization, you want to extract more insights, so the pressure seems to be to get more insight. So, two questions: the process gap around what they need to do process-wise, and then just organizational behavior. Are they there mentally? What are some of the criteria in your mind, in your experience with customers, around the processes that they go through, and then the organizational mindset? >> Yeah, so what I would say is, first of all, from an organizational mindset perspective, it's very important to start educating not just the analysis team, but the entire business, on what this whole machine-learning, big data thing is all about, and how to ask the right questions. So, really starting to think about the opportunities you have to move your business forward, rather than what you already know, and to think forward rather than retrospectively.
So, the other thing we often have to teach people, as well, is that this isn't about what you can get from the data warehouse, or replacing your data warehouse, or anything like that. It's about answering the right questions with the right tools, and here is a whole set of tools that allow you to answer different questions that you couldn't before, so leverage them. So, that's very important, and that mindset requires time, actually, to transform a business into that mindset, and a lot of commitment from the business to make that happen. >> So, mindset first, and then you look at the process, then you get to the product. >> Yep, and basically, once you have that mindset, you need to set up an engine that's going to run and start to drive the ROI out, and the engine includes, you know, your technical folk, but also your business users, and that engine will then start to build up momentum. The momentum builds more interest, and, over time, you start to get your entire business into using these tools. >> It kind of makes sense, just kind of riffing in real time here, so the product-gap conversation should probably come after you lay that out first, right? >> Totally, yeah, I mean, you don't choose a product before you know what you need to do with it. But actually, often companies don't know what they need to do with it, because they've got the wrong mindset in the first place. And so part of the road map stuff that we do, we have a road map offering, is about changing that mindset, and helping them to get through that first stage, where we start to articulate the right use cases, and that really is driving a lot of value for our customers. Because they start from the right place-- >> Sometimes we hear stories like the product kind of gives them a blind spot, because they tend to go in with a product mindset first, and that kind of gives them some baggage, if you will.
>> Well, yeah, because you end up with a situation where you go, you get a product in, and then you say, what can we do with it? Or, in fact, what happens is the vendor will say, these are the things you could do, and they give you use cases. >> It constrains things, forecloses tons of opportunities, because you're stuck within a product mindset. >> Yeah, exactly that, and you don't want to be constrained. And that's why open source, and the kind of ecosystem that we have within the big data space, is so powerful, because there are so many different tools for different things, but don't choose your tool until you know what you're trying to achieve. >> I have a market question, maybe you can just give us your opinion, caveat it if you like; it's sort of a global, macro view. When we first started looking at the big data market, we noticed right away the dominant portion of revenue was coming from services. Hardware was commodity, so, you know, maybe sort of less than you would see, obviously, in a mainframe world, and open-source software had a smaller contribution, so services dominated, and, frankly, have continued to dominate since the early days. Do you see that changing, or do you think those percentages, if you will, will stay relatively constant? >> Well, I think it will change over time, but not in the near future, for sure; there's too much advancement in the technology landscape for that to stop. If you had a set of tools that weren't really evolving, becoming very mature, and those were the tools you had, ultimately the skill sets around them would start to grow, it would become much easier to develop stuff, and then companies would start to build out industry- or solution-specific stuff on top, and it would make it very easy to build products.
When you have an ecosystem that's evolving, growing with the speed it is, you're constantly trying to keep up with that technology, and, therefore, services have to play an awfully big part in making sure that you are using the right technology at the right time, and so, for the near future, for certain, that won't change. >> Complexity is your friend. >> Yeah, absolutely. Well, you know, we live in a complex world, but we live and breathe this stuff, so what's complex to some is not to us, and that's why we add value, I guess. >> Mike Merritt-Holmes here inside theCUBE with Teradata Think Big. Thanks for spending the time sharing your insights. >> Thank you for having me. >> Understand the organizational mindset, identify the process, then figure out the products. That's the insight here on theCUBE. More coverage of DataWorks Summit 2017, here in Germany, after this short break. (upbeat electronic music)
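[Editor's note] The "always on" model monitoring Mike describes, where deployed models are watched for staleness and flagged for retraining before business decisions drift, can be sketched generically. Everything below is a hypothetical illustration, not Think Big's implementation; the class name, window size, and threshold are invented:

```python
from collections import deque

class ModelMonitor:
    """Track a deployed model's rolling accuracy and flag it as stale
    once accuracy drops below a retraining threshold."""

    def __init__(self, window=100, threshold=0.9):
        self.outcomes = deque(maxlen=window)  # True where prediction matched reality
        self.threshold = threshold

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def is_stale(self):
        # Only judge the model once the window is full, so a single early
        # miss doesn't trigger an unnecessary retrain.
        return (len(self.outcomes) == self.outcomes.maxlen
                and self.accuracy() < self.threshold)

# Feed the monitor ground-truth outcomes as they arrive in production.
monitor = ModelMonitor(window=4, threshold=0.9)
for predicted, actual in [(1, 1), (1, 0), (1, 1), (1, 1)]:
    monitor.record(predicted, actual)

if monitor.is_stale():
    print("accuracy below threshold; trigger retraining")
```

With a full window and a rolling accuracy of 0.75 against a 0.9 threshold, the sketch flags the model, which is the point of the operational capability Mike describes: the check runs continuously, not once at deployment.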

Published Date : Apr 5 2017


Scott Gnau, Hortonworks Big Data SV 17 #BigDataSV #theCUBE


 

>> Narrator: Live from San Jose, California, it's theCUBE, covering Big Data Silicon Valley 2017. >> Welcome back everyone. We're here live in Silicon Valley. This is theCUBE's coverage of Big Data Silicon Valley, our event in conjunction with O'Reilly Strata Hadoop; of course we have our Big Data NYC event, and we have our special popup event in New York and Silicon Valley. This is our Silicon Valley version. I'm John Furrier, with my co-host Jeff Frick, and our next guest is Scott Gnau, CTO of Hortonworks. Great to have you on, good to see you again. >> Scott: Thanks for having me. >> You guys have an event coming up in Munich, so I know that there's a slew of new announcements coming with Hortonworks in April, next month in Munich for your EU event, and you're going to be holding a little bit of that back, but some interesting news this morning. We had Wei Wang yesterday from the Microsoft Azure HDInsight team. That's flowering nicely, a good bet there, but the question has always been, at least from people in the industry, and we've been questioning you guys on: hey, where's your cloud strategy? Because as a distro you guys have been very successful with your always-open approach. Microsoft, as your guy was basically saying, that's why we go with Hortonworks: because of pure open source, committed to that from day one, never wavered. The question is cloud first. AI, machine learning, this is a sweet spot for IoT. You're starting to see the collision between cloud and data, and in the intersection of that is deep learning, IoT, a lot of amazing new stuff going to be really popping out of this. Your thoughts, and your cloud strategy. >> Obviously we see cloud as an enabler for these use cases. In many instances the use cases can be ephemeral. They might not be tied immediately to an ROI, so you're going to go to the capital committee and all this kind of stuff, versus: let me go prove some value very quickly.
It's one of the key enablers, core ingredients, and when we say cloud first, we really mean it. It's something where the solutions work together. At the same time, cloud becomes important. Our cloud strategy, and I think we've talked about this in many different venues, is really twofold. One is we want to give a common experience to our customers across whatever footprint they choose, whether it be they roll their own, they do it on-prem, they do it in public cloud, and they have a choice of different public cloud vendors. We want to give them a similar experience, a good experience that is enterprise-grade, a platform-level experience, so not a point-solution kind of one function and then get rid of it, but really being able to extend the platform. What I mean by that, of course, is being able to have common security, common governance, common operational management; being able to have a blueprint of the footprint so that there's compatibility of applications that get written. And those applications can move as customers decide to change their mind about where their platform is hosting the data, so our goal really is to give them a great and common experience across all of those footprints, number one. Then number two, to offer a lot of choices across all of those domains as well, whether it be, hey, I want to do infrastructure as a service and I know what I want, on one end of the spectrum, to, I'm not sure exactly what I want, but I want to spin up a data science cluster really quickly. Boom, here's a platform-as-a-service offer that runs and is available, very easy to consume, comes preconfigured, and kind of everywhere in between. >> By the way, yesterday Wei was pointing out 99.99 SLAs on some of the stuff coming out. >> They are amazing, and obviously in the platform-as-a-service space you also get the benefit of other cloud services that can plug in, that wouldn't necessarily be something you'd expect to be typical of a core Hadoop platform.
Getting the SLAs, getting the disaster recovery, getting all of the things that cloud providers can provide behind the scenes is some additional upside, obviously, in those deployment options. Having that common look and feel, making it easy, making it frictionless, are all core components of our strategy, and we saw a lot of success with that coming out of year-end last year. We see rapid customer adoption. We see rapid customer success, and frankly I would say that 99.9% of the customers I talk to are hybrid, where they have a foot on-prem and they have a foot in cloud, and they may have a foot in multiple clouds. I think that's indicative of what's going on in the world. Think about the gravity of data. Data movement is expensive. Analytics and multi-core chipsets give us the ability to process and crunch numbers at unprecedented rates, but movement of data is actually kind of hard. There's latency, it can be expensive. A lot of data in the future, IoT data, machine data, is going to be created and live its entire lifecycle in the cloud, so the notion of being able to support hybrid with a common look and feel, I think, very strategically positions us to help our customers be successful when they start actually dealing with data that lives its entire lifecycle outside the four walls of the data center. >> You guys really did a good job, I thought, on having that clean positioning of data at rest, but you also had the data in motion, which I think was ahead of its time; you guys really nailed that, and you also had the IoT edge in mind. We talked, I think, two years ago, and this was really not on everyone's radar, but you guys saw that, so you've made some good bets on HDInsight, and we talked about that yesterday with Wei on here, and Microsoft. So edge analytics and data in motion are very key right now, because that batch-streaming world's coming together and IoT's flooding it with all this kind of data.
We've seen the success in the clouds, where analytics have been super successful, powered by the cloud. I got to ask you, with Microsoft as your preferred cloud provider, what's the current status for customers who have data in motion, specifically IoT too? It's the common question we're getting, not necessarily the Microsoft question, but: okay, I've got edge coming in strong-- >> Scott: Mm-hmm >> and I'm going to run certainly hybrid in a multi-cloud world, but I want to use the cloud for most of the analytics, and how do I deal with the edge? >> Wow, there's a lot there. (laughs) >> John: You got 10 seconds, go! (laughs) You have Microsoft as your premier cloud, and you have an Amazon relationship with a marketplace and whatnot. You've got a great relationship with Microsoft. >> Yeah. I think it boils down to a bigger macro thing, and hopefully I'll peel into some specifics. I think number one, we as an industry kind of shortchange ourselves talking about Hadoop, Hadoop, Hadoop, Hadoop, Hadoop. I think it's bigger than Hadoop, not different than, but certainly more than, right, and this is where we started with the whole connected platforms thinking: traditional Hadoop comes from traditional thinking of data at rest. So I've got some data, I've stored it, and I want to run some analytics, and I want to be able to scale it, and all those kinds of things. Really good stuff, but only part of the issue. The other part of the issue is data that's moving, data that's being created outside of the four walls of the data center, data that's coming from devices. How do I manage and move and handle all of that? Of course there have been different hype cycles on streaming and streaming analytics and data flow and all those things. What we wanted to do is take a very protracted look at the problem set of the future.
We said look, it's really about the entire lifecycle of data, from inception to the demise of the data, the data being deleted, which very infrequently happens these days. >> Or cold storage-- >> Cold storage, whatever. You know, it's created at the edge, it moves through, it moves in different places, it's landed, it's analyzed, there are models built. But as models get deployed back out to the edge, that entire problem set is a problem set that I think we, certainly we at Hortonworks, are looking to address with the solutions. That actually is accelerated by the notion of multiple cloud footprints, because when you think about a customer that may have multiple cloud footprints and trying to tie the data together, it creates a unique opportunity. I think there's a reversal in the way people need to think about the future of compute. Where, having been around for a little bit of time, it's always been let me bring all the data together to the applications and have the applications run and then I'll send answers back. That is impossible in this new world order, whether it be the cloud or the fog or any of the things in between or the data center. Data are going to be distributed and data movement will become the expensive thing, so it will be very important to be able to have applications that are deployable across a grid, and applications move to the data instead of data moving to the application. And, or at least, to have a choice and be able to be selective. So I believe that ultimately, scalability five years from now, ten years from now, is not going to be about how many exabytes I have in my cloud instance, that will be part of it; it will be about how many edge devices can I have computing and analyzing simultaneously, and coordinating this information with each other to optimize customer experience, to optimize the way an autonomous car drives, or anywhere in between. >> It's totally radical, but it's also innovative.
You mentioned the cost of moving data will be the issue. >> Scott: Yeah. >> So that's going to change the architecture of the edge. What are you seeing with customers, cuz we're seeing a lot of people taking a protracted view like you were talking about and looking at the architectures, specifically around okay. There's some pressure, but there's no real gun to the head yet, but there's certainly pressure to do architectural thinking around edge and some of the things you mentioned. Patterns, things you can share, anecdotal stories, customer references. >> You know the common thing is that customers go, "Yep, that's going to be interesting. "It's not hitting me right now, "but I know it's going to be important. "How can I ease into it and kind of without the suspenders "how can I prove this is going to work and all that." We've seen a lot of certainly interest in that. What's interesting is we're able to apply some of that futuristic IoT technology in Hortonworks data flow that includes NiFi and MiNiFi out to the edge to traditional problems like, let me get the data from the branches into the central office and have that roundtrip communication to a banker who's talking to a customer and has the benefit of all the analytics at home, but I can guarantee that roundtrip of data and analytics. Things that we thought were solid before, can be solved very easily and efficiently with this technology, which is then also extensible even out further to the edge. In many instances, I've been surprised by customer adoption with them saying, "Yeah, I get that, but gee this helps me "solve a problem that I've had for the last 20 years "and it's very easy and it sets me up "on the right architectural course, "for when I start to add in those edge devices, "I know exactly how I'm going to go do it." It's been actually a really good conversation that's very pragmatic with immediate ROI, but again positioning people for the future that they know is coming. 
Doing that, by the way, we're also able to prove the security. Think about security is a big issue that everyone's talking about, cyber security and everything. That's typically security about my data center where I've got this huge fence around it and it's very controlled. Think about edge devices are now outside that fence, so security and privacy and provenance become really, really interesting in that world. It's been gratifying to be able to go prove that technology today and again put people on that architectural course that positions them to be able to go out further to the edge as their business demands it. >> That's such great validation when they come back to you with a different solution based on what you just proposed. >> Scott: Yep. >> That means they really start to understand, they really start to see-- >> Scott: Yep. >> How it can provide value to them. >> Absolutely, absolutely. That is all happening and again like I said this I think the notion of the bigger problem set, where it's not just storing data and analyzing data, but how do I have portable applications and portable applications that move further and further out to the edge is going to be the differentiation. The future successful deployments out there because those deployments and folks are able to adopt that kind of technology will have a time to market advantage, they'll have a latency advantage in terms of interaction with a customer, not waiting for that roundtrip of really being able to push out customized, tailored interactions, whether it be again if it's driving your car and stopping on time, which is kind of important, to getting a coupon when you're walking past a store and anywhere in between. >> It's good you guys have certainly been well positioned for being flexible, being an open source has been a great advantage. I got to ask you the final question for the folks watching, I'm sure you guys answer this either to investors or whatnot and customers. 
A lot's changed in the past five years and a lot's happening right now. You just illustrated it: the scenario with the edge is very robust, dynamic, changing, but yet a value opportunity for businesses. What's the biggest thing that's changing right now in the Hortonworks view of the world that's notable, that you think is worth highlighting to people watching, whether they're your customers, investors, or people in the industry? >> I think you brought up a good point, the whole notion of open and the whole groundswell around open source, open community development as a new paradigm for delivering software. I talked a little bit about a new paradigm of the gravity of data and sensors and this new problem set that we've got to go solve; that's kind of one piece of this storm. The other piece of the storm is the adoption and the wave of open, open community collaboration of developers versus integrated silo stacks of software. That's manifesting itself in two places, and obviously I think we're an example of helping to create that. Open collaboration means quicker time to market and more innovation and accelerated innovation in an increasingly complex world. That's one requirement slash advantage of being in the open world. I think the other thing that's happening is the generation of the workforce. When I think about when I got my first job, I typed a resume with a typewriter. I'm dating myself. >> White out. >> Scott: Yeah, with white out. (laughter) >> I wasn't a good typer. >> A resume today is basically a name and a GitHub address. Here's my body of work and it's out there for everybody to see, and that's the mentality-- >> And they have their cute videos up there as well, of course. >> Scott: Well yeah, I'm sure. (laughter) >> So it's kind of like that shift to this is now the new paradigm for software delivery. >> This is important. You've got theCUBE interview, but I mean you're seeing it-- >> Is that the open source? >> In the entertainment.
No, we're seeing people put huge interviews on their LinkedIn, so this notion of collaboration is in the software engineering mindset. You go back to when we grew up in software engineering, then it went to open source, and now GitHub is essentially a social network for your body of work. You're starting to see the software development open source concepts apply to data engineering; data science is still early days. Media creation, whatnot, so I think that's a really key point, and the data science tools are still in their infancy. >> I think open, and by the way I'm not here to suggest that everything will be open, but I think a majority and-- >> Collaborative. >> The majority of the problems that we're solving will be collaborative, it will be ecosystem driven, and where there's an extremely large market, open will be the most efficient way to address it. And certainly no one's arguing that data and big data is not a large market. >> Yep. You guys are all in on the cloud now, you've got the Microsoft relationship; any other updates that you think are worth sharing with folks? >> You've got to come back and see us in Munich then. >> Alright. We'll be there, theCUBE will be there in Munich in April. We have the Hortonworks coverage going on at DataWorks, the conference is now called DataWorks, in Munich. This is theCUBE here with Scott Gnau, the CTO of Hortonworks. Breaking it down, I'm John Furrier with Jeff Frick. More coverage from Big Data SV in conjunction with Strata Hadoop after the short break. (upbeat music)

Published Date : Mar 15 2017


Nick Pentreath, IBM STC - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is The Cube, covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody. Nick Pentreath is here; he's a principal engineer at the IBM Spark Technology Center in South Africa. Welcome to The Cube. >> Thank you. >> Great to see you. >> Great to see you. >> So let's see, it's a different time of year here than you're used to. >> I've flown from, I don't know the Fahrenheit equivalent, but 30 degrees Celsius heat and sunshine, to snow and sleet, so. >> Yeah, yeah. So it's a lot chillier there. Wait until tomorrow. But, so we were joking. You probably get the T-shirt for the longest flight here, so welcome. >> Yeah, I actually need the parka, or like a beanie. (all laugh) >> Little better. Long sleeve. So Nick, tell us about the Spark Technology Center, STC is its acronym, and your role there. >> Sure, yeah, thank you. So the Spark Technology Center was formed by IBM a little over a year ago, and its mission is to focus on the Open Source world, particularly Apache Spark and the ecosystem around that, and to really drive forward the community and to make contributions to both the core project and the ecosystem. The overarching goal is to help drive adoption, yeah, and particularly enterprise customers, the kind of customers that IBM typically serves. And to harden Spark and to make it really enterprise ready. >> So why Spark? I mean, we've watched IBM do this now for several years. The famous example that I like to use is Linux. When IBM put $1 billion into Linux, it really went all in on Open Source, and it drove a lot of IBM value, both internally and externally for customers. So what was it about Spark? I mean, you could have made a similar bet on Hadoop. You decided not to, you sort of waited to see that market evolve. What was the catalyst for having you guys go all in on Spark? >> Yeah, good question.
I don't know all the details, certainly, of what the internal drivers were, because I joined the STC a little under a year ago, so I'm fairly new. >> Translate the hallway talk, maybe. (Nick laughs) >> Essentially, I think you raise very good parallels to Linux and also Java. >> Absolutely. >> So Spark, sorry, IBM, made these investments in Open Source technologies that it sees as transformational and kind of game-changing. And I think, you know, most people will probably admit within IBM that they maybe missed the boat, actually, on Hadoop, and saw Spark as the successor, and actually saw a chance to really dive into that and kind of almost leapfrog and say, "We're going to back this as the next generation analytics platform and operating system for analytics and big data in the enterprise." >> Well, I don't know if you happened to watch the Super Bowl, but there's a saying that it's sometimes better to be lucky than good. (Nick laughs) And that sort of applies, and so, in some respects, maybe missing the window on Hadoop was not a bad thing for IBM >> Yeah, exactly because not a lot of people made a ton of dough on Hadoop and they're still sort of struggling to figure it out. And now along comes Spark, and you've got this more real time nature. IBM talks a lot about bringing analytics and transactions together. They've made some announcements about that and affecting business outcomes in near real time. I mean, that's really what it's all about, and one of your areas of expertise is machine learning. And so, talk about that relationship and what it means for organizations, your mission. >> Yeah, machine learning is a key part of the mission. And you've seen the kind of big data in the enterprise story, starting with the kind of Hadoop and data lakes. And that's evolved into, now we've, before, we just dumped all of this data into these data lakes and these silos, and maybe we had some Hadoop jobs and so on.
But now we've got all this data we can store, what are we actually going to do with it? So part of that is the traditional data warehousing and business intelligence and analytics, but more and more, we're seeing there's rich value in this data, and to unlock it, you really need intelligent systems. You need machine learning, you need AI, you need real time decision making that starts transcending the boundaries of all the rule-based systems and human-based systems. So we see machine learning as one of the key tools and one of the key unlockers of value in these enterprise data stores. >> So Nick, perhaps paint us a picture of someone who's advanced enough to be working with machine learning with IBM, and we know that the tool chain's kind of immature. Although IBM, with Data Works or DataFirst, has a fairly broad end-to-end sort of suite of tools, what are the early-use cases? And what needs to mature to go into higher volume production apps or higher-value production apps? >> I think the early-use cases for machine learning in general, and certainly at scale, are numerous and they're growing, but classic examples are, let's say, recommendation engines. That's an area that's close to my heart. In my previous life before IBM, I built a startup that had a recommendation engine service targeting online stores and e-commerce players and social networks and so on. So this is a great kind of example use case. We've got all this data about, let's say, customer behavior in your retail store or your video-sharing site, and in order to serve those customers better and make more money, if you can make good recommendations about what they should buy, what they should watch, or what they should listen to, that's a classic use case for machine learning and unlocking the data that is there. So that is one of the drivers of some of these systems, and players like Amazon are sort of good examples of the recommendation use case.
Another is fraud detection, and that is a classic example in financial services, enterprise, which is a kind of staple of IBM's customer base. So these are a couple of examples of the use cases, but the tool sets, traditionally, have been kind of cumbersome. So Amazon built everything from scratch themselves using customized systems, and they've got teams and teams of people. Nowadays, you've got this built into Apache Spark, you've got Spark ML, a machine learning library, you've got good models to do that kind of thing. So I think from an algorithmic perspective, there's been a lot of advancement, and there's a lot of standardization and almost commoditization of the model side. So what is missing? >> George: Yeah, what else? >> And what are the shortfalls currently? So there's a big difference between the current view, I guess the hype, of machine learning as: you've got data, you apply some machine learning, and then you get profit, right? But really, there's a hugely complex workflow that involves this end-to-end story. You've got data coming from various data sources, you have to feed it into one centralized system, transform and process it, extract your features and do your sort of hardcore data science, which is the core piece that everyone sort of thinks about as the only piece, but that's kind of in the middle and it makes up a relatively small proportion of the overall chain. And once you've got that, you do model training and selection and testing, and you now have to take that model, that machine-learning algorithm, and you need to deploy it into a real system to make real decisions. And that's not even the end of it, because once you've got that, you need to close the loop, what we call the feedback loop, and you need to monitor the performance of that model in the real world. You need to make sure that it's not deteriorating, that it's adding business value. All of these kinds of things.
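As a hedged aside, the recommendation use case Nick mentions can be illustrated with a toy item-based collaborative filter. A real system would use something like Spark MLlib's ALS on far larger behavioral data; the data and function names below are invented for illustration:

```python
import math

# Toy user -> item interaction counts; in practice this comes from
# behavioral logs (purchases, views, listens).
ratings = {
    "alice": {"laptop": 1, "mouse": 2, "keyboard": 1},
    "bob":   {"mouse": 1, "keyboard": 2},
    "carol": {"laptop": 1, "monitor": 1},
}

def item_vector(item):
    # represent an item as the sparse vector of users who interacted with it
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    dot = sum(a[u] * b[u] for u in a if u in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user, top_n=2):
    owned = set(ratings[user])
    items = {i for r in ratings.values() for i in r}
    scores = {}
    for cand in items - owned:
        # score a candidate by its similarity to items the user already has
        scores[cand] = sum(cosine(item_vector(cand), item_vector(i))
                           for i in owned)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("bob"))
```

Bob shares mouse/keyboard behavior with Alice, who also bought a laptop, so the laptop ranks first for Bob. The commoditization Nick describes is exactly this: the scoring math is standard; the hard part is the surrounding pipeline.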
So I think that is the real piece of the puzzle that's missing at the moment: delivering this end-to-end story and doing it at scale, securely, enterprise-grade. >> And the business impact of that presumably will be a better-quality experience. I mean, recommendation engines and fraud detection have been around for a while, they're just not that good. Retargeting systems are too little too late, and kind of cumbersome fraud detection. Still a lot of false positives. Getting much better, certainly compressing the time. It used to be six months, >> Yes, yes. Now it's minutes or seconds, but a lot of false positives still, so, but are you suggesting that by closing that gap, we'll start to see from a consumer standpoint much better experiences? >> Well, I think that's imperative, because if you don't see that from a consumer standpoint, then the mission is failing, because ultimately, it's not magic that you just simply throw machine learning at something and you unlock business value and everyone's happy. You have to, you know, there's a human in the loop there. You have to fulfill the customer's need, you have to fulfill consumer needs, and the better you do that, the more successful your business is. You mentioned the time scale, and I think that's a key piece here. >> Yeah. >> What makes better decisions? What makes a machine-learning system better? Well, it's better data and more data, and faster decisions. So I think all of those three are coming into play with Apache Spark, end-to-end streaming systems, and the models are getting better and better because they're getting more data and better data.
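The end-to-end workflow Nick walks through, ingest, feature extraction, training, deployment, and the feedback loop, can be caricatured in a few lines. This is a sketch with stubbed-out stages, not any vendor's pipeline API; every function here is invented:

```python
def ingest(raw):
    # data from various sources; drop bad records on the way in
    return [r for r in raw if r is not None]

def featurize(records):
    # trivial feature extraction: text length stands in for real features
    return [(len(text), label) for text, label in records]

def train(features):
    # toy "model": threshold halfway between the class means
    pos = [x for x, y in features if y == 1]
    neg = [x for x, y in features if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0

def monitor(model, holdout):
    # the feedback loop: measure live performance, feed it back to retraining
    correct = sum(model(x) == y for x, y in holdout)
    return correct / len(holdout)

raw = [("great product", 1), None, ("bad", 0), ("love this thing", 1)]
features = featurize(ingest(raw))
model = train(features)
accuracy = monitor(model, features)  # in production, use fresh labeled data
print(accuracy)
```

The point of the sketch is the shape, not the model: the "hardcore data science" is one function out of four, and the monitoring stage is what closes the loop Nick says is usually missing.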
I mean, are we talking about 6-12 months before we really sort of start to see a major impact to the consumer and ultimately, to the company who's providing those services? >> Nick: Well, >> Or is it further away than that, you think? >> You know, it's always difficult to make predictions about timeframes, but I think there's a long way to go from, yeah, as you mentioned, where we are: the algorithms and the models are quite commoditized. The time gap to make predictions is kind of down to this real-time nature. >> Yeah. >> So what is missing? I think it's actually less about the traditional machine-learning algorithms and more about making the systems better and getting better feedback, better monitoring, so improving the end user's experience of these systems. >> Yeah. >> And that's actually, I don't think it's, I think there's a lot of work to be done. I don't think it's a 6-12 month thing, necessarily. I don't think that in 12 months, certainly, you know, everything's going to be perfectly recommended. I think there's areas of active research in the kind of academic fields of how to improve these things, but I think there's a big engineering challenge to bring in more disparate data sources, to improve data quality, to improve these feedback loops, to try and get systems that are serving customer needs better. So improving recommendations, improving the quality of fraud detection systems. Everything from that to medical imaging and cancer detection. I think we've got a long way to go. >> Would it be fair to say that we've done a pretty good job with the traditional application lifecycle in terms of DevOps, but we now need the DevOps for the data scientists and their collaborators? >> Nick: Yeah, I think that's >> And where is IBM along that?
>> Yeah, that's a good question, and I think you kind of hit the nail on the head, that the enterprise applied machine learning problem has moved from the kind of academic to the software engineering and actually, DevOps. Internally, someone mentioned the word "train ops," so it's almost like, you know, the machine learning workflow and actually professionalizing and operationalizing that. So recently, IBM, for one, has announced Watson Data Platform and now, Watson Machine Learning. And that really tries to address that problem. So really, the aim is to simplify and productionize these end-to-end machine-learning workflows. So that is the product push that IBM has at the moment. >> George: Okay, that's helpful. >> Yeah, right. I was at the Watson Data Platform announcement, what you call the Data Works. I think they changed the branding. >> Nick: Yeah. >> It looked like there were numerous components that IBM had in its portfolio that are now strung together to create that end-to-end system that you're describing. Is that a fair characterization, or is it underplaying, and I'm sure it is, the work that went into it? But help us maybe understand that better. >> Yeah, I should caveat it by saying we're fairly focused, very focused, at the STC on the Open Source side of things, so my work is predominantly within the Apache Spark project and I'm less involved in the product side. >> Dave: So you didn't contribute specifically to Watson Data Platform? >> Not to the product line, so, you know, >> Yeah, so it's really not an appropriate question for you? >> I wouldn't want to kind of, >> Yeah. >> To talk too deeply about it >> Yeah, yeah, so that,
So the intent with the STC particularly is that we focus on Open Source, and a core part of that is that, being within IBM, we have the opportunity to interface with other product groups and customer groups. >> George: Right. >> So while we're not directly focused on, let's say, the commercial aspect, we want to effectively leverage the ability to talk to real-world customers and find the use cases, talk to other product groups that are building this Watson Data Platform and all the product lines and the features, the Data Science Experience; it's all built on top of Apache Spark and the platform. >> Dave: So your role is really to innovate? >> Exactly, yeah. >> Leverage Open Source and innovate. >> Both innovate and kind of improve, so improve performance, improve efficiency. When you are operating at the scale of a company such as IBM and other large players, your customers, and you as product teams and builders of products, will come into contact with all the kind of little issues and bugs >> Right. >> And performance >> Make it better. Problems, yeah.
>> Yeah, that's a great question and very topical at the moment. So both from an Open Source community perspective and Enterprise customer perspective, this is one of the, if not the key, I think, kind of missing pieces within the Spark machine-learning kind of community at the moment, and it's one of the things that comes up most often. So it is a missing piece, and we as a community need to work together and decide, is this something that we built within Spark and provide that functionality? Is is something where we try and adopt open standards that will benefit everybody and that provides a kind of one standardized format, or way or serving models? Or is it something where there's a few Open Source projects out there that might serve for this purpose, and do we get behind those? So I don't have the answer because this is ongoing work, but it's definitely one of the most critical kind of blockers, or, let's say, areas that needs work at the moment. >> One quick question, then, along those lines. IBM, the first thing IBM contributed to the Spark community was Spark ML, which is, as I understand it, it was an ability to, I think, create an ensemble sort of set of models to do a better job or create a more, >> So are you referring to system ML, I think it is? >> System ML. >> System ML, yeah, yeah. >> What are they, I forgot. >> Yeah, so, so. >> Yeah, where does that fit? >> System ML started out as a IBM research project and perhaps the simplest way to describe it is, as a kind of sequel optimizer is to take sequel queries and decide how to execute them in the most efficient way, system ML takes a kind of high-level mathematical language and compiles it down to a execution plan that runs in a distributed system. So in much the same way as your sequel operators allow this very flexible and high-level language, you don't have to worry about how things are done, you just tell the system what you want done. 
System ML aims to do that for mathematical and machine learning problems, so it's now an Apache project. It's been donated to Open Source and it's an incubating project under very active development. And that is really, there's a couple of different aspects to it, but that's the high-level goal. The underlying execution engine is Spark. It can run on Hadoop and it can run locally, but really, the main focus is to execute on Spark and then expose these kind of higher level APIs that are familiar to users of languages like R and Python, for example, to be able to write their algorithms and not necessarily worry about, how do I do large scale matrix operations on a cluster? System ML will compile that down and execute that for them. >> So really quickly, to follow up, what that means is it's a higher level way for people who aren't sort of cluster aware to write machine-learning algorithms that are cluster aware? >> Nick: Precisely, yeah. >> That's very, very valuable. When it works.
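A hedged sketch of the optimizer analogy Nick draws: the expression string, thresholds, and function names below are invented for illustration, but the idea, a high-level matrix expression compiled to a physical strategy chosen from data characteristics, is what the SQL-optimizer comparison is getting at:

```python
def compile_plan(expr, rows):
    # pick an execution strategy from data size, the way an optimizer
    # decides between single-node and distributed execution
    strategy = "distributed" if rows > 1_000_000 else "single_node"
    return {"expr": expr, "strategy": strategy}

def xtx(X):
    # naive single-node execution of the plan for t(X) %*% X
    cols = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(cols)]
            for i in range(cols)]

X = [[1, 2], [3, 4]]
plan = compile_plan("t(X) %*% X", rows=len(X))
result = xtx(X) if plan["strategy"] == "single_node" else None
print(plan["strategy"], result)
```

The user writes only the high-level expression; where and how it executes is the compiler's problem. That separation is what lets an R or Python user run matrix math on a cluster without being a distributed-systems expert.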
So in terms of the underlying engine, they are, I don't know how many hundreds of thousands, millions of lines of code and years and years of research that's gone into that, so it's an extremely powerful set of tools. But yes, a lot of work still to be done there and ongoing to make it, in a way to make it user ready and Enterprise ready in a sense of making it easier for people to use it and adopt it and to put it into their systems and production. >> So I wonder if we can close, Nick, just a few questions on STC, so the Spark Technology Centers in Cape Town, is that a global expertise center? Is is STC a virtual sort of IBM community, or? >> I'm the only member visiting Cape Town, >> David: Okay. >> So I'm kind of fairly lucky from that perspective, to be able to kind of live at home. The rest of the team is mostly in San Francisco, so there's an office there that's co-located with the Watson west office >> Yeah. >> And Watson teams >> Sure. >> That are based there in Howard Street, I think it is. >> Dave: How often do you get there? >> I'll be there next week. >> Okay. >> So I typically, sort of two or three times a year, I try and get across there >> Right. And interface with the team, >> So, >> But we are a fairly, I mean, IBM is obviously a global company, and I've been surprised actually, pleasantly surprised there are team members pretty much everywhere. Our team has a few scattered around including me, but in general, when we interface with various teams, they pop up in all kinds of geographical locations, and I think it's great, you know, a huge diversity of people and locations, so. >> Anything, I mean, these early days here, early day one, but anything you saw in the morning keynotes or things you hope to learn here? Anything that's excited you so far? 
>> A couple of the morning keynotes, but had to dash out to kind of prepare for, I'm doing a talk later, actually on feature hashing for scalable machine learning, so that's at 12:20, please come and see it. >> Dave: A breakout session, it's at what, 12:20? >> 20 past 12:00, yeah. >> Okay. >> So in room 302, I think, >> Okay. >> I'll be talking about that, so I needed to prepare, but I think some of the key exciting things that I have seen that I would like to go and take a look at are kind of related to the deep learning on Spark. I think that's been a hot topic recently in one of the areas, again, where Spark, perhaps, hasn't been the strongest contender, let's say, but there's some really interesting work coming out of Intel, it looks like. >> They're talking here on The Cube in a couple hours. >> Yeah. >> Yeah. >> I'd really like to see their work. >> Yeah. >> And that sounds very exciting, so yeah. I think every time I come to a Spark summit, there are always new projects from the community, various companies, some of them big, some of them startups that are pushing the envelope, whether it's research projects in machine learning, whether it's adding deep learning libraries, whether it's improving performance for kind of commodity clusters or for single, very powerful single nodes, there's always people pushing the envelope, and that's what's great about being involved in an Open Source community project and being part of those communities, so yeah. That's one of the talks that I would like to go and see. And I think I, unfortunately, had to miss some of the Netflix talks on their recommendation pipeline. That's always interesting to see. >> Dave: Right. >> But I'll have to check them on the video (laughs).
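Nick's talk topic, feature hashing, is worth a quick sketch. The idea below is a generic illustration of the hashing trick, not code from the talk: map arbitrary string features into a fixed-width vector with a hash function, so no vocabulary ever needs to be stored or broadcast, which is what makes it attractive at cluster scale.

```python
import hashlib

def hash_features(tokens, dim=16):
    """Map arbitrary string features into a fixed-size vector without
    storing a vocabulary -- the 'hashing trick' that lets feature
    extraction scale to unbounded feature spaces."""
    vec = [0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        idx = h % dim                           # bucket index for this feature
        sign = 1 if (h >> 8) % 2 == 0 else -1   # signed hashing reduces collision bias
        vec[idx] += sign
    return vec

v = hash_features(["spark", "ml", "spark"])  # fixed width regardless of input
```

Because the mapping is a pure function of the token, every node in a cluster computes identical indices with no shared dictionary, at the cost of occasional hash collisions.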
My buddy George and I will be back with our next guest. We're live. This is The Cube from Spark Summit East, #sparksummit. We'll be right back. (upbeat music) (gentle music)

Published Date : Feb 8 2017


David Richards, WANdisco - #AWS - #theCUBE - @DavidRichards


 

>> Announcer: Live from San Jose, in the heart of Silicon Valley, it's theCUBE. Covering AWS Summit 2016. (upbeat electronic music) >> Hello everyone, welcome to theCUBE. Here, live in Silicon Valley, at Amazon Web Services, AWS Summit, in Silicon Valley. I'm John Furrier, this is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm here with my co-host. Introducing Lisa Martin on theCUBE, new host. Lisa, you look great. Our first guest here is David Richards, CEO of WANdisco. Welcome to theCUBE, good to see you. >> Good to see you, John, as always. >> So, I've promised a special CUBE presentation, $20 bill here that I owe David. We played golf on Friday, our first time out in the year. He sandbagged me, he's a golfer, he's a pro. I don't play very often. There's your winnings, there you go, $20, I paid. (smooching) (laughing) I did not well challenge your swing, so it's been paid. Great fun, good to see you. >> It was great fun and I'm sorry that I cheated a little bit, mirror in the bathroom still running through your ears. >> I love the English style. Like all the inner gain and playing music on the course, it was a great time. When we went golfing last week, we were talking, just kind of had a social get-together but we were talking about some things on the industry mind right now. And you had some interesting color around your business. We talked about your strategy of OEMing your core technology to IBM and also you have other business deals. Can you shed some light on your strategy at WANdisco with your core IP, and how that relates to what's going on in this phenom called Amazon Web Services? They've been running the table on the enterprise now and certainly public cloud for years. $10 billion, Wikibon called that years ago. We see that trajectory not stopping but clearly the enterprise cloud is what they want. Do you have a deal with Amazon? Are you talking to them and how does that impact your business?
>> Well I mean the wonderful thing is if you go to AWS Marketplace, you go to that front page, we're one of the featured products on the front page of the AWS Marketplace, so I think that tells you that we're pretty strategic with Amazon. We're solving a big problem for them which is the movement of data in and out of public cloud. But you asked an interesting question about our business model. When we first came into the whole big data marketplace we went for the whole direct selling thing like everybody does, but that doesn't give you a lot of operational leverage. I mean we're in accounts with IBM right now, you mentioned earlier, MR technology. At a big automotive company they have 72 enterprise sales guys, 72. We could never get to that scale any time soon.
And how you actually integrate those together, how you move data between data centers, how you arbitrage between cloud vendors. Are you really going to put all your eggs into one basket? You're going to put everything into AWS. Everything into Azure. I don't think you will. I think you'll need to move data around between those different data centers and then how about high availability? How do you solve that problem? Well WANdisco solves that problem as well. >> So a couple of questions for you David. One of the things that Dr. Wood said in the keynote today was friends don't let friends build data centers. So I wanted to get your take on that as well as from an IBM perspective. We just talked about the OEM opportunity that you're working there to get to those large enterprises. Does that mean that you're shifting your focus for enterprise towards IBM? Where does that leave WANdisco and Amazon as we see Amazon making a big push to the enterprise? >> So I think that was some big news that came out last week that was missed largely by the industry, which was the FCA, the Financial Conduct Authority in the United Kingdom, came out and said, we see no reason why banks cannot move to cloud from a regulatory perspective. That was one of the big fears that we all had which is are banks actually going to be able to move core infrastructure into a public cloud environment? Well now it turns out they can. So we're all in on cloud. I mean, we can see, if you look at the partnerships that we're focused on, it's the sort of four/five cloud vendors. It's the IBM, the AWS, Azure, Oracle, when they finally build that cloud, and so on. They're the key partnerships that we see in the marketplace. That will be our go-to market strategy. That is our go-to market strategy. >> So one of the things that's clear is the data value and you do a lot of replications.
So one of the things that, I forget which CUBE segment we've done over the years, it was Hurricane Sandy, I think, in New York City. You guys were instrumental in keeping the uptime and availability. >> Lisa mentioned, Amazon vis-a-vis IBM, obviously two different strategies, kind of converging in on the same customer. Amazon's had problems with availability zones and they're rushing and running like the wind to put up new data centers. They just announced a new data center in India just recently. Andy Jassy and team were out there kicking that off. So they're rushing to put points of presence, if you will, for lack of a better word, around the world. Does that fit into your availability concept and how do customers engage with you guys with specifically that kind of architecture developing very fast? >> I think that's a really great question. There are problems, there have been historic problems with general availability in cloud. There are lots of 15-minute outages and so on that cost billions and billions of dollars. We're working very closely and I can't say too much about it with the teams that are focused on enabling availability. Clearly the IBM OEM is very focused on the movement of data from the hybrid cloud, from a data availability perspective. But there's a great deal of value in data that sits in cloud and I think you'll see us do more and more deals around general cloud availability moving forward. >> Is there a specific project on that front that you can share with us where you've really helped a customer gain significant advantage by working with AWS and facilitating those availability objectives, security compliance?
>> So, one of the big use cases that we see, and it's kind of all happening at once really, is I built an on-premise infrastructure to store lots and lots of data, now I need to run compute and analytics against that data and I'm not going to build a massive redundant infrastructure on-premise in order to do that, so I need to figure out a way to move that data in and out of cloud without interruption to service. And when we are talking about large volumes of data, you simply can't move transactional data in and out of cloud using existing technology. AWS offers something called Snowball where you put it into a ruggedized drive and then you ship it to them, but that's not really streaming analytics is it? Most of our use cases today are involved in either the migration of data from on-premise into cloud infrastructure, or the movement of data on a temporal basis so I can run compute against that data and taking advantage of the elastic compute available in cloud. They are really the two major use cases that we see, and we're working with a lot of customers right now that have those exact problems. >> So majority of your customers are more using hybrid cloud versus all in the public cloud? >> Hybrid falls into two categories. I'm going to use hybrid in order to migrate data because I need to keep on using it while it's moving. And secondly I need to use hybrid because I need to build a compute infrastructure that I simply can't build behind firewall. I need to build it in cloud. >> So the new normal is the cloud. There was a tweet here that says, database migration, now we can have an Oracle Exadata data dispute that we're ready to throw into the river. (David laughs) Database migration is a big thing and you mentioned it on the first question that moving in and out of the cloud is a top concern for enterprises. This is one of those things, it's the elephant in the room, so to speak. No pun intended AKA Hadoop.
Moving the data around is a big deal and you don't want to get a roach motel situation where you can check in and can't check out. That is the lock-in that enterprise customers are afraid of with Amazon. Your thoughts there, and what do you guys offer your customers. And if you can give some color on this whole database migration issue, real, not real? >> The big problem that the Hadoop market has had from a growth perspective is applications. And why they had a problem, well it's the concept of data gravity. The way that the AWS execs will look at their business, the way that the Azure execs will look at their business at Microsoft. They will look at how much data they actually have. Data gravity. The implication being if I have data then the applications follow. The whole point of cloud is that I can build my applications on that ubiquitous infrastructure. We want to be the kings of moving data around right? Wherever the data lands is where the applications follow. If the applications follow, you have a business. If the applications don't follow, then it's probably a roach motel situation, as you so quaintly put it. But basically the data is temporal. It will move back to where the applications are going to be. So where the applications are, and it's who is going to be the king of applications, will actually win this race. >> So, question, in terms of migration, we're hearing a lot about mass migration. Amazon's even doing partner competency programs for migration. Not to trivialize it, talk to us about some of the challenges that you are helping customers overcome when they sort of don't know where to start when it comes to that data problem? >> If it's batch data, if it's stuff that I'm only going to touch if it's an archive, that I'm only going to touch once in a blue moon, then I can put it into Snowball and I can ship my Snowball device.
I can sort of press the pause button akin to when I'm copying files into a network drive where you can't edit them, and then wait for two months, three months. Wait for them to turn up in AWS and that's fine. If it's transactional data where maybe 80% of my data set changes on a daily basis and I've got petabyte scale data to move, that's a hard problem. That requires active transactional data migration. That's a big mouthful, but that's really important for run-time transactional data. That's the problem that we solve. We enable customers to move massive-scale active transactional data into cloud without any interruption of service. So I can still use it while it's moving. >> One of the things we were talking about before you came on was the whole global economy situation. I think a year and a half ago, or two years ago, you predicted the housing bubble bursting in London. You're listed on the London Stock Exchange, you're a public company. Brexit, EU. These are huge issues that are going to impact, certainly North America looking healthy right now but some are saying that there's a big challenge and certainly the uncertainty of the U.S. presidency candidates, or lack thereof. The general sentiment in the U.S. We're in a world of turmoil. So specifically the Brexit situation. You guys are in London. How does this impact your business and is that going to happen? Or give us some color and insight into what the countrymen are thinking over there. >> Okay, so, I get asked by, I live here of course, and I've lived here for 19 years. It feels like I'm recolonizing sometimes, I have to say. No, I'm joking. I get asked by a lot of Americans what the situation is with Brexit and why it happened. And for that you have to look at economics. If you sort of take a step back, nine of the 10 poorest parts of Northern Europe are in the U.K. And one, only one of the top 10 richest parts is in the U.K. and that's London.
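The batch-versus-active distinction drawn in the migration discussion above can be sketched with a toy change-log replicator. This is purely illustrative (not WANdisco's implementation): the source stays writable while an ordered log of changes is applied to the target until it converges, so there is no pause-button step.

```python
# Toy sketch of active transactional migration: an initial bulk copy,
# plus a change log continuously applied to the target, so the source
# keeps serving writes the whole time.
source = {"k1": "v1", "k2": "v2"}
target = {}
change_log = []

def write(key, value):
    source[key] = value            # source stays writable during migration
    change_log.append((key, value))  # every write is also logged

def replicate():
    while change_log:
        key, value = change_log.pop(0)
        target[key] = value        # apply changes to the target in order

# Step 1: initial bulk copy (the batch, Snowball-like step)...
target.update(source)
# Step 2: ...but writes keep arriving while data is in flight:
write("k3", "v3")
write("k1", "v1-updated")
# Step 3: drain the change log; target converges with zero downtime.
replicate()
```

The hard engineering, of course, is doing step 3 for petabyte-scale data with ordering guarantees across a WAN, which is the problem being described.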
So basically outside of London the U.K. has a really big problem. Those people are dissatisfied. When people are dissatisfied, if they're not benefiting from an economic upturn, if governments make cuts, like the Conservative government has for the past four years, those people don't benefit, and they really feel pissed off and they will vote against the government. >> John: So protest vote pretty much? >> Brexit was really, I think, a protest vote. It's people dissatisfied. It's people voting basically anti-immigration, which, being in the U.S., is a really foreign thing to us. >> But there are some implications to business. I mean obviously there's filings, there's legal issues, obviously currency. Have you been impacted positively, negatively and what is the outlook on WANdisco's business going forward with the Brexit uncertainty and/or impact? >> We're in great shape because we buy pounds. We buy labor that's now discounted by 20% in the U.K. I just got back from the U.K. If you want to go on vacation, Americans, anywhere, go to London this summer and go shopping because everything is humongously discounted for us Americans right now. It's a great time to be there. So from a WANdisco perspective-- >> John: How does that affect the housing bubble too? >> I said to you about a year ago that the London housing market was akin to the jewelry shops that existed in Hong Kong a few years ago, where the Chinese used to come over and basically launder money by buying huge diamonds and bars of gold and things. If you look at the London housing market it is primarily fueled by the Saudis and by the Russians who have been buying Hyde Park Corner 100 million pounds, $160 million, well $140 million now, apartments and so on in London. Now seven, and I repeat seven housing funds in the U.K. last week canceled redemptions. Which means that they can foresee liquidity problems coming in those funds.
I think you're about to see a housing crash in London, the like of which we've never seen before, and I think it would be very sad and I think that will make people really question the Brexit decision. >> John: So sell London property now people? >> Yes. >> Before the crash. >> And go shopping, I heard the go shopping. So following along that, you talked about the significant differential between London and the rest of the U.K. You're from Sheffield, you're very proud of that. You've also been proud of your business really helping to fuel that economy. How do you think Brexit is going to affect WANdisco in your home area of Sheffield? >> I don't think it really will. I think our employees there are, in relative terms, very well paid. They're working on interesting things. They're working very closely with the AWS team, for example, the S3 team, the EMR team. And building our technology, we're liaising very closely with them. They're doing lots of interesting things. I suspect their vacations into Europe and their vacations to the United States have just gone up by about 20% which will reduce the amount of beer that they can drink. It's a big beer drinking part of the world in Sheffield. Sheffield, in terms of cost of living, is relatively low compared to the rest of the U.K. and I think those people will be pretty happy.
I want you, and I'm sure you get this question a lot, I would like you to take a minute and explain to the audience that's watching, what's this phenom of Amazon Web Services really all about? What's all the hub-bub about? Why is everyone fawning over Amazon now? When you go back five years ago, or 10 years ago when it started, they were ridiculed. I remember when this started I loved it, but they were looked at as just a kind of a tinkering environment. Now they're the behemoth and just on an unstoppable run and certainly the expansion has been fantastic under Andy Jassy's leadership. How do you explain it to normal people what's going on at Amazon? Take a minute please. >> So Amazon is, and that's a brilliant question, by the way. Amazon is the best investor-relations story ever, and I mean ever. What Bezos did is never talked about the potential size of the market. Never talked about this thing was going to generate lots of cash. He just said, you know what, we're building this little internet thing. It might, it might not work. It's not going to make any money. And then in the blink of an eye, it's a $15 billion revenue business growing faster than any other part of his business and throwing off cash like there's no tomorrow. It is just the most non-obvious story in technology, in business, of any public company ever. I mean AWS, arguably, as a stand alone entity, is almost worth as much as Oracle. An unbelievable, an unbelievable story and to do that with all the complexity. I mean running a public company with shareholder expectations, with investor relations where you have to constantly be positive about what's going on. For him to do that and never talk about making a profit, never talk about this becoming a multi-billion dollar segment of their business, is the most incredible thing. >> So they've been living the agile. Certainly that's the business story, but they've been living the agile story relative to announcing the slew of new products.
Basic building blocks S3, EC2 to start with, as the story goes from Andy Jassy himself, and then a slew of new services. It's a tsunami of new services at every event. What is the disruptive enabler? What's the disruption under the hood for Amazon? How do you explain that? >> Well, I mean what they did is they took a really simple concept. They said, okay, storage, how do we make storage completely elastic, completely public, in a way that we can use the public internet to get data in and out of it. Right? That sounds simple. What they actually built underneath the covers was an extremely complex thing called object store. Everybody else in the industry completely missed this. Oracle missed it, Microsoft missed it, everybody missed it. Now we're all playing catch-up trying to develop this thing called object store. It's going to take over, I mean, somebody said to me, what's the relevance of Hadoop in cloud? And you have to ask that question. It's a relevant question. Do you really need it when you've got object store? Show me side-by-side, object store versus every, you know, NetApp, Teradata, or any of those guys. Show me side-by-side the difference between the two things. There ain't a lot. >> Amazon Web Service is a company that can put incumbents out of business. David, thanks so much. As we always say, what inning are we in? It's really a double-header. Game one swept by Amazon Web Services. Game two is the enterprise and that's really the story here at Amazon Web Services Summit in Silicon Valley. Can Amazon capture the enterprise? Their focus is clear. We're theCUBE. I'm John Furrier with Lisa Martin. We'll be right back with more after this short break. (techno music)
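The object-store concept David describes, a flat key/object namespace with simple put/get semantics, reduces to something like the sketch below. Class and method names here are illustrative, not any vendor's API; real services such as S3 layer HTTP access, metadata, versioning, and durability guarantees on top of this core idea.

```python
# Minimal sketch of the object-store idea: a flat key -> object mapping.
# There is no filesystem hierarchy; "directories" are just key prefixes.
class ObjectStore:
    def __init__(self):
        self._objects = {}  # key -> (bytes, metadata)

    def put(self, key, data, metadata=None):
        # Whole-object write: the object is stored or replaced atomically.
        self._objects[key] = (bytes(data), metadata or {})

    def get(self, key):
        # Whole-object read by key.
        data, _ = self._objects[key]
        return data

store = ObjectStore()
store.put("logs/2016/07/27/events.json", b'{"event": "summit"}')
payload = store.get("logs/2016/07/27/events.json")
```

The simplicity of the interface is the point of the quote: the complexity lives underneath (durability, scale, consistency), not in what the client has to understand.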

Published Date : Jul 27 2016
