Piotr Mierzejewski, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018 brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski, your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments, so Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms and they can create a recommendation engine, let's say in two or three days' time. This is because of the explosion of open-source in that space.
You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial, especially when you have to expose this in a uniform fashion to various business units. Now all this has to actually work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now when you deploy a model, as the data scientist is going to deploy the model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of our concern, of enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR, and so forth are so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why the partnership between Hortonworks and IBM makes sense, well, you're looking at the number one industry-leading big data platform from Hortonworks.
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product, is the merger between the two, ability to integrate them tightly together gives your data scientists secure access to data, ability to leverage the Spark that runs inside a Hortonworks cluster, ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, you can actually work on your Python models and your analytics within DSX and then push it remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you know you can pull it in, but truly what you want to do when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage for example YARN, a resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, this partnership embarked, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but there is not much materializing after such announcements.
This is not true in the case of DSX and HDP. We have had, just recently we have had a release of DSX 1.2 which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin, say, Jupyter notebooks. Now, how about TensorFlow? What are you going to do when actually, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append a node, you can scale the platform horizontally and vertically, train your deep learning workloads, and then actually remove the sidecar. So you can add it to the cluster and remove it at will. Now, DSX also actually not only satisfies the needs of your programmer data scientists, that actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, you can actually, we have embedded, integrated, an SPSS modeler, redesigned, rebranded, this is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely.
(laughs) >> But I see it's the same phenomenon, you bring the same capability to greatly expand the range of data professionals who can, in this case, do machine learning hopefully as well as professional, dedicated data scientists. >> Certainly, now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization. From the executive, that actually gives you the business use case, to your data engineers that actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet, you have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and are actually streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With 1.2, again, coming back to this DSX 1.2, the most recent release, we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis.
So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now what's going to get really exciting in the next two months, DSX will actually bring in natural language processing and text analysis and sentiment analysis via Watson Explorer. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure, on Google Cloud; definitely we can deploy on SoftLayer and we're very good at that, however in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how we have done it, we've embraced open source. This was a huge shift within IBM to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards with Docker and its equivalents as they became industry standards?
Bring in RStudio, the Jupyter, the Zeppelin Notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them and make the deployments of web services, applications, the models, and those are actually full releases, I'm not only talking about the model, I'm talking about the scripts that can go with that ability to actually pull the data in and allow the models to be re-trained, evaluated and actually re-deployed without taking them down. Now that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1 and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientist but pretty much every single persona inside the organization >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks for having us here. 
We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning and business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)
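Mierzejewski's point about pushing analytics to where the data resides, rather than pulling terabytes back to the client, can be illustrated with a deliberately tiny sketch. This is a toy in plain Python, not DSX or YARN mechanics: each dict entry stands in for a partition held on a remote HDP node, and only small per-partition summaries travel back.

```python
# Toy illustration of the "push the analytics to the data" principle.
# Three "nodes" each hold a partition of a larger dataset; instead of
# pulling every record to the client, we ship a small summarizing
# function to each partition and combine only the tiny results.

partitions = {
    "node1": [3.0, 5.0, 7.0],
    "node2": [2.0, 4.0],
    "node3": [6.0, 8.0, 10.0, 12.0],
}

def local_summary(records):
    """Runs 'where the data lives': returns (count, sum) for one partition."""
    return (len(records), sum(records))

# Ship the function, not the data: only (count, sum) pairs travel back.
partials = [local_summary(recs) for recs in partitions.values()]

total_count = sum(c for c, _ in partials)
total_sum = sum(s for _, s in partials)
global_mean = total_sum / total_count

print(global_mean)
```

In a real deployment, Spark executors scheduled by YARN play the role of `local_summary`, and the savings grow with the ratio of raw data size to summary size.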

Published Date : Apr 19 2018


Mandy Chessell, IBM | Dataworks Summit EU 2018


 

>> Announcer: From Berlin, Germany, it's the Cube covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Well hello, welcome to the Cube, I'm James Kobielus. I'm the lead analyst for big data analytics within the Wikibon team of SiliconANGLE Media. I'm hosting the Cube this week at Dataworks Summit 2018 in Berlin, Germany. It's been an excellent event. Hortonworks, the host, had... We've completed two days of keynotes. They made an announcement of the Data Steward Studio as the latest of their offerings and demonstrated it this morning, to address GDPR compliance, which of course is coming down hot and heavy on enterprises both in the EU and around the world, including in the U.S., and the May 25th deadline is fast approaching. One of Hortonworks' prime partners is IBM. And today on this Cube segment we have Mandy Chessell. Mandy is a distinguished engineer at IBM who did an excellent keynote yesterday all about metadata and metadata management. Mandy, great to have you. >> Hi and thank you. >> So I wonder if you can just reprise or summarize the main takeaways from your keynote yesterday on metadata and its role in GDPR compliance, and so forth, and the broader strategies that enterprise customers have regarding managing their data in this new multi-cloud world where Hadoop and open source platforms are critically important for storing and processing data. So Mandy go ahead. >> So, metadata's not new. I mean it's basically information about data. And a lot of companies are trying to build a data catalog, which is not a catalog, you know, actually containing their data, it's a catalog that describes their data. >> James: Is it different than an index or a glossary? How's the catalog different from-- >> Yeah, so a catalog actually includes both. So it is a list of all the data sets plus links to glossary definitions of what those data items mean within the data sets, plus information about the lineage of the data.
It includes information about who's using it, what they're using it for, how it should be governed. >> James: It's like a governance repository. >> So governance is part of it. So the governance part is really saying, "This is how you're allowed to use it, this is how the data's classified, these are the automated actions that are going to happen on the data as it's used within the operational environment." >> James: Yeah. >> So there's that aspect to it, but there is the collaboration side. Hey, I've been using this data set, it's great. Or, actually this data set is full of errors, we can't use it. So you've got feedback to data set owners as well as exchange and collaboration between data scientists working with the data. So it's really, it is a central resource for an organization that has a strong data strategy, is interested in becoming a data-driven organization as such, so, you know, this becomes their major catalog of their data assets, and how they're using it. So when a regulator comes in and says, "Can you show us, show me that you're managing personal data?" the data catalog will have the information about where personal data's located, what type of infrastructure it's sitting on, how it's being used by different services. So they can really show that they know what they're doing and then from that they can show how processes use the metadata in order to use the data appropriately day to day. >> So Apache Atlas, so it's basically a catalog, if I understand correctly, at least for IBM and Hortonworks, it's Hadoop, it's Apache Atlas, and Apache Atlas is essentially a metadata open source code base. >> Mandy: Yes, yes. >> So explain what Atlas is in this context. >> So yes, Atlas is a collection of code, but it supports a server, a graph-based metadata server. It also supports-- >> James: A graph-based >> Both: Metadata server >> Yes >> James: I'm sorry, so explain what you mean by graph-based in this context.
>> Okay, so it runs using the JanusGraph graph repository. And this is very good for metadata 'cause if you think about what it is, it's connecting dots. It's basically saying this data set means this value and needs to be classified in this way and this-- >> James: Like a semantic knowledge graph >> It is, yes actually. And on top of it we impose a type system that describes the different types of things you need to control and manage in a data catalog, but the graph, the Atlas component, gives you that graph-based, sorry, graph-based repository underneath, but on top we've built what we call the open metadata and governance libraries. They run inside Atlas so when you run Atlas you will have all the open metadata interfaces, but you can also take those libraries and connect them and load them actually into another vendor's product. And what they're doing is allowing metadata to be exchanged between repositories of different types. And this becomes incredibly important as an organization increases their maturity and their use of data because you can't just have knowledge about data in a single server, it just doesn't scale. You need to get that knowledge into every runtime environment, into the data tools that people are using across the organization. And so it needs to be distributed. >> Mandy I'm wondering, the whole notion of what you catalog in that repository, does it include, or does Apache Atlas support adding metadata relevant to data derivative assets like machine learning models-- >> Mandy: Absolutely. >> So forth. >> Mandy: Absolutely, so we have base types in the upper metadata layer, but also it's a very flexible and extensible type system. So, if you've got a specialist machine learning model that needs additional information stored about it, that can easily be added to the runtime environment. And then it will be managed through the open metadata protocols as if it was part of the native type system.
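Chessell's description of a graph-based catalog, with data sets, glossary terms, classifications, and lineage as connected dots, can be sketched in miniature. The following is an illustrative toy with made-up names, not the Apache Atlas type system or its API:

```python
# A miniature, in-memory sketch of a graph-style metadata catalog:
# nodes for data sets and glossary terms, edges for semantics ("means")
# and lineage ("derived_from"), plus governance classifications.

catalog = {
    "nodes": {
        "customers_raw":  {"kind": "dataset", "classification": "personal-data"},
        "customers_anon": {"kind": "dataset", "classification": "public"},
        "email_address":  {"kind": "glossary_term"},
    },
    "edges": [
        ("customers_raw", "means", "email_address"),          # semantic link
        ("customers_anon", "derived_from", "customers_raw"),  # lineage link
    ],
}

def datasets_with_classification(cat, label):
    """The 'show me where personal data lives' question a regulator asks."""
    return sorted(
        name for name, node in cat["nodes"].items()
        if node["kind"] == "dataset" and node["classification"] == label
    )

def upstream_of(cat, name):
    """Walk lineage edges to find what a data set was derived from."""
    return [obj for (subj, rel, obj) in cat["edges"]
            if rel == "derived_from" and subj == name]

print(datasets_with_classification(catalog, "personal-data"))  # -> ['customers_raw']
print(upstream_of(catalog, "customers_anon"))                  # -> ['customers_raw']
```

A real catalog adds many more node and edge kinds (owners, feedback, governance policies) and, as Chessell notes, a protocol for exchanging these entries between repositories of different types.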
>> Because of course, as an analyst, one of my core areas is artificial intelligence, and one of the hot themes in artificial, well, there's a broad umbrella called AI safety. >> Mandy: Yeah. >> And one of the core subsets of that is something called explicable AI, being able to identify the lineage of a given algorithmic decision back to what machine learning models fed from what data. >> Mandy: Yeah. >> Through what action, like when, let's say, a self-driving vehicle hits a human being, for legal, you know, discovery, whatever. So what I'm getting at, what I'm working through to, is the extent to which the Hortonworks, IBM big data catalog running Atlas can be a foundation for explicable AI either now or in the future. We see a lot of enterprises, me as an analyst at least sees lots of enterprises, that are exploring this topic, but it's not to the point where it's in production, explicable AI, but where clearly companies like IBM are exploring building a stack or an architecture for doing this kind of thing in a standardized way. What are your thoughts there? Is IBM working on bringing, say, Atlas and the overall big data catalog into that kind of a use case? >> Yes, yeah, so if you think about what's required, you need to understand the data that was used to train the AI, what data's been fed to it since it was deployed because that's going to change its behavior, and then also a view of how that data's going to change in the future so you can start to anticipate issues that might arise from the model's changing behavior. And this is where the data catalog can actually associate and maintain information about the data that's being used with the algorithm. You can also associate the checking mechanism that's constantly monitoring the profile of the data so you can see where the data is changing over time, that will obviously affect the behavior of the machine learning model.
So it's really about providing not just information about the model itself, but also the data that's feeding it, how those characteristics are changing over time, so that you know the model is continuing to work into the future. >> So tell us about the IBM, Hortonworks partnership on metadata and so forth. >> Mandy: Okay. >> How is that evolving? So, you know, your partnership is fairly tight. You clearly, you've got ODPi, you've got the work that you're doing related to the big data catalog. What can we expect to see in the near future in terms of initiatives building on all of that for governance of big data in the multi-cloud environment? >> Yeah so Hortonworks started the Apache Atlas project a couple of years ago with a number of their customers. And they built a base repository and a set of APIs that allow it to work in the Hadoop environment. We came along last year, formed our partnership. That partnership includes this open metadata and governance layer. So since then we worked with ING as well, and ING bring the, sort of, user perspective, this is the organization's use of the data. And, so between the three of us we are basically transforming Apache Atlas from a Hadoop-focused metadata repository to an enterprise-focused metadata repository. Plus enabling other vendors to connect into the open metadata ecosystem. So we're standardizing types, standardizing the format of metadata, there's a protocol for exchanging metadata between repositories. And this is all coming from that three-way partnership where you've got a consuming organization, you've got a company who's used to building enterprise middleware, and you've got Hortonworks with their knowledge of open source development in their Hadoop environment. >> Quick out of left field, as you develop this architecture, clearly you're leveraging Hadoop HDFS for storage.
Are you looking to at least evaluate maybe using blockchain for more distributed management of the metadata in these heterogeneous environments in the multi-cloud, or not? >> So Atlas itself does run on HDFS, but doesn't need to run on HDFS, it's got other storage environments so that we can run it outside of Hadoop. When it comes to blockchain, so blockchain is for sharing data between partners, small amounts of data that basically express agreements, so it's like a ledger. There are some aspects that we could use for metadata management. It's more that we actually need to put metadata management into blockchain. So the agreements and contracts that are stored in blockchain are only meaningful if we understand the data that's there, what its quality is, where it came from, what it means. And so actually there's a very interesting distributed metadata question that comes with the blockchain technology. And I think that's an important area of research. >> Well Mandy we're at the end of our time. Thank you very much. We could go on and on. You're a true expert and it's great to have you on the Cube. >> Thank you for inviting me. >> So this is James Kobielus with Mandy Chessell of IBM. We are here this week in Berlin at Dataworks Summit 2018. It's a great event and we have some more interviews coming up so thank you very much for tuning in. (electronic music)
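The "checking mechanism that's constantly monitoring the profile of the data" that Chessell describes for keeping models honest can be sketched as a toy drift check in plain Python. The threshold and the data here are invented for illustration; real catalogs attach far richer profiles and policies.

```python
# Toy data-drift check: record the profile (mean and spread) of a feature
# at training time, then flag any incoming batch whose mean has wandered
# too far from that baseline.

from statistics import mean, pstdev

def profile(values):
    return {"mean": mean(values), "stdev": pstdev(values)}

def drifted(baseline, batch, threshold=3.0):
    """Flag a batch whose mean is more than `threshold` baseline
    standard deviations away from the training-time mean."""
    if baseline["stdev"] == 0:
        return mean(batch) != baseline["mean"]
    z = abs(mean(batch) - baseline["mean"]) / baseline["stdev"]
    return z > threshold

training_data = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0]
baseline = profile(training_data)

stable_batch = [10.2, 9.8, 10.1]    # looks like the training data
shifted_batch = [17.0, 18.5, 16.0]  # the world has changed

print(drifted(baseline, stable_batch))   # -> False
print(drifted(baseline, shifted_batch))  # -> True
```

Associating a check like this with the model's catalog entry is what lets you see, as Chessell puts it, where the data is changing over time before it silently changes the model's behavior.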

Published Date : Apr 19 2018


Dave McDonnell, IBM | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE (relaxing music) covering DataWorks Summit Europe 2018. (relaxing music) Brought to you by Hortonworks. (quieting music) >> Well, hello and welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin, Germany, and it's been a great show. Who we have now is we have IBM. Specifically we have Dave McDonnell of IBM, and we're going to be talkin' with him for the next 10 minutes or so about... Dave, you explain. You are in storage for IBM, and IBM of course is a partner of Hortonworks, who are of course the host of this show. So Dave, now that you've been introduced, give us your capacity or role at IBM. Discuss the partnership with Hortonworks, and really, what's your perspective on the market for storage systems for Big Data right now and going forward? And what kind of workloads and what kind of requirements are customers coming to you with for storage systems now? >> Okay, sure, so I lead alliances for the storage business unit, and Hortonworks, we actually partner with Hortonworks not just in our storage business unit but also with our analytics counterparts, our Power counterparts, and we're in discussions with many others, right? Our partner organization, services, and so forth. So the nature of our relationship is quite broad compared to many of our others. We're working with them in the analytics space, so these are a lot of these Big Data Data Lakes, BDDNA a lot of people will use as an acronym. These are the types of workloads that customers are using us both for. >> Mm-hmm. >> And it's not new anymore, you know, by now they're well past their first half dozen applications. We've got customers running hundreds of applications. These are production applications now, so it's all about, "How can I be more efficient? How can I grow this? How can I get the best performance and scalability and ease of management to deploy these in a way that's manageable?"
'cause if I have 400 production applications, that's not off in any corner anymore. So that's how I'd describe it in a nutshell. >> One of the trends that we're seeing at Wikibon, of course I'm the lead analyst for Big Data Analytics at Wikibon under SiliconANGLE Media, we're seeing a trend in the marketplace towards, I wouldn't call them appliances, but what I would call workload-optimized hardware/software platforms that combine storage with compute and are optimized for AI and machine learning and so forth. Is that something that you're hearing from customers, that they require those built-out, AI-optimized storage systems, or is that far in the future? Give me a sense for whether IBM is doing anything in that area and whether that's on your horizon. >> If you were to define all of IBM in five words or less, you would say "artificial intelligence and cloud computing," so this is something-- >> Yeah. >> that gets a lot of thought and mindshare. So absolutely we hear about it a lot. It's a very broad market with a lot of diverse requirements. So we hear people asking for converged infrastructure, for appliance solutions. There's of course hyperconverged. We actually have, either directly or with partners, answers to all of those. Now we do think one of the things that customers want to do, as they scale and grow in these environments, is to take a software-defined strategy so they're not limited, they're not limited by hardware blocks. You know, they don't want to have to buy processing power and spend all that money on it when really all they need is more data. >> Yeah. >> There's pros and cons to the different (mumbles). >> You have PowerAI systems, I know that, so that's where they're probably heading, yeah. >> Yes, yes, yes. So of course, we have packages that we've modeled in AI. They feed off of some of the Hortonworks data lakes that we're building.
Of course we see a lot of people putting these on new pieces of infrastructure because they don't want to put this on their production applications, so they're extracting data from maybe a Hortonworks data lake number one, Hortonworks data lake number two, some of the EDWs, some external data, and putting that into the AI infrastructure. >> As customers move their cloud infrastructures towards more edge-facing environments, or edge applications, how are storage requirements changing or evolving in the move to edge computing? Can you give us a sense for any sort of trends you're seeing in that area? >> Well, if we're going to the world of AI and cognitive applications, all that data that I mighta thrown in the cloud five years ago, now I'm educated enough 'cause I've been paying bills for a few years on just how expensive it is, and if I'm going to be bringing that data back, some of which I don't even know I'm going to be bringing back, it gets extremely expensive. So we see a pendulum shift coming back where now a lot of data is going to be on host, ah sorry, on premise, but it's not going to stay there. They need the flexibility to move it here, there, or everywhere. So if it's going to come back, how can we bring customers some of that flexibility that they liked about the cloud, the speed, the ease of deployment, even a consumption-based model? These are very big changes for a traditional storage manufacturer like ourselves, right? So that's requiring a lot of development in software, it's requiring a lot of development in our business model, and one of the biggest things you hear us talk about this year is IBM Cloud Private, which does exactly that, >> Right. and it gives them somethin' they can work with that's flexible, it's agile, and allows you to take containerized applications and move them back and forth as you please. >> Yeah. So containerized applications. So if you can define it for our audience, what is a containerized application? 
You talk about Docker and orchestrate it through Kubernetes and so forth. So you mentioned Cloud Private. Can you bring us up to speed on what exactly Cloud Private is, and the storage requirements or storage architecture within that portfolio? >> Oh yes, absolutely. So this is a set of infrastructure that's optimized for on-premise deployment that gives you multi-cloud access, not just IBM Cloud, Amazon Web Services, Microsoft Azure, et cetera, and then it also gives you multiple architectural choices basically wrapped by software to allow you to move those containers around and put them where you want them at the right time at the right place given the business requirement at that hour. >> Now is the data storage persisted in the container itself? I know that's fairly difficult to do in a Docker environment. How do ya handle persistence of data for containerized applications within your architecture? >> Okay, some of those are going to be application specific. It's the question of designing the right data management layer depending on the application. So we have software intelligence, some of it from open source, some of which we add on top of open source to bring some of the enterprise resilience and performance needed. And of course, you have to be very careful if the biggest trend in the world is unstructured data. Well, okay fine, it's a lot of sensor data. That's still fairly easy to move around. But once we get into things like medical images, lots of video, you know, HD video, 4K video, those are the things which you have to give a lot of thought to how to do that. And that's why we have lots of new partners that we work with to help us with edge cloud, which gives that on premise-like performance in really a cloud-like setup. >> Here's a question out of left field, and you may not have the answer, but I would like to hear your thoughts on this. 
How has blockchain, and IBM's been making significant investments in blockchain and database technology, how is blockchain changing the face of the storage industry in terms of customers' requirements for storage systems to manage data in distributed blockchains? Is that something you're hearing coming from customers as a requirement? I'm just tryin' to get a sense for whether that's, you know, is it moving customers towards more flash, towards more distributed edge-oriented or edge-deployed storage systems? >> Okay, so yes, yes, and yes. >> Okay. So all of a sudden, if you're doing things like a blockchain application, things become even more important than they are today. >> Yeah. >> Okay, so you can't lose a transaction. You can't have storage going down. So there's a lot more care and thought into the resiliency of the infrastructure. If I'm, you know, buying a diamond from you, I can't accept the excuse that my $100,000 diamond, maybe that's a little optimistic, my $10,000 diamond or yours, you know, the transaction's corrupted because the data's not proper. >> Right. >> Or if I want my privacy, I need to be assured that there's good data governance around that transaction, and that that will be protected for a good 10, 20, and 30 years. So it's elevating the importance of all the infrastructure to a whole different level. >> Switching our focus slightly, so we're here at DataWorks Summit in Berlin. Where are the largest growth markets right now for cloud storage systems? Is it APAC, is it North America, or where are the growth markets in terms of regions, in terms of vertical industries right now in the marketplace for enterprise-grade storage systems for big data in the cloud? >> That's a great question, 'cause we certainly have these conversations globally. I'd say the place where we're seeing the most activity would be the Americas, we see it in China. We have a lot of interesting engagements and people reaching out to us. 
I would say by market, you can also point to financial services in more than those two regions. Financial services, healthcare, retail, these are probably the top verticals. I think it's probably safe to assume, and we can assume the federal governments also have a lot of stringent requirements and, you know, new applications around the space as well. >> Right. GDPR, how is that impacting your customers' storage requirements? The requirement for GDPR compliance, is that moving the needle in terms of their requirement for consolidated storage of the data that they need to maintain? I mean obviously there's a security aspect, but is the sheer amount of data leading to consolidation or centralization of storage of customer data, which would seem to make it easier to control and monitor usage of the data? Is it making a difference at all? >> It's making a big difference. Not many people encrypt data today, so there's a whole new level of interest in encryption at many different levels, data at rest, data in motion. There's new levels of focus and attention on performance, on the ability for customers to get their arms around disparate islands of data, because now GDPR is not only a legal requirement that requires you to be able to have it, but you've also got timelines within which you're expected to act on a request from a customer to have their data removed. And most of those will have a baseline of 30 days. So you can't fool around now. It's not just a nice-to-have. It's an actual core part of a business requirement that, if you don't have a good strategy for it, you could be spending tens of millions of dollars in liability if you're not ready for it. >> Well Dave, thank you very much. We're at the end of our time. This has been Dave McDonnell of IBM, a big Hortonworks partner of course, talking about system storage. We are here on day two of the DataWorks Summit, and I'm James Kobielus of Wikibon SiliconANGLE Media, and have a good day. (upbeat music)

Published Date : Apr 19 2018



John Kreisa, Hortonworks | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconANGLE Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go way back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. 
If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be in the 1,250 to 1,300 range. Not the final numbers, but a great-sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. 
Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IoT and kind of edge-use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi. >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really, really strong interest in that. And then, the markets continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there were really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconANGLE that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Azure and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. 
And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise? And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. >> We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. 
>> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services as the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company and your go-to-market evolving over the coming years in terms of geographies, in terms of your focus? In terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks' strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. 
And also other partnerships to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implementations. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. 
>> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward Studio, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more of a business analyst level of focus? Do you see Hortonworks shifting in that direction in terms of your focus, go-to-market, your message and everything? >> I would say it's not a shift as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconANGLE Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)

Published Date : Apr 19 2018

Pankaj Sodhi, Accenture | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what DataWorks Summit is all about. We are at DataWorks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with big data. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be there. >> Great, great, so what are you seeing in terms of customer adoption of Hadoop and so forth, big data platforms, for what kind of use cases are you seeing? GDPR is coming down very quickly, and we saw this poll this morning that John Kreisa, of Hortonworks, did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, in enterprise period, because it sounds like a sizeable portion of this audience is not entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing, in terms of customer readiness, for this new regulation? >> So Jim, I'll answer the question in two ways. One was, just in terms of, you know, the adoption of Hadoop, and then, you know, get into GDPR. So in regards to Hadoop adoption, I think I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use-case-driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach. Deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often sort of, on premise. 
Hybrid being, for example, with an EDW and product type AA. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now is to leverage the best of cloud computing and open-source technology, while getting the investment protection that they have from the on-premise deployments as well. So they're in a fairly good position. Another set of customers have done successful pilots looking at either optimization use cases. >> Jim: How so, Hadoop? >> Yes, leveraging Hadoop. Either again from a cost optimization play or potentially sandbox capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on prem, potentially with Hadoop, or on cloud with Hadoop. >> When you say the pipeline in this context, what are you referring to? When I think of pipeline, in fact in our coverage of pipeline, it refers to an end-to-end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely. >> And analytics, so that's what you're saying. >> So all the way from ingestion to curation to consuming the data, through multiple different access points, so that's the full pipeline. And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. So I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of let's ingest all data in a data lake, it's been driven by a use case mapped to a set of valuable data sets that can be ingested. But what's interesting then is the delivery approach has been to bring together diverse skill sets. 
For example, data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge architecture and delivery approach. I think this is where the key ingredient for success comes in, which is, for me, that the modern sort of Hadoop pipeline needs to be iteratively built and deployed, rather than linear and monolithic. So this notion of, I have raw data, let me come up with a minimally curated data set. And then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance, get additional data attributes, I then enhance the pipeline. So this is already starting to challenge organizations' architecture approaches, and how you also deploy into production. And I think that's been one of the key differences between organizations that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world, or are they challenging the architecture, now that GDPR is coming down fast and furious, we're seeing, for example, Hortonworks' Data Steward Studio, are you seeing the data governors, the data stewards of the world, coming, sitting around the virtual table, challenging this architecture further to evolve? >> I think. >> To enable privacy by default and so forth? >> I think again, you know, the organizations that have been successful have already been looking at privacy by design before GDPR came along. Now one of the reasons a lot of the data lake implementations haven't been as successful is that the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So therefore, what we see with business glossaries, and sort of data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. 
So in the data marketplace, what you need to have is well-curated data sets. The proper definitions, whether through a business glossary or a data catalog, underpinned by the right user access model, and available for example through search or APIs. So, GDPR actually is. >> There's not a public marketplace, this is an architectural concept. >> Yes. >> It could be inside, completely inside, the private data center, but it's reusable data, it's both through API, and standard glossaries and metadata and so forth, is that correct? >> Correct, so the data marketplace is reusable, both internally, for example, to unlock access to data scientists who might want to use the data set and then put that into a data lab. It can also be extended, from an API perspective, for a third-party data marketplace for exchanging data with consumers or third parties as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit. Rather than looking at it from a compliance perspective, it's about how can we make data usable to the analysts and the data scientists. So actually focusing on getting the right definitions upfront, and as we curate and publish data, and as we enrich it, what's the next definition that comes of that? And actually have that available before we publish the data. >> That's a fascinating concept. So, the notion of a data steward or a data curator. It sort of sounds like you're blending them. Where the data curator, their job, part of it, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses and possibly being a player in the ongoing identification of the monetizability of data elements, both internally and externally in the (mumbles). Am I describing it correctly? >> Pankaj: I think you are, yes. >> Jim: Okay. 
>> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy function. >> Jim: The chief data officer. >> Yes, chief data officer functions. So rather than imposition of policies and standards, it's about actually trying to unlock business value. So rather than look at it from a compliance perspective, which is very important, but actually flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So for example, if you're able to tag and classify data, and then apply the right kind of protection against it, it actually helps the data scientists to use that data for their models. While that's actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance, which is to discover and inventory and essentially tag all of your data, on a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value to your business of the data. So it needn't be an overhead burden, it can be the foundation for a new business model. >> Absolutely, because I think if you talk about organizations becoming data driven, you have to look at what the data asset actually means. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authority of privacy and ability to use the data. So I think GDPR is going to be a very good enabler, so again the small minority of organizations that have been successful have done this. They've had business glossaries, data catalogs, but now with GDPR, that's almost, I think, going to force the issue. Which I think is a very positive outcome.
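The tag-and-classify idea just mentioned, where classification tags drive the protection applied before data reaches a data scientist, can be illustrated with a small sketch. The classification map, column names, and hashing choice here are illustrative assumptions, not a description of any particular product:

```python
import hashlib

# Hypothetical classification tags assigned by data stewards per column.
CLASSIFICATION = {
    "email":        "pii",        # direct identifier: pseudonymize
    "birth_year":   "quasi_pii",  # kept, coarse enough for modelling
    "basket_value": "public",
}

def protect(row: dict) -> dict:
    """Apply tag-driven protection to one record before release."""
    out = {}
    for col, value in row.items():
        tag = CLASSIFICATION.get(col, "pii")  # unknown columns get the safest tag
        if tag == "pii":
            # Stable hash: joins across data sets still work, identity does not.
            out[col] = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        else:
            out[col] = value
    return out

safe = protect({"email": "anna@example.com", "birth_year": 1986, "basket_value": 42.5})
```

This is the win-win Sodhi points at: the same tags that satisfy the compliance office decide mechanically what a model may train on.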
>> Now Pankaj, do you see any of your customers taking this concept of curation and so forth to the next step, in terms of: there are data assets, but then there are data-derived assets, like machine learning models and so forth. Data scientists build and train and deploy these models and algorithms, that's the core of their job. >> Man: Mhmm. >> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers so that they'll also increasingly be required, or want, to essentially catalog the models and curate them for reusability, possibly monetization opportunities? Is that something that any of your customers are doing or exploring? >> Some of our customers are looking at that as well. So again, initially, exactly, it's an extension of the marketplace. So while one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and subscribe to. >> Jim: Yeah, like pre-trained models. >> Correct. >> Can be golden if they're pre-trained and the core domain for which they're trained doesn't change all that often; they can have a great aftermarket value conceivably if you want to resell that. >> Absolutely, and I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models. And to be able to share that with other folks in the team who can then maybe tweak it for a different algorithm, is a huge, I think, productivity enabler, and we've seen. >> Jim: Yes. >> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop, to actually then deploying the (mumbles).
How can you do that very quickly, and reduce the time from an initial hypothesis to production? >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion. And we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.

Published Date : Apr 19 2018

SUMMARY :

Brought to you by, Horton Works. He's based in London, and he's here to discuss really what is not entirely ready to comply with GDRP on day one, So in regards to Hadoop adoption, I think I would place In that scenario, they've taken the time to actually and starting to work out, from a footprint perspective, it refers to an end to end life cycle for development So this is already starting to challenge organizations haven't had the ability to actually curate the data sets, this is an architectural concept. the right definitions upfront, and as we curate and possibly being a player in the ongoing identification for the CDO function, because, rather than So rather than look at it from a compliance perspective, it actually helps the data scientists that ever happened to data monetization. Absolutely, Because I think if you talk So I think GDPR is going to be a very good enabler, and algorithms, that's the core of their job. so that they'll also increasingly be required to want to of the marketplace. if you want to resell that. And to be able to share that with other folks in the team to move very quickly from a model And we thank you very much for coming on theCUBE.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Pankaj | PERSON | 0.99+
James Kobielus | PERSON | 0.99+
Jim | PERSON | 0.99+
London | LOCATION | 0.99+
Pankaj Sodhi | PERSON | 0.99+
May 25th | DATE | 0.99+
Accenture | ORGANIZATION | 0.99+
John Chrysler | PERSON | 0.99+
Horton Works | ORGANIZATION | 0.99+
Silicon Angled Media | ORGANIZATION | 0.99+
GDPR | TITLE | 0.99+
Berlin, Germany | LOCATION | 0.99+
One | QUANTITY | 0.98+
both | QUANTITY | 0.98+
one aspect | QUANTITY | 0.97+
one | QUANTITY | 0.97+
Data Works Summit | EVENT | 0.96+
two ways | QUANTITY | 0.96+
Data Works Summit 2018 | EVENT | 0.95+
Dataworks Summit EU 2018 | EVENT | 0.93+
Europe | LOCATION | 0.93+
Hadoop | TITLE | 0.92+
day two | QUANTITY | 0.9+
Hadoob | PERSON | 0.87+
2018 | EVENT | 0.84+
day one | QUANTITY | 0.82+
three | QUANTITY | 0.79+
first ones | QUANTITY | 0.77+
theCUBE | ORGANIZATION | 0.76+
Wikbon Team | ORGANIZATION | 0.72+
this morning | DATE | 0.7+
Hadoob | TITLE | 0.7+
GDRP | TITLE | 0.55+
categories | QUANTITY | 0.54+
Big DSO | ORGANIZATION | 0.52+
Hadoob | ORGANIZATION | 0.46+

Alan Gates, Hortonworks | Dataworks Summit 2018


 

(techno music) >> (announcer) From Berlin, Germany it's theCUBE covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. We're here on day two of DataWorks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm lead analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. And who we have here today, we have Alan Gates who's one of the founders of Hortonworks, and Hortonworks of course is the host of DataWorks Summit and he's going to be, well, hello Alan. Welcome to theCUBE. >> Hello, thank you. >> Yeah, so Alan, so you and I go way back. Essentially, what we'd like you to do first of all is just explain a little bit of the genesis of Hortonworks. Where it came from, your role as a founder from the beginning, how that's evolved over time, but really how the company has evolved specifically with the folks in the community, the Hadoop community, the open source community. You have a deepening open source stack that you build upon, with Atlas and Ranger and so forth. Give us a sense for all of that, Alan. >> Sure. So as I think it's well-known, we started as the team at Yahoo that really was driving a lot of the development of Hadoop. We were one of the major players in the Hadoop community. Worked on that for, I was in that team for four years. I think the team itself was going for about five. And it became clear that there was an opportunity to build a business around this. Some others had already started to do so. We wanted to participate in that. We worked with Yahoo to spin out Hortonworks and actually they were a great partner in that. Helped us get that spun out. And the leadership team of the Hadoop team at Yahoo became the founders of Hortonworks and brought along a bunch of the other engineers to help get started. And really at the beginning, it was Hadoop, Pig, Hive, you know, HBase, the kind of beginning projects.
So pretty small toolkit. And we were, our early customers were very engineering heavy people, or companies who knew how to take those tools and build something directly on those tools right? >> Well, you started off, the Hadoop community as a whole started off, with a focus on the data engineers of the world >> Yes. >> And I think it's shifted, and confirm for me, over time that you focus increasingly with your solutions on the data scientists who are doing the development of the applications, and the data stewards from what I can see at this show. >> I think it's really just a part of the adoption curve right? When you're early on that curve, you have people who are very into the technology, understand how it works, and want to dive in there. So those tend to be, as you said, the data engineering types in this space. As that curve grows out, you get, it comes wider and wider. There's still plenty of data engineers that are our customers, that are working with us but as you said, the data analysts, the BI people, data scientists, data stewards, all those people are now starting to adopt it as well. And they need different tools than the data engineers do. They don't want to sit down and write Java code or you know, some of the data scientists might want to work in Python in a notebook like Zeppelin or Jupyter but some may want to use SQL or even Tableau or something on top of SQL to do the presentation. Of course, data stewards want tools more like Atlas to help manage all their stuff. So that does drive us to one, put more things into the toolkit so you see the addition of projects like Apache Atlas and Ranger for security and all that. Another area of growth, I would say is also the kind of data that we're focused on. So early on, we were focused on data at rest. You know, we're going to store all this stuff in HDFS and as the kind of data scene has evolved, there's a lot more focus now on a couple things.
One is data, what we call data-in-motion for our HDF product where you've got a stream manager like Kafka or something like that >> (James) Right >> So there's processing that kind of data. But now we also see a lot of data in various places. It's not just oh, okay I have a Hadoop cluster on premise at my company. I might have some here, some on premise somewhere else and I might have it in several clouds as well. >> OK, your focus has shifted, like the industry in general, towards streaming data in multi-clouds where it's more stateful interactions and so forth? I think you've made investments in Apache NiFi so >> (Alan) yes. >> Give us a sense for your NiFi versus Kafka and so forth inside of your product strategy or your >> Sure. So NiFi is really focused on that data at the edge, right? So you're bringing data in from sensors, connected cars, airplane engines, all those sorts of things that are out there generating data and you need, you need to figure out what parts of the data to move upstream, what parts not to. What processing can I do here so that I don't have to move upstream? When I have an error event or a warning event, can I turn up the amount of data I'm sending in, right? Say this airplane engine is suddenly heating up maybe a little more than it's supposed to. Maybe I should ship more of the logs upstream when the plane lands and connects than I would otherwise. That's the kind o' thing that Apache NiFi focuses on. I'm not saying it runs in all those places but my point is, it's that kind o' edge processing. Kafka is still going to be running in a data center somewhere. It's still a pretty heavyweight technology in terms of memory and disk space and all that so it's not going to be run on some sensor somewhere. But it is that data-in-motion right?
I've got millions of events streaming through a set of Kafka topics watching all that sensor data that's coming in from NiFi and reacting to it, maybe putting some of it in the data warehouse for later analysis, all those sorts of things. So that's kind o' the differentiation there between Kafka and NiFi. >> Right, right, right. So, going forward, do you see more of your customers working on internet of things projects, is that, we don't often, at least in the industry's popular mind, associate Hortonworks with edge computing and so forth. Is that? >> I think that we will have more and more customers in that space. I mean, our goal is to help our customers with their data wherever it is. >> (James) Yeah. >> When it's on the edge, when it's in the data center, when it's moving in between, when it's in the cloud. All those places, that's where we want to help our customers store and process their data. Right? So, I wouldn't want to say that we're going to focus on just the edge or the internet of things but that certainly has to be part of our strategy 'cause it has to be part of what our customers are doing. >> When I think about the Hortonworks community, now we have to broaden our understanding because you have a tight partnership with IBM which obviously is well-established, huge and global. Give us a sense, as you guys have teamed more closely with IBM, how your community has changed or broadened or shifted in its focus or has it? >> I don't know that it's shifted the focus. I mean IBM was already part of the Hadoop community. They were already contributing. Obviously, they've contributed very heavily on projects like Spark and some of those. They continue some of that contribution. So I wouldn't say that it's shifted it, it's just we are working more closely together as we both contribute to those communities, working more closely together to present solutions to our mutual customer base. But I wouldn't say it's really shifted the focus for us.
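The edge pattern Gates describes above, ship a trickle of telemetry in normal operation and turn up the volume once a reading crosses a warning threshold, can be sketched independently of NiFi itself. The threshold, sampling rates, and readings below are made-up numbers used only to show the shape of the logic:

```python
def plan_upstream(readings, warn_temp=600.0, normal_rate=10, alert_rate=1):
    """Decide which sensor readings to ship upstream.

    In normal operation only every Nth reading is shipped (N = normal_rate).
    Once a reading reaches warn_temp, switch to alert_rate and ship
    everything from that point on.
    """
    rate = normal_rate
    shipped = []
    for i, temp in enumerate(readings):
        if temp >= warn_temp:
            rate = alert_rate  # warning seen: ship at the dense rate from here on
        if i % rate == 0:
            shipped.append((i, temp))
    return shipped

# 20 normal readings, then the hypothetical engine starts heating up.
readings = [550.0] * 20 + [605.0] + [610.0] * 5
sent = plan_upstream(readings)
```

With these numbers, only two of the twenty normal readings go upstream, but every reading after the warning does, which is exactly the bandwidth trade-off described for the airplane-engine example.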
>> Right, right. Now at this show, we're in Europe right now, but it doesn't matter that we're in Europe. GDPR is coming down fast and furious now. Data Steward Studio, we had the demonstration today, it was announced yesterday. And it looks like a really good tool for the main requirements for compliance, which are to discover and inventory your data and to really set up a consent portal, as I like to refer to it. So the data subject can then go and make a request to have their data forgotten and so forth. Give us a sense going forward, for how or if Hortonworks, IBM, and others in your community are going to work towards greater standardization in the functional capabilities of the tools and platforms for enabling GDPR compliance. 'Cause it seems to me that you're going to need, the industry's going to need to have some reference architecture for these kind o' capabilities so that going forward, either your ecosystem of partners can build add-on tools in some common, like the framework that was laid out today looks like a good basis. Is there anything that you're doing in terms of pushing towards more Open Source standardization in that area? >> Yes, there is. So actually one of my responsibilities is the technical management of our relationship with ODPI which >> (James) yes. >> Mandy Chessell referenced yesterday in her keynote and that is where we're working with IBM, with ING, with other companies to build exactly those standards. Right? Because we do want to build it around Apache Atlas. We feel like that's a good tool for the basis of that but we know one, that some people are going to want to bring their own tools to it.
They're not necessarily going to want to use that one platform so we want to do it in an open way that they can still plug in their metadata repositories and communicate with others, and we want to build the standards on top of that of how do you properly implement these features that GDPR requires, like right to be forgotten, like, you know, what are the protocols around PII data? How do you prevent a breach? How do you respond to a breach? >> Will that all be under the umbrella of ODPI, that initiative of the partnership or will it be a separate group or? >> Well, so certainly Apache Atlas is part of Apache and remains so. What ODPI is really focused on is that next layer up of how do we engage, not the programmers 'cause programmers can engage really well at the Apache level but the next level up. We want to engage the data professionals, the people whose job it is, the compliance officers. The people who don't sit and write code and frankly if you connect them to the engineers, there's just going to be an impedance mismatch in that conversation. >> You got policy wonks and you got tech wonks so. They understand each other at the wonk level. >> That's a good way to put it. And so that's where ODPI is really coming in, is that group of compliance people that speak a completely different language. But we still need to get them all talking to each other as you said, so that there's specifications around. How do we do this? And what is compliance? >> Well Alan, thank you very much. We're at the end of our time for this segment. This has been great. It's been great to catch up with you and Hortonworks has been evolving very rapidly and it seems to me that, going forward, I think you're well-positioned now for the new GDPR age to take your overall solution portfolio, your partnerships, and your capabilities to the next level, really in terms of an open source framework. In many ways though, you're not entirely, 100%, purely open source; nobody is.
You're still very much focused on open frameworks for building very scalable solutions for enterprise deployment. Well, this has been Jim Kobielus with Alan Gates of Hortonworks here on theCUBE at DataWorks Summit 2018 in Berlin. We'll be back fairly quickly with another guest and thank you very much for watching our segment. (techno music)

Published Date : Apr 19 2018

SUMMARY :

Brought to you by Hortonworks. of Hortonworks and Hortonworks of course is the host a little bit of the genesis of Hortonworks. a bunch of the other engineers to help get started. of the applications, and the data stewards So those tend to be, as you said, the data engineering types But now we also see a lot of data in various places. So NiFi is really focused on that data at the edge, right? So, going forward, do you see more of your customers working I mean, our goal is to help our customers with their data When it's on the edge, when it's in the data center, as you guys have teamed more closely with IBM, I don't know that it's shifted the focus. the industry's going to need to have some So actually one of my responsibilities is the that GDPR requires like right to be forgotten, like and frankly if you connect them to the engineers, You got policy wonks and you got tech wonks so. as you said, so that there's specifications around. It's been great to catch up with you and

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
James Kobielus | PERSON | 0.99+
Mandy Chessell | PERSON | 0.99+
Alan | PERSON | 0.99+
Yahoo | ORGANIZATION | 0.99+
Jim Kobielus | PERSON | 0.99+
Europe | LOCATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Alan Gates | PERSON | 0.99+
four years | QUANTITY | 0.99+
James | PERSON | 0.99+
ING | ORGANIZATION | 0.99+
Berlin | LOCATION | 0.99+
yesterday | DATE | 0.99+
Apache | ORGANIZATION | 0.99+
SQL | TITLE | 0.99+
Java | TITLE | 0.99+
GDPR | TITLE | 0.99+
Python | TITLE | 0.99+
100% | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.99+
SiliconANGLE Media | ORGANIZATION | 0.99+
DataWorks Summit | EVENT | 0.99+
Atlas | ORGANIZATION | 0.99+
DataWorks Summit 2018 | EVENT | 0.98+
Data Steward Studio | ORGANIZATION | 0.98+
today | DATE | 0.98+
one | QUANTITY | 0.98+
NiFi | ORGANIZATION | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
Hadoop | ORGANIZATION | 0.98+
one platform | QUANTITY | 0.97+
2018 | EVENT | 0.97+
both | QUANTITY | 0.97+
millions of events | QUANTITY | 0.96+
Hbase | ORGANIZATION | 0.95+
Tablo | TITLE | 0.95+
ODPI | ORGANIZATION | 0.94+
Big Data Analytics | ORGANIZATION | 0.94+
One | QUANTITY | 0.93+
theCUBE | ORGANIZATION | 0.93+
NiFi | COMMERCIAL_ITEM | 0.92+
day two | QUANTITY | 0.92+
about five | QUANTITY | 0.91+
Kafka | TITLE | 0.9+
Zeppelin | ORGANIZATION | 0.89+
Atlas | TITLE | 0.85+
Ranger | ORGANIZATION | 0.84+
Jupyter | ORGANIZATION | 0.83+
first | QUANTITY | 0.82+
Apache Atlas | ORGANIZATION | 0.82+
Hadoop | TITLE | 0.79+

Day Two Keynote Analysis | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's the Cube covering Datawork Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Hello and welcome to the Cube on day two of Dataworks Summit 2018 from Berlin. It's been a great show so far. We have just completed the day two keynote and in just a moment I'll bring ya up to speed on the major points and the presentations from that. It's been a great conference. Fairly well attended here. The hallway chatter, discussion's been great. The breakouts have been stimulating. For me the takeaway is the fact that Hortonworks, the show host, announced yesterday at the keynote, through Scott Gnau, the CTO of Hortonworks, Data Steward Studio, DSS they call it, part of the Hortonworks DataPlane services portfolio, and Data Steward Studio could not be more timely because we are now five weeks away from GDPR, that's the General Data Protection Regulation, becoming the law of the land. When I say the land, the EU, but really any company that operates in the EU, and that includes many U.S.-based and APAC-based and other companies, will need to comply with the GDPR as of May 25th and ongoing, in terms of protecting the personal data of EU citizens. And that means a lot of different things. Data Steward Studio, announced yesterday, was demo'd today by Hortonworks and it was a really excellent demo, and showed that it's a powerful solution for a number of things that are at the core of GDPR compliance. The demo covered the capability of the solution to discover and inventory personal data within a distributed data lake or enterprise data environment, number one. Number two, the ability of the solution to centralize consent, provide a consent portal essentially that data subjects can use then to review the data that's kept on them, to make fine-grained consents or withdraw consents for use in profiling of their data that they own.
And then number three, the show, they demonstrated the capability of the solution then to execute data subjects' requests in terms of the handling of their personal data. Those are the three main points in terms of enabling, adding the teeth to enforce GDPR in an operational setting in any company that needs to comply with GDPR. So, what we're going to see, I believe going forward, really in the whole global economy and in the big data space, is that Hortonworks and others in the data lake industry, and there's many others, are going to need to roll out similar capabilities in their portfolios 'cause their customers are absolutely going to demand it. In fact the deadline is fast approaching, it's only five weeks away. One of the interesting takeaways from the keynote this morning was the fact that John Kreisa, the VP for marketing at Hortonworks, took a quick survey today of those in the audience, a poll, asking how ready they are to comply with GDPR as of May 25th, and it was a bit eye opening. I wasn't surprised, but I think it was 19 or 20%, I don't have the numbers in front of me, said that they won't be ready to comply. I believe it was something where between 20 and 30% said they will be able to comply. About 40%, and don't quote me on that, but a fair plurality, said that they're preparing. So that indicates that they're not entirely 100% sure that they will be able to comply 100% to the letter of the law as of May 25th. I think that's probably accurate in terms of ballpark figures. I think there's a lot of, I know there's a lot of companies, users racing for compliance by that date. And so really GDPR is definitely the headline banner, umbrella story around this event and really around the big data community world-wide right now, in terms of enterprise investments in the compliance software and services and capabilities needed to comply with GDPR. That was important.
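The discover-and-inventory step demoed with Data Steward Studio can be approximated with a toy scan. The regex patterns, field names, and records below are illustrative assumptions only, not how Data Steward Studio actually works internally:

```python
import re

# Hypothetical detectors for two common kinds of personal data.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def inventory(records):
    """Map each detected PII type to the ids of records containing it."""
    found = {}
    for rec_id, rec in records.items():
        for value in rec.values():
            for pii_type, pattern in PII_PATTERNS.items():
                if isinstance(value, str) and pattern.search(value):
                    found.setdefault(pii_type, set()).add(rec_id)
    return found

records = {
    "r1": {"note": "contact anna@example.com"},
    "r2": {"note": "ship to warehouse 7"},
    "r3": {"note": "call +49 30 1234567"},
}
report = inventory(records)
```

An inventory like this is the prerequisite for the other two capabilities: you can only centralize consent for, or forget, the personal data you have actually located.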
That wasn't the only thing that was covered in, not only the keynotes, but in the sessions here so far. AI, clearly AI and machine learning are hot themes in terms of the innovation side of big data. There's compliance, there's GDPR, but really innovation in terms of what enterprises are doing with their data, with their analytics; they're building more and more AI and embedding that in conversational UIs and chatbots, and they're embedding AI in, you know, all manner of e-commerce applications, internal applications in terms of search, as well as things like face recognition, voice recognition, and so forth and so on. So, what we've seen here at the show, and what I've been seeing for quite some time, is that more of the actual developers who are working with big data are the data scientists of the world. And more of the traditional coders are getting up to speed very rapidly on the new state of the art for building machine learning and deep learning, AI, natural language processing into their applications. That said, Hortonworks has become a fairly substantial player in the machine learning space. In fact, you know, really across their portfolio, many of the discussions here I've seen show that everybody's buzzing about getting up to speed on frameworks for building and deploying and iterating and refining machine learning models in operational environments. So that's definitely a hot theme. And so there was an AI presentation this morning from the first gentleman that came on that laid out the broad parameters of what developers are doing and looking to do with data that they maintain in their lakes, training data to both build the models and train them and deploy them. So, that was also something I expected and it's good to see at Dataworks Summit that there is a substantial focus on that in addition of course to GDPR and compliance. It's been about seven years now since Hortonworks was essentially spun off of Yahoo.
It's been I think about three years or so since they went IPO. And what I can see is that they are making great progress in terms of their growth, in terms of not just the finances, but their customer acquisition and their deal size and also customer satisfaction. I get a sense from talking to many of the attendees at this event that Hortonworks has become a fairly blue chip vendor, that they're really in many ways continuing to grow their footprint of Hortonworks products and services alongside their partners, such as IBM. And from what I can see everybody was rapt with attention around Data Steward Studio and I sensed sort of a sigh of relief that it looks like a fairly good solution, and so I have no doubt that a fair number of those in this hall right now are probably, as we say in the U.S., kicking the tires of DSS and probably going to expedite their adoption of it. So, with that said, we have day two here, so what we're going to have is Alan Gates, one of the founders of Hortonworks, coming on in just a few minutes and I'll be interviewing him, asking about the vibrancy and the health of the community, the Hortonworks ecosystem, developers, partners, and so forth, as well as of course the open source communities for Hadoop and Ranger and Atlas and so forth, the growing stack of open source code upon which Hortonworks has built their substantial portfolio of solutions. Following him we'll have John Kreisa, the VP for marketing. I'm going to ask John to give us an update on, really, sort of the health of Hortonworks as a business in terms of their outreach to the community, in terms of their messaging obviously, and have him really position Hortonworks in the community in terms of who he sees them competing with. What segments is Hortonworks in now? The whole Hadoop segment increasingly... Hadoop is there. It's the foundation. The word is not invoked in the context of discussions of Hortonworks as much now as it was in the past.
And the same thing for, say, Cloudera, one of their closest traditional rivals, closest in the sense that people associate them. I was at the Cloudera analyst event the other week in Santa Monica, California. It was the same thing. I think both of these vendors are on a similar path to become fairly substantial data warehousing and data governance suppliers to the enterprises of the world that have traditionally gone with the likes of IBM and Oracle and SAP and so forth. So I think Hortonworks has definitely evolved into a far more diversified solution provider than people realize. And that's really one of the takeaways from Dataworks Summit. With that said, this is Jim Kobielus. I'm the lead analyst, I should've said that at the outset. I'm the lead analyst at SiliconANGLE Media's Wikibon team focused on big data analytics. I'm your host this week on the Cube at Dataworks Summit Berlin. I'll close out this segment and we'll get ready to talk to the Hortonworks and IBM personnel. I understand there's a gentleman from Accenture on as well today on the Cube here at Dataworks Summit Berlin. (electronic music)

Published Date : Apr 19 2018

SUMMARY :

Announcer: From Berlin, Germany, it's the Cube as a business in terms of the reach out to the community

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Jim Kobielus | PERSON | 0.99+
John Kreisa | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Scott Gnau | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Cloudera | ORGANIZATION | 0.99+
May 25th | DATE | 0.99+
Berlin | LOCATION | 0.99+
Yahoo | ORGANIZATION | 0.99+
five weeks | QUANTITY | 0.99+
Alan Gates | PERSON | 0.99+
Oracle | ORGANIZATION | 0.99+
Hotronworks | ORGANIZATION | 0.99+
Data Steward Studio | ORGANIZATION | 0.99+
General Data Protection Regulation | TITLE | 0.99+
Santa Monica, California | LOCATION | 0.99+
GDPR | TITLE | 0.99+
19 | QUANTITY | 0.99+
both | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
today | DATE | 0.99+
20% | QUANTITY | 0.99+
one | QUANTITY | 0.99+
yesterday | DATE | 0.99+
U.S. | LOCATION | 0.99+
DSS | ORGANIZATION | 0.99+
30% | QUANTITY | 0.99+
Berlin, Germany | LOCATION | 0.98+
Dataworks Summit 2018 | EVENT | 0.98+
three main points | QUANTITY | 0.98+
Atlas | ORGANIZATION | 0.98+
20 | QUANTITY | 0.98+
about seven years | QUANTITY | 0.98+
Accenture | ORGANIZATION | 0.97+
SiliconANGLE | ORGANIZATION | 0.97+
One | QUANTITY | 0.97+
about three years | QUANTITY | 0.97+
Day Two | QUANTITY | 0.97+
first gentleman | QUANTITY | 0.96+
day two | QUANTITY | 0.96+
SAP | ORGANIZATION | 0.96+
EU | LOCATION | 0.95+
Datawork Summit Europe 2018 | EVENT | 0.95+
Dataworks Summit | EVENT | 0.94+
this morning | DATE | 0.91+
About 40% | QUANTITY | 0.91+
Wikibon | ORGANIZATION | 0.9+
EU | ORGANIZATION | 0.9+

Joe Morrissey, Hortonworks | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to theCUBE. I'm James Kobielus. I'm lead analyst at Wikibon for big data analytics. Wikibon, of course, is the analyst team inside of SiliconANGLE Media. One of our core offerings is theCUBE and I'm here with Joe Morrissey. Joe is the VP for International at Hortonworks and Hortonworks is the host of Dataworks Summit. We happen to be at Dataworks Summit 2018 in Berlin! Berlin, Germany. And so, Joe, it's great to have you. >> Great to be here! >> We had a number of conversations today with Scott Gnau and others from Hortonworks and also from your customers and partners. Now, you're International, you're VP for International. We've had a partner of yours from South Africa on theCUBE today. We've had a customer of yours from Uruguay. So there's been a fair amount of international presence. We had Munich Re from Munich, Germany. Clearly Hortonworks, you've been in business as a company for seven years now, I think it is, and you've established quite a presence worldwide. I'm looking at your financials in terms of your customer acquisition, it just keeps going up and up, so you're clearly doing a great job of bringing the business in throughout the world. Now, you've told me before the camera went live that you focus on both Europe and Asia Pacific, so I'd like to open it up to you, Joe. Tell us how Hortonworks is doing worldwide and the kinds of opportunities you're selling into. >> Absolutely. 2017 was a record year for us. We grew revenues by over 40% globally. I joined to lead the internationalization of the business and, you know, not a lot of people know that Hortonworks is actually one of the fastest growing software companies in history. We were the fastest to get to $100 million. Also, now the fastest to get to $200 million, but the majority of that revenue contribution was coming from the United States.
When I joined, it was about 15% of international contribution. By the end of 2017, we'd grown that to 31%, so that's a significant improvement in contribution overall from our international customer base even though the company was growing globally at a very fast rate. >> And that's also not only fast by any stretch of the imagination in terms of growth, some have said, "Oh well, maybe Hortonworks, just like Cloudera, maybe they're going to plateau off because the bloom is off the rose of Hadoop." But really, Hadoop is just getting going as a market segment or as a platform, but you guys have diversified well beyond that. So give us a sense for going forward. What are your customers? What kind of projects are you positioning and selling Hortonworks solutions into now? Is it a different, well you've only been there 18 months, but is it shifting towards more things to do with streaming, NiFi and so forth? Does it shift into more data science related projects? 'Cause this is worldwide. >> Yeah. That's a great question. This company was founded on the premise that data volumes and diversity of data is continuing to explode and we believe that it was necessary for us to come and bring enterprise-grade security and management and governance to the core Hadoop platform to make it really ready for the enterprise, and that's what the first evolution of our journey was really all about. A number of years ago, we acquired a company called Onyara, and the logic behind that acquisition was we believe companies now wanted to go out to the point of origin, of creation of data, and manage data throughout its entire life cycle and derive pre-event as well as post-event analytical insight into their data. So what we've seen is our customers are moving beyond just unifying data in the data lake and deriving post-transaction insight into their data. They're now going all the way out to the edge.
They're deriving insight from their data in real time all the way from the point of creation and getting pre-transaction insight into data as well so-- >> Pre-transaction data, can you define what you mean by pre-transaction data? >> Well, I think if you look at it, it's really the difference between data in motion and data at rest, right? >> Oh, yes. >> A specific example would be if a customer walks into the store and they've interacted with the store, maybe on social, before they come in or in some other fashion, before they've actually made the purchase. >> Engagement data, interaction data, yes. >> Engagement, exactly. Exactly. Right. So that's one example, but that also extends out to use cases in IoT as well, so data in motion and streaming data, as you mentioned earlier, has since become a very, very significant use case that we're seeing a lot of adoption for. Data science, I think companies are really coming to the realization that that's an essential role in the organization. If we really believe that data is the most important asset, that it's the crucial asset in the new economy, then data scientist becomes a really essential role for any company. >> How do your Asian customers' requirements differ, or do they differ, from your European? 'Cause European customers clearly already have their backs against the wall. We have five weeks until GDPR goes into effect. Do many of your Asian customers, and I'm sure a fair number sell into Europe, are they putting a full court, I was going to say in the U.S., a full court press on complying with GDPR, or do they have equivalent privacy mandates in various countries in Asia, or a bit of both?
They don't have layers of legacy tech that they need to sunset. A great example of that is Reliance. Reliance is the largest company in India, and they've got a subsidiary called Jio, which is the fastest growing telco in the world. They've implemented our technology to build a next-generation OSS system to improve their service delivery on their network. >> Operational support system. >> Exactly. They were able to do that from the ground up because they formed their telco division around being a data-only company and giving away voice for free. So they can, to some extent, move quicker and innovate a little faster in that regard. I do see much more emphasis on regulatory compliance in Europe than I see in Asia. I do think that GDPR amongst other regulations is a big driver of that. The other factor though I think that's influencing that is Cloud and Cloud strategy in general. What we've found is that customers are drawn to the Cloud for a number of reasons. The economics sometimes can be attractive, the ability to be able to leverage the Cloud vendors' skills in terms of implementing complex technology is attractive, but most importantly, the elasticity and scalability that the Cloud provides is hugely important. Now, the key concern for customers as they move to the Cloud though, is how do they leverage that as a platform in the context of an overall data strategy, right? And when you think about what a data strategy is all about, it all comes down to understanding what your data assets are and ensuring that you can leverage them for a competitive advantage but do so in a regulatory compliant manner, whether that's data in motion or data at rest. Whether it's on-prem or in the Cloud, or data across multiple Clouds. That's very much a top of mind concern for European companies.
>> For your customers around the globe, specifically of course, your area of Europe and Asia, what percentage of your customers are deploying Hortonworks into a purely public Cloud environment like HDInsight on Microsoft Azure or HDP inside of AWS, in a public Cloud, versus in a private on-premises deployment, versus in a hybrid public-private multi Cloud? Is it mostly on-prem? >> Most of our business is still on-prem, to be very candid. I think almost all of our customers are looking at migrating some workloads closer to the Cloud. Even those that had intended to have a Cloud-first strategy have now realized that not all workloads belong in the Cloud. Some are actually more economically viable to be on-prem, and some just won't ever be able to move to the Cloud because of regulation. In addition to that, most of our customers are telling us that they actually want Cloud optionality. They don't want to be locked into a single vendor, so we very much view the future as hybrid Cloud, as multi Cloud, and we hear our customers telling us that rather than just have a Cloud strategy, they need a data strategy. They need a strategy to be able to manage data no matter where it lives, on which tier, to ensure that they are regulatory compliant with that data. But then to be able to understand that they can secure, govern, and manage those data assets at any tier. >> What percentage of your deals involve a partner? Like IBM is a major partner. Do you do a fair amount of co-marketing and joint sales and joint deals with IBM and other partners, or are they mostly Hortonworks-led? >> No, partners are absolutely critical to our success in the international sphere. Our partner revenue contribution across EMEA in the past year grew, every region grew by over 150% in terms of channel contribution. Our total channel business was 28% of our total, right? That's a very significant contribution. The growth rate is very high. IBM are a big part of that, as are many other partners.
We've got a very significant reseller channel, we've got IHV and ISV partners that are critical to our success also. Where we're seeing the most impact with IBM is where we go to some of these markets where we haven't had a presence previously, and they've got deep and long-standing relationships, and that helps us accelerate time to value with our customers. >> Yeah, it's been a very good and solid partnership going back several years. Well, Joe, this is great, we have to wrap it up, we're at the end of our time slot. This has been Joe Morrissey, who is the VP for International at Hortonworks. We're on theCUBE here at Dataworks Summit 2018 in Berlin, and want to thank you all for watching this segment. Tune in tomorrow, we'll have a full slate of further discussions with Hortonworks, with IBM and others tomorrow on theCUBE. Have a good one. (upbeat music)

Published Date : Apr 18 2018


Fernando Lopez, Quanam | Dataworks 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE, covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I'm James Kobielus, I'm the lead analyst for the Wikibon team within SiliconANGLE Media. I'm your host today here at Dataworks Summit 2018 in Berlin, Germany. We have one of Hortonworks' customers in South America with us. This is Fernando Lopez of Quanam. He's based in Montevideo, Uruguay. And here at the conference, he and his company have won an award, a data science award, so what I'd like to do is ask Fernando, Fernando Lopez, to introduce himself, to give us his job description, to describe the project for which you won the award, and take it from there, Fernando. >> Hello, and thanks for the chance. >> Great to have you. >> I work for Quanam, as you already explained. We are about 400 people in the whole company. And we are spread across Latin America. I come from the kind of headquarters, which is located in Montevideo, Uruguay. And there we have a business analytics business unit. Within that, we are about 70 people, and we have a big data and artificial intelligence and cognitive computing group, which I lead. And yes, we also implement Hortonworks. We are actually partnering with Hortonworks. >> When you say you lead the group, are you a data scientist yourself, or do you manage a group of data scientists, or a bit of both? >> Well, a bit of both. You know, you have to do different stuff in this life. So yes, I lead implementation groups. Sometimes the project is more big data. Sometimes it's more data science, different flavors. But within this group, we try to cover different aspects that are related in some sense with big data. It could be artificial intelligence. It could be cognitive computing, you know. >> Yes, so describe how you're using Hortonworks and describe the project for which you won, I assume it's one project, for which you won the award, here at this conference.
All right, yes. We are running several projects, but this one, the one that won the prize, is one that I like so much because I'm actually a bioinformatics student, so I have a special interest in this one. >> James: Okay. >> It's good to clarify that this was a joint effort between Quanam and GeneLifes. >> James: Genelabs. >> GeneLifes. >> James: GeneLifes. >> Yes, it's a genetics and bioinformatics company. >> Right. >> That they specialize-- >> James: Is that a Montevideo based company? >> Yes. In a line, they are a startup that was born from the Institut Pasteur, but in Montevideo, and they have a lot of people who are specialists in bioinformatics and genetics, with a long career in the subject. And we come from the other side, from big data. I was kind of in the middle because of my interest in bioinformatics. So something like one year and a half ago, we met both companies. Actually there is a research and innovation center, ICT4V. You can visit ICT4V.org, which is a non-profit organization formed after an agreement between Uruguay and France, >> Oh okay. >> Both governments. >> That makes it possible for different private or public organizations to collaborate. We have brainstorming sessions and so on. And from one of those brainstorming sessions, this project was born. So, after that we started to discuss ideas of how to bring tools to the medical geneticist in order to streamline his work, in order to put on the top of his desktop different tools that could make his work easier and more productive. >> Looking for genetic diseases, or what are they looking for in the data specifically? >> Correct, correct. >> I'm not a geneticist but I try to explain myself as well as I can. >> James: Okay, that's good. You have a great job. >> If I am-- >> If I am the doctor, then I will spend a lot of hours researching literature. Bear in mind that we have nearly 300 papers each day coming up in PubMed that could be related to genetics. That's a lot.
These are papers in Spanish that are published in South America? >> No, just talking about, >> Or Portuguese? >> PubMed from the NIH, it's papers published in English. >> Okay. >> PubMed or MEDLINE or-- >> Different languages, different countries, different sources. >> Yeah, but most of it or everything in PubMed is in English. There is another PubMed in Europe and we have SciELO in Latin America also. But just to give you an idea, from that source alone there are 300 papers each day that could be related to genetics. So only speaking about literature, there's a huge amount of information. If I am the doctor, it's difficult to process that. Okay, so that's part of the issue. But at the core of the solution, what we want to give is, starting from the sequenced genome of one patient, what can we assert, what can we say about the different variations. It is believed that each one of us has about four million mutations. Mutation doesn't mean disease. Mutation actually leads to variation. And variation is not necessarily something negative. We can have different color of the eyes. We can have more or less hair. Or this could represent some disease, something that we need to pay attention to as doctors, okay? So this part of the solution tries to implement heuristics on what's coming from the sequencing process. And these heuristics, in short, tell you the score of each variant, each variation, of being more or less pathogenic. So if I am the doctor, part of the work is done there. Then I have to decide, okay, my diagnosis is there is this disease or not. This can be used in two senses. It can be used as prevention, in order to predict, this could happen, you have this genetic risk, or this could be used in order to explain some disease and find a treatment. So that's the more bioinformatics part. On the other hand we have the literature. What we do with the literature is, we ingest these 300 daily papers, well, abstracts, not papers.
Actually we have about three million abstracts. >> You ingest text and graphics, all of it? >> No, only the abstract, which is about a few hundred words. >> James: So just text? >> Yes. >> Okay. >> But from there we try to identify relevant entities: proteins, diseases, phenotypes, things like that. And then we try to infer valid relationships. This phenotype or this disease can be caused because of this protein, or because of the expression of that gene, which is another entity. So this builds up a kind of ontology, we call it a mini-ontology because it's specific to this domain. So we have a kind of mini-semantic network with millions of nodes and edges, which is quite easy to interrogate. But the point is, there you have more than just text. You have something that is already enriched. You have a series of nodes and arrows, and you can query that in terms of reasoning. What leads to what, you know? >> So the analytical tools you're using, they come from, well, Hortonworks doesn't make those tools. Are they coming from another partner in South America? Or another partner of Hortonworks' like an IBM, or where does that come from? >> That's a nice question. Actually, we have an architecture. The core of the architecture is Hortonworks because we have scalability requirements. >> James: Yeah, HDP? >> Yes, HDFS, Hive on Tez, Spark. We have a number of items that need to scale easily because when we talk about the genome, it's easy to think about one terabyte per patient of work. So that's one thing regarding storage and computing. On the other hand, we use a graph database. We use Neo4j for that. >> James: Okay, the Neo4j for graph. The Neo4j, you have Hortonworks. >> Yes, and in order to do natural language processing, we use Nine, which is based here in Berlin, actually. So we do part of the machine learning with Nine. Then we have Neo4j for the graph, for building this semantic network.
And for the whole processing we have Hortonworks, for running this analysis and heuristics, and scoring the variants. We also use Solr for enterprise search, on top of the documents, or the conclusions of the documents that come from the ontology. >> Wow, that's a very complex and intricate deployment. So, great, in terms of the takeaways from this event, we only just have a little bit more time, what of all the discussions, the breakouts and the keynotes did you find most interesting so far about this show? Data stewardship was a theme of Scott Gnau's, with that new solution, you know, and in terms of what you're describing as an operational application, have you built out something that can be deployed, is being deployed, by your customers on an ongoing basis? It wasn't a one-time project, right? This is an ongoing application they can use internally. Is there a need in Uruguay or among your customers to provide privacy protections on this data? >> Sure. >> Will you be using these solutions, like the Data Steward Studio, to enable a degree of privacy, protection of data equivalent to what, say, GDPR requires in Europe? Is that something? >> Yes, actually we are running other projects in Uruguay. We are helping, with other companies, the National Telecommunications Company. So there are security and privacy topics over there. And we are also starting these days a new project, again with ICT4V, another French company. We are in charge of their big data part, for an education program which is based on the one laptop per child initiative, from the times of Nicholas Negroponte. Well, that initiative is already 10 years >> James: Oh from MIT, yes. >> Yes, from MIT, right. That initiative is already 10 years old in Uruguay, and now it has evolved also to retired people. So it's kind of going towards the digital society. >> Excellent, I have to wrap it up Fernando, that's great, you have a lot of follow-on work.
This is great, so clearly a lot of very advanced research is being done all over the world. I had the previous guest from South Africa. You're from Uruguay, so really south of the Equator. There's far more activity in big data than we here in the northern hemisphere, in Europe and North America, realize, so I'm very impressed. And I look forward to hearing more from Quanam and through your provider, Hortonworks. Well, thank you very much. >> Thank you and thanks for the chance. >> It was great to have you here on theCUBE. I'm James Kobielus, we're here at DataWorks Summit in Berlin, and we'll be talking to another guest fairly soon. (mood music)
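As a technical aside, the two pieces Fernando describes above — heuristic scoring of sequenced variants for pathogenicity, and a literature-derived semantic network that can be interrogated for "what leads to what" reasoning — can be pictured with a toy sketch. All rule weights, entity names, and relations below are invented for illustration; the real system uses Hortonworks, Neo4j, and the team's own heuristics, not this code.

```python
# Toy sketch of the pipeline described in the interview:
# (1) heuristic scoring of genome variants for pathogenicity, and
# (2) a small semantic network of entity relationships, queryable
#     for "what leads to what" reasoning chains.
# All rule weights, entity names, and relations are invented placeholders.
from collections import defaultdict

def score_variant(variant):
    """Return a rough pathogenicity score in [0, 1] for one variant."""
    score = 0.0
    if variant.get("effect") == "missense":               # changes the protein
        score += 0.4
    if variant.get("population_frequency", 1.0) < 0.001:  # rare variant
        score += 0.3
    if variant.get("conserved", False):                   # conserved site
        score += 0.3
    return min(1.0, score)

# Triples inferred from literature: (subject, relation, object).
triples = [
    ("GENE_A", "expresses", "PROTEIN_X"),
    ("PROTEIN_X", "causes", "PHENOTYPE_1"),
    ("PHENOTYPE_1", "associated_with", "DISEASE_D"),
]
out_edges = defaultdict(list)
for subj, rel, obj in triples:
    out_edges[subj].append((rel, obj))

def paths_from(node, path=None):
    """Yield every relation chain reachable from a starting entity."""
    path = path or [node]
    for rel, obj in out_edges[node]:
        chain = path + [rel, obj]
        yield chain
        yield from paths_from(obj, chain)

variant = {"effect": "missense", "population_frequency": 0.0002, "conserved": True}
print(score_variant(variant))       # a high score flags the variant for review
for chain in paths_from("GENE_A"):  # reasoning chains starting from one gene
    print(" -> ".join(chain))
```

In the production system described, the traversal step would be a Cypher query against Neo4j over millions of nodes rather than an in-memory dictionary; the sketch only shows the shape of the reasoning.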

Published Date : Apr 18 2018


Muggie van Staden, Obsidian | Dataworks Summit 2018


 

>> Voiceover: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Hi, hello, welcome to theCUBE, I'm James Kobielus. I'm the lead analyst for Big Data Analytics at Wikibon, which is the team inside of SiliconANGLE Media that focuses on emerging trends and technologies. We are here, on theCUBE at DataWorks Summit 2018 in Berlin, Germany. And I have a guest here. This is, Muggie, and if I get it wrong, Muggie Van Staden >> That's good enough, yep. >> Who is with Obsidian, which is a South Africa-based partner of Hortonworks. And I'm not familiar with Obsidian, so I'm going to ask Muggie to tell us a little bit about your company, what you do, your focus on open source, and really the opportunities you see for big data, for Hadoop, in South Africa, really the African continent as a whole. So, Muggie? >> Yeah, James, great to be here. Yes, Obsidian, we started it 23 years ago, focusing mostly on open source technologies, and as you can imagine, that has changed a lot over the last 23 years. When we started, the concept of selling Linux was basically a box with a hat and maybe a T-shirt in it. Today that's changed. >> James: Hopefully there's a stuffed penguin in there, too. (laughing) I could use that right now. >> Maybe a manual. So our business has evolved a lot over the last 23 years. And one of the technologies that has come around is Hadoop. And we actually started with some of the other Hadoop vendors out there as our first partnerships, and probably three or four years ago we decided to take on Hortonworks as one of our vendors. We found them an amazing company to work with. And together with them we've now worked in four of the big banks in South Africa. One of them is actually here at DataWorks Summit. They won an award last night. So it's fantastic to be part of all of that. And yes, South Africa being so far removed from the rest of the world, they have different challenges.
Everybody's nervous of Cloud. We have the joys that we don't really have any Cloud players locally yet. The two big players, Microsoft and Amazon, are planning some data centers soon. So the guys have different challenges to Europe and to the States. But big data, the big banks are looking at it, starting to deploy nice Hadoop clusters, starting to ingest data, starting to get real business value out of it, and we're there to help, and hopefully the four is the start for us and we can help lots of customers on this journey. >> Are South African-based companies, because you are so distant in terms of miles on the planet from Europe, from the EU, is any company in South Africa, or many companies, concerned at all about the General Data Protection Regulation, GDPR? US-based companies certainly are 'cause they operate in Europe. So is that a growing focus for them? And we have five weeks until GDPR kicks in. So tell me about it. >> Yeah, so from a South African point of view, some of the banks and some of the companies would have subsidiaries in Europe. So for them it's a very real thing. But we have our own Act called PoPI, the Protection of Personal Information Act, so very similar. So everybody's keeping an eye on it. Everybody's worried. I think everybody's worried for the first company to be fined. And then they will all make sure that they get their things right. But I think not just because of legislation, I think it's something that everybody should worry about. How do we protect data? How do we make sure the right people have access to the correct data when they should, and nobody violates that? Because I mean, in this day and age, you know, Google and Amazon and those guys probably know more about me than my family does. So it's a challenge for everybody. And I think it's just the right thing for companies to do, to make sure that the data that they do have, they really do take good care of it.
We trust them with our money and now we're trusting them with our data. So it's a real challenge for everybody. >> So how long has Obsidian been a partner of Hortonworks, and how has your role, or partnership I should say, evolved over that time, and how do you see it evolving going forward? >> We've been a partner about three or four years now. And started off as a value added reseller. We are also a training partner in South Africa for them. And as they as a company have evolved, we've had to evolve with them. You know, so they started with HDP as the Hadoop platform. Now they're doing NiFi and HDF, so we have to learn all of those technologies as well. But very, very excited where they're going with DataPlane Service, managing a customer's data across multiple clusters, multiple clouds, because that's realistically where we see all the customers going: you know, on-premise clusters and typically multiple Clouds, and how do you manage that? And we are very excited to walk this road together with Hortonworks and all the South African customers that we have. >> So you say your customers are deploying multiple Clouds. Public Clouds or hybrid private-public Clouds? Give us a sense, for South Africa, whether public Cloud is a major deployment option or choice for financial services firms that you work with. >> Not necessarily financial services, so most of them are kicking tires at this stage, nobody's really put major workloads in there. As I mentioned, both Amazon and Microsoft are planning to put data centers down in South Africa very soon, and I think that will spur a big movement towards Cloud, but we do have some customers, unfortunately not Hortonworks customers, that are actually mostly in the Cloud. And they are now starting to look at a multi-Cloud strategy. So to ideally be in the three or four major Cloud providers and spinning up the right workloads in the right Cloud, and we're there to help.
>> One of the most predominant workloads that your customers are running in the Cloud, is it backend in terms of data ingest and transformation? Is it a bit of maybe data warehousing with unstructured data? Is it a bit of things like queryable archiving? I want to get a sense for what is predominant right now in workloads. >> Yeah, I think most of them start with (mumble) environments. (mumbles) one customer that's heavily into Cloud from a data point of view. Literally it's their data warehouse. They put everything in there. I think from the banking customers, most of them are considering DR of their existing Hadoop clusters, maybe a subset of their data and not necessarily everything. And I think some of them are also considering putting their unstructured data outside on the Cloud because that's where most of it's coming from. I mean, if you have Twitter, Facebook, LinkedIn data, it's a bit silly to pull all of that into your environment. Why not just put it in the Cloud, that's where it's coming from, and analyze that and connect it back to your data where relevant? So I think a lot of the customers would love to get there, and now Hortonworks makes it so much easier to do that. I think a lot of them will start moving in that direction. >> Now, excuse me, so are any or many of your customers doing development and training of machine learning algorithms and models in their Clouds? And to the extent that they are, are they using tools like the IBM Data Science Experience that Hortonworks resells for that? >> I think it's definitely on the radar for a lot of them. I'm not aware of anybody using it yet, but lots of people are looking at it and excited about the partnership between IBM and Hortonworks. And IBM has been a longstanding player in the South African market, and it's exciting for us as well to bring them into the whole Hortonworks ecosystem, and together solve real world problems.
Give us a sense for how built out the big data infrastructure is in neighboring countries like Botswana or Angola or Mozambique and so forth. Is that an area that your company, are those regions that your company operates in? Sells into? >> We don't have offices, but we don't have a problem going in and helping customers there, so we've had projects in the past, not data related, where we've flown in and helped people. Most of the banks, from a South African point of view, have branches into Africa. So it's on the roadmap, some are a little bit ahead of others, but definitely on the roadmap to actually put down Hadoop clusters in some of the major countries all throughout Africa. There's a big debate, do you put it down there, do you leave the data in South Africa? So they're all going through their own legislation, but it's definitely on the roadmap for all of them to actually take their data and knowledge in data science up into Africa. >> Now you say that in South Africa proper, there are privacy regulations, you know, maybe not the same as GDPR, but equivalent. Throughout Africa, at least throughout Southern Africa, how is privacy regulation lacking, or is it emerging? >> I think it's emerging. A lot of the countries do have the basic rule that their data shouldn't leave the country. So everybody wants that data sovereignty, and that's why a lot of them will not go to Cloud, and that's part of the challenge for the banks, if they have banks up in Botswana, etc. And Botswana's rules say their data has to stay in country. They have to figure out a way to connect that data to get the value for all of their customers. So real world challenges for everybody. >> When you're going into and selling into an emerging or developing nation, do you need to provide upfront consulting to help the customer bootstrap their own understanding of the technology and make the business case and so forth? And how consultative is the selling process...
>> Absolutely, and what we see with the banks, most of them even have a consultative approach within their own environment, so you would have the South African team maybe flying into the team at (mumbles) Botswana, and share some of the learnings that they've had. And then help those guys get up to speed. The reality is the skills are not necessarily in country. So there's a lot of training, a lot of help to go and say, we've done this, let us upscale you. And be a part of that process. So we sometimes send in teams to come and do two, three day training, basics, etc., so that ultimately the guys can operationalize in each country by themselves. >> So, that's very interesting, so what do you want to take away from this event? What do you find most interesting in terms of the sessions you've been in around the community showcase that you can take back to Obsidian, back in your country and apply? Like the announcement this morning of the Data Steward Studio. Do you see a possibility that your customers might be eager to use that for curation of their data in their clusters? >> Definitely, and one of the key messages for me was Scott, the CTO's, message about your data strategy, your Cloud strategy, and your business strategy. It is effectively the same thing. And I think that's the biggest message that I would like to take back to the South African customers is to go and say, you need to start thinking about this. You know, as Cloud becomes a bigger reality for us, we have to align, we have to go and say, how do we get your data where it belongs? So you know, we like to say to our customers, we help the teams get the right code to the right computer and the right data, and I think it's absolutely critical for all of the customers to go and say, well, where is that data going to sit? Where is the right compute for that piece of data? And can we get it then, can we manage it, etc.? And align to business strategy.
Everybody's trying to do digital transformation, and those three things go very much hand-in-hand. >> Well, Muggie, thank you very much. We're at the end of our slot. This has been great. It's been excellent to learn more about Obsidian and the work you're doing in South Africa, providing big data solutions or working with customers to build the big data infrastructure in the financial industry down there. So this has been theCUBE. We've been speaking with Muggie Van Staden of Obsidian Systems, and here at DataWorks Summit 2018 in Berlin. Thank you very much.

Published Date : Apr 18 2018



Andreas Kohlmaier, Munich Re | Dataworks Summit EU 2018


 

>> Narrator: From Berlin, Germany, it's The Cube. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to The Cube. I'm James Kobielus. I'm the Lead Analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. We are here at DataWorks Summit 2018 in Berlin. Of course, it's hosted by Hortonworks. We are in day one of two days of interviews with executives, with developers, with customers. And this morning in the opening keynote, one of the speakers was a customer of Hortonworks from Munich Re, the reinsurance company based of course in Munich, Germany. Andreas Kohlmaier, who's the head of Data Engineering I believe. It was an excellent discussion about how you've built out a data lake. And the first thing I'd like to ask you Andreas is right now it's five weeks until GDPR, the general data protection regulation, goes into full force on May 25th. And of course it applies to the EU, to anybody who does business in the EU including companies based elsewhere, such as in the US, needs to start complying with GDPR in terms of protecting personal data. Give us a sense for how Munich Re is approaching the deadline, your level of readiness to comply with GDPR, and how your investment in your data lake serves as a foundation for that compliance. >> Absolutely. So thanks for the question. GDPR, of course, is the hot topic across all European organizations. And we are actually pretty well prepared. We compiled all the processes and the necessary regulations and in fact we are now selling this also as a service product to our customers. This has been an interesting side effect because we have lots of other insurance companies and we started to think about why not offer this as a service to other insurance companies to help them prepare for GDPR. This is actually proving to be one of the exciting interesting things that can happen about GDPR. >> Maybe that would be your new line of business. You make more money doing that, then.
>> I'm not sure! (crosstalk) >> Well that's excellent! So you've learned a lot of lessons. So already so you're ready for May 25th? You have, okay, that's great. You're probably far ahead of, I know, a lot of U.S. based firms. We're, you know in our country and in other countries, we're still getting our heads around all the steps that are needed, so you know many companies outside the EU may call on you guys for some consulting support. That's great! So give us a sense for your data lake. You discussed it this morning but can you give us a sense for the business justification for building it out? How you've rolled it out? What stage it's in? Who's using it for what? >> So absolutely. So one of the key things for us at Munich Re is the issue about complexity or data diversity as it was also called this morning. So we have so many different areas that we are doing business in and we have lots of experts in the different areas. And those people, they are really very knowledgeable in their area, and now they also get access to new sources of information. So to give you a sense we have people for example that are really familiar with weather and climate change, also with satellites. We have captains for ships and pilots for aircraft. So we have lots of expertise in all the different areas. Why? Because we are taking those risks in our books. >> Those are big risks too. You're a reinsurance company so yeah. >> And these are actually complex risks where we really have people that really are experts in their field. So we sometimes have people that have 20 years plus of experience in the area and then they move to the insurer to actually bring their expertise in the field also to the risk management side. And all those people, they now get an additional source of input which is the data that is now more or less readily available everywhere.
So first of all, we are getting new data with the submissions and the risks that we are taking, and there are also interesting open data sources to connect to, so that those experts can actually bring their knowledge and their analytics to a new level by adding the layer of data and analytics to their existing knowledge. And this allows us, first of all, to understand the risks even better, to put a better price tag on that, and also to take up new risks that have not been possible to cover before. One of the things that is also in the media, I think, is that we are now covering the Hyperloop once it's going to be built. So those kinds of new things are only possible with data analytics. >> So you're a Hortonworks customer. Give us a sense for how you're using or deploying Hortonworks Data Platform or the DataPlane Service and whatnot inside of your data lake. It sounds like it's a big data catalog, is that a correct characterization? >> So one of the things that is key to us is actually finding the right information and connecting those different experts to each other. So this is why the data catalog plays a central role. Here we have selected Alation as a catalog tool to connect the different experts in the group. The data lake at the moment is an on-prem installation. We are thinking about moving parts of that workload to the cloud to actually save operation costs. >> On top of HDP. >> Yeah so Alation is actually, as far as I know, technically a separate server that indexes the Hive tables on HDP. >> So essentially the catalog itself provides visualization and correlation across disparate data sources that are managed in your Hadoop environment. >> Yeah, so the catalog actually is a great way of connecting the experts together.
So that's, you know, okay, if we have people in one part of the group that are very knowledgeable about weather and they have great data about weather, then we'd like to connect them for example to the guys that are doing crop insurance for India, so that they can use the weather data to improve the models for example for crop insurance in Asia. And there the data catalog helps us to connect those experts because you can first of all find the data sources and you can also see who is the expert on the data. You can then also call them up or ask them a question in the tool. So it's essentially a great way to share knowledge and to connect the different experts of the group. >> Okay, so it's also surfacing up human expertise. Okay, is it also serving as a way to find training datasets possibly to use to build machine learning models to do more complex analyses? Is that something that you're doing now or plan to do in the future? >> Yes, so of course we are doing some machine learning and also deep learning projects. We have also just started a Center of Excellence for artificial intelligence to see, okay, how we can use deep learning and machine learning also to find different ways of pricing insurance risks for example, and of course for all those cases data is key and we really need people to get access to the right data. >> I have to ask you. One of the things I'm seeing, you mentioned Center of Excellence for AI. I'm seeing more companies consider, maybe not do it, consider establishing an office of the chief AI officer, like reporting to the CEO. I'm not sure that that's a great idea for a lot of businesses, but since an insurance company lives and dies by data and calculations and so forth, is that something that Munich Re is doing or considering, a C-Suite level officer of that sort responsible for this AI competency, or no? >> Could be in the future. >> Okay. >> We sort of just started with the AI Center of Excellence.
That is now reporting to our Chief Data Officer, so it's not yet a C-Suite role. >> Is the Center of Excellence for AI, is it simply like a training institute to provide some basic skill building or is there something more there? Do you do development? >> Actually they are trying out and developing ways on how we can use AI and deep learning for insurance. One of the core things of course is about understanding natural language to structure the information that we are getting in PDFs and in documents, but really also using deep learning as a new way to build tariffs for the insurance industry. So that's one of the core things, to find and create new tariffs. And we are also experimenting, haven't found the product yet there, whether or not we can use deep learning to create better tariffs. That could also then be one of the services, again, that we are providing to our customers, the insurance companies, and they build that into their products. Something like, yeah, the algorithm is powered by Munich Re. >> Now your users of your data lake, these are expert quantitative analysts, right, for the most part? So you mentioned using natural language understanding AI capabilities. Is that something that you have a need to do in high volume as a reinsurance company? Take lots of source documents and be able to, as it were, identify the content in high volume, and, you know, not OCR but rather actually build a semantic graph of what's going on inside the document? >> I'm going to give you an example of the things that we are doing with natural language processing. And this one is about the energy business in the US. So we are actually taking up or seeing most of the risks that are related to oil and gas in the U.S. So all the refineries, all the larger stations, and the petroleum tanks. They are all in our books, and for each and every one of them we get a nice report on the risks there with a couple of hundred pages.
And inside these reports there's also some paragraph describing where the refinery or the plant actually gets its supplies from and where it ships its products to. And then we are seeing all those documents. That's on the scale of a couple of thousands, so it's not really huge, but altogether a couple of hundred thousand pages. We use NLP and AI on those documents to extract the supply chain information out of it, so in that way we can stitch together a more or less complete picture of the supply chain for oil and gas in the U.S., which helps us again to better understand that risk, because supply chain breakdown is one of the major risks in the world nowadays. >> Andreas, this has been great! We could keep on going. I'm totally fascinated by your use of AI but also your use of a data lake, and I'm impressed by your ability to get your, as a company get your, as we say in the U.S., get your GDPR ducks in a row, and that's great. So it's been great to have you on The Cube. We are here at DataWorks Summit in Berlin. (techno music)
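The supply-chain extraction Andreas describes can be sketched, very roughly, as pattern matching over report text. This is only a hedged illustration with invented report wording and facility names, not Munich Re's actual pipeline, which uses trained NLP/AI models rather than regular expressions:

```python
import re

def extract_supply_links(report_text, facility):
    """Toy extraction of supply-chain links from risk-report prose.

    Matches only simple 'supplies from X' / 'ships ... to Y' phrasings;
    a production pipeline would use NLP/NER models instead.
    """
    links = []
    # Upstream: who the facility gets its supplies from.
    for m in re.finditer(r"supplies from ([A-Z][\w ]+?)(?=[.,;]|$)", report_text):
        links.append((m.group(1).strip(), facility))
    # Downstream: who the facility ships its products to.
    for m in re.finditer(r"ships (?:its products )?to ([A-Z][\w ]+?)(?=[.,;]|$)", report_text):
        links.append((facility, m.group(1).strip()))
    return links

report = ("The refinery supplies from Gulf Crude Terminal, "
          "and ships its products to Midwest Fuel Depot.")
print(extract_supply_links(report, "Refinery A"))
# → [('Gulf Crude Terminal', 'Refinery A'), ('Refinery A', 'Midwest Fuel Depot')]
```

Repeating this over thousands of reports and merging the resulting (supplier, consumer) edges is what yields the kind of stitched-together supply-chain picture described above.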

Published Date : Apr 18 2018



Bernard Marr | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Well, hello, and welcome to the Cube. I'm James Kobielus. I'm the lead analyst for Big Data Analytics with the Wikibon team within SiliconANGLE Media. We are here at the DataWorks Summit 2018 in Berlin, Germany. And I have a special guest, we have a special guest, Bernard Marr, one of the most influential, thought leaders in the big data analytics arena. And it's not just me saying that. You look at anybody's rankings, Bernard's usually in the top two or three of influentials. He publishes a lot. He's a great consultant. He keynoted this morning on the main stage at Dataworks Summit. It was a very fascinating discussion, Bernard. And I'm a little bit star struck 'cause I assumed you were this mythical beast who just kept putting out these great books and articles and so forth. And I'm glad to have you. So, Bernard, I'd like for you to stand back, we are here in Berlin, in Europe. This is April of 2018, in five weeks time, the general data protection, feels global 'cause it sort of is. >> It is. >> The general data protection regulation will take full force, which means that companies that do business in Europe, in the EU, must under the law protect the personal data they collect on EU citizens ensuring the right to privacy, the right to be forgotten, ensuring user's, people's ability to withhold consent to process and profile and so forth. So that mandate is coming down very fast and so forth. What is your thoughts on GDPR? Is it a good thing, Bernard, is it high time? Is it a burden? Give us your thoughts on GDPR currently. >> Okay, first, let me return all the compliments. It's really great to be here. I think GDPR can be both. And for me it will come down very much to the way it gets implemented. 
So, in principle for me, it is a good thing because what I've always made companies do and advise them to do is to be completely transparent in the way they're collecting data and using data. I believe that the big data world can't thrive if we don't develop this trust and have this transparency. So in principle, it's a great thing. For me it will come down to the implementation of all of this. I had an interesting chat just minutes ago with the event photographer saying that once GDPR kicks in he can't actually publish any photographs without getting written consent for everyone in the photograph. That's a massive challenge and he was saying he can't afford to lose 4% of his global revenue. So I think it will be very interesting to see how this will-- >> How it'll be affecting face recognition, I'm sorry go ahead. >> Bernard: Yeah maybe. >> Well maybe that's a bad thing, maybe it's a good thing. >> Maybe it is, yeah, maybe. So for me, in principle a very good thing. In practice, I'm intrigued to see how this will get implemented. >> Of the clients you consult, what percentage in the EU, without giving away names, what percentage do you think are really ready right now or at least will be by May 25th to comply with the letter of the law? Is it more than 50%? Is it more than 80%? Or will there be a lot of catching up to do in a short period of time? >> My sense is that there's a lot of catching up to do. I think people are scrambling to get ready at the moment. But the thing is nobody really knows what being ready really means. I think there are lots of different interpretations. I've been talking to a few lawyers recently. And everyone has a slightly different interpretation of how far they can push the boundaries, so, again, I'm intrigued to see what will actually happen. And I very much hope that common sense prevails and it will be seen as a good force and something that is actually good for everyone in the field of big data.
>> So slightly changing track, in your introduction this morning, I think it was John Christ of Hortonworks who said that you made a prediction about this year that AI will be used to automate more things than people realize and it'll come along fairly fast. Can you give us a sense for how AI is enabling greater automation, and whether, you know, this is the hot button topic, AI will put lots of people out of work fairly quickly by automating everything that white collar workers and so forth are doing, what are your thoughts there? Is it cause for concern? >> Yes, and it's probably one of the questions I get asked the most and I wish I had a very good answer for it. I believe that we are experiencing a new industrial revolution at the moment, and if you look at what the World Economic Forum's CEO and founder, Klaus Schwab, is preaching about, it is that we are experiencing this new industrial revolution that will truly transform the workplace and our lives. In history, all of the other three previous industrial revolutions have somehow made our lives better. And we have always found something to do for us. And they have changed the jobs. Again, there was a recent report that looked at some of the key AI trends and what they found is that actually AI produces more new jobs than it destroys. >> Will we all become data scientists, as AI becomes predominant? Or what's going on here? >> No I don't, and this is, I wish I had the answer to this. For me, the advice I give my own children now is to focus on the really human element of it and probably the more strategic element. The problem is five, six years ago this was a lot easier. I could talk about emotional intelligence, creativity; with advances in machine learning, this advice is no longer true. And lots of jobs, even some of the things I do, I write for Forbes on a regular basis. I also know that AIs write for Forbes.
A lot of the analyst reports are now machine generated. >> Natural language generation, a huge use case for AI that people don't realize. >> Bernard: Absolutely. >> Yeah. >> So, for me I see it, as an optimist I see it positively. I also question whether we as human beings should be going to work eight hours a day doing lots of stuff we quite often don't enjoy. So for me, the challenge is adjusting our economic model to this new reality, and I see that there will be significant disruption over the next 20 years with all the technology coming in and really challenging our jobs. >> Will AI put you and me out of a job? In other words, will it put the analysts and the consultants out of work and allow people to get expert advice on how to manage technology without having to go through somebody like a you or a me? >> Absolutely, and for me, my favorite example is looking at medicine. If you look at doctors, traditionally you send a doctor to medical school for seven years. You then hope that they retain 10% of what they've learned if you're lucky. Then they gain some experience. You then turn up in the practice with your conditions. Again, if you're super lucky, they might have skim read some of your previous conditions, and then diagnose you. And unless you have something that's very common, the chance that they get this right is very low. So compare this with your old stomping ground, IBM's Watson; they are able to feed all medical knowledge into that cognitive computing platform. They can update this continuously, and I could then talk to Watson eight hours a day if I wanted to about my symptoms.
This is going to change with affective computing and the ability for machines to do more of this, too. >> Well science fiction authors run amok of course, because they imagine the end state of perfection of all the capabilities like you're describing. So we perfect robotics. We perfect emotion analytics and so forth. We use machine learning to drive conversational UIs. Clearly a lot of people imagine that the technology, all those technologies are perfected or close to it, so, you know. But clearly you and I know that it's a lot of work to do to get them-- >> And we both have been in the technology space long enough to know that there are promises and there's lots of hype, and then there's a lot of disappointment, and it usually takes longer than most people predict. So what I'm seeing is that every industry I work in, and this is what my prediction is, automation is happening across every industry I work in. More things, even things I thought five years ago couldn't be automated. But to get to a state where it really transforms our world, I think we are still a few years away from that. >> Bernard, in terms of the hype factor for AI, it's out of sight. What do you think is the most hyped technology or application under the big umbrella of AI right now in terms of the hype far exceeds the utility. I don't want to put words in your mouth. I've got some ideas. Your thoughts? >> Lots of them. I think that the two areas I write a lot about and talk to companies a lot about is deep learning, machine learning, and blockchain technology. >> James: Blockchain. >> So they are, for me, they have huge potential, some amazing use cases, at the same time the hype is far ahead of reality. >> And there's sort of an intersection between AI and blockchain right now, but it's kind of tentative. Hey, Bernard, we are at the end of this segment. It's been so great. We could just keep going on and on and on. >> I know we could just be... 
>> Yeah, there's a lot I've been wanting to ask you for a long time. I want to thank you for coming to theCUBE. >> Pleasure. >> This has been Bernard Marr. I'm James Kobielus on theCUBE from DataWorks Summit in Berlin, and we'll be back with another guest in just a little while. Thank you very much.

Published Date : Apr 18 2018



Abhas Ricky, Hortonwork | Dataworks Summit 2018


 

>> Announcer: From Berlin, Germany, it's the CUBE covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Welcome to the CUBE, we're here at Dataworks Summit 2018 in Berlin. I'm James Kobielus. I am the lead analyst for big data analytics on the Wikibon team of SiliconANGLE Media. On the CUBE, we extract the signal from the noise, and here at Dataworks Summit, the signal is big data analytics, and increasingly the imperative for many enterprises is compliance with GDPR; the General Data Protection Regulation comes into force in five weeks, on May 25th. There are more things going on, so what I'm going to be doing today for the next 20 minutes or so is, from Hortonworks I have Abhas Ricky, who is the director of strategy and innovation. He helps customers, and he'll explain what he does, but at a high level, he helps customers to identify the value of investments in big data, analytics, big data platforms in their business. And Abhas, how do you justify the value of compliance with GDPR? I guess the value would be avoiding penalties for noncompliance, right? Can you do it as an upside as well? Is there an upside in terms of, if you make an investment, and you probably will need to make an investment to comply, can you turn this around as a strategic asset, possibly? >> Yeah, so I'll take a step back first. >> James: Like a big data catalog and so forth. >> Yeah, so if you look at the value part which you said, it's interesting that you mentioned it. So there's a study which was done by McKinsey which said that only 15% of executives can understand what is the value of a digital initiative, let alone a big data initiative. >> James: Yeah. >> Similarly, Gartner says that if you look at the various reports and if you look at various issues, the fundamental thing which executives struggle with is identifying the value which they will get. So that is where I pitch in. That is where I come in and bring a data perspective.
Now if you look at GDPR specifically, one of the things that we believe, and I've done multiple blogs and webinars around that, is that GDPR should be treated as a business opportunity because of the fact that -- >> James: An opportunity? Business opportunity. >> It shouldn't necessarily be seen as a compliance burden on costs or your balance sheets, because of the fact that it is the one single opportunity which allows you to clean up your data supply chain. It allows you to look at your data assets with a holistic view, and if you create a transparent data supply chain, your IT systems talk to each other. So some of the provisions, as you know, in addition to the right to consent, the right to portability, etc. There is also privacy by design, which says that you have to be proactive in defining your IT systems and architecture. It's not necessarily reactive. But guess what? If you're able to do that, you will see the benefits in other use cases like single view of customer or fraud or anti-money laundering, because at the end of the day, all GDPR is allowing you to say is: where do you store your data, what's the lineage, what's the provenance? Can you identify what the personally identifiable information is for any particular customer? And can you use that to your effect as you go forward? So it's a great opportunity, because to be able to comply with the provisions, you've got to take steps before that, which is essentially streamlining your data operations, which obviously will have a domino effect on the efficiency of other use cases. So I believe it's a business opportunity. >> Right, now part of that opportunity, in terms of getting your arms around what data you have, where the GDPR is concerned, the customer has a right to withhold consent for you, the enterprise that holds that data, to use that personal data of theirs, which they own, for various and sundry reasons.
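The bookkeeping Abhas is describing -- knowing where each customer's personally identifiable information lives, which purposes it is consented for, and being able to act on an erasure request -- can be sketched in a few lines. This is a purely illustrative sketch, not a Hortonworks or GDPR-mandated API; every class and name here is invented for the example.

```python
# Minimal, hypothetical sketch of a "transparent data supply chain":
# every PII attribute is catalogued with its physical location and the
# purposes the customer has consented to. All names are invented.

from dataclasses import dataclass, field

@dataclass
class PiiAttribute:
    dataset: str                      # where the attribute physically lives
    column: str                       # which field holds the personal data
    consented_purposes: set = field(default_factory=set)

class PiiCatalog:
    def __init__(self):
        self._by_customer = {}        # customer_id -> list[PiiAttribute]

    def register(self, customer_id, attr):
        self._by_customer.setdefault(customer_id, []).append(attr)

    def locate(self, customer_id):
        """GDPR's first question: where is this customer's data stored?"""
        return [(a.dataset, a.column)
                for a in self._by_customer.get(customer_id, [])]

    def usable_for(self, customer_id, purpose):
        """Return only the attributes the customer consented to for `purpose`."""
        return [a for a in self._by_customer.get(customer_id, [])
                if purpose in a.consented_purposes]

    def erase(self, customer_id):
        """Right to erasure: return every known reference so the underlying
        rows can be deleted, then forget the customer in the catalog."""
        return self._by_customer.pop(customer_id, [])
```

With such an inventory in place, "which customer attributes you can touch, and whether you have consent" becomes a lookup rather than a manual audit, which is the sense in which the compliance work pays for itself in other use cases.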
Many enterprises, and many of Hortonworks' customers, are using their big data for things like AI and machine learning. Won't this compliance with GDPR limit their ability to seize the opportunity to build deep learning and so forth? What are customers saying about that? Is that going to be kind of a downer or a chilling effect on their investments in AI and so forth? >> So there's two elements around it. The first thing which you said, there are customers, there's machine learning in AI, yes, there are. But broadly speaking, before you're able to do machine learning and AI, you need to get your data sets onto a particular platform in a particular fashion, clean data, otherwise you can't do AI or machine learning on top of it. >> James: Right. >> So the reason why I say it's an opportunity is because you're being forced by compliance to get that data from every other place onto this platform. So obviously those capabilities will get enhanced. Having said that, I do agree, if I'm an organization which does targeting and retargeting of customers based on multiple segmentations, and then one of the things is online advertisements. In that case, yes, your ability might get affected, but I don't think you'll get prohibited. And that affected time span will be only small, because you just adapt. So the good thing about machine learning and AI is that you don't create rules, you don't create manual rules. They pick up the rules based on the patterns and how the data sets have been performing. So obviously once you have created those structures in place, initially, yes, you'll have to make an investment to alter your programs of work. However, going forward, it will be even better. Because guess what? You just cleaned your entire data supply chain.
So that's how I would see that. Yes, a lot of companies, in ecommerce you do targeting and retargeting based on the customer DNA, based on their shopping profiles, based on their shopping habits, and then based off that, you give them the next best offer or whatever. So, yes, that might get affected initially, but that's not because GDPR is there or not. That's just because you're changing your program software. You're changing the fundamental way by which you're sourcing the data, the way they are coming from and which data you can use. But once you have tags against each of those attributes, once you have access controls, once you know exactly which customer attributes you can touch and which you cannot, for which purposes, whether you have consent or not, your life's even better. The AI tools or the machine learning algorithms will learn from themselves. >> Right, so essentially, once you have a tight ship in terms of managing your data in line with the GDPR strictures and so forth, it sounds like what you're saying is that it gives you as an enterprise the confidence and assurance that if you want to use that data and need to use that data, you know exactly that you have the processes in place to gain the necessary consents from customers. So there won't be any nasty surprises later on of customers complaining, because you've got legal procedures for getting the consent, and that's great. You know, one of the things, Abhas, we're hearing right now in terms of compliance requirements that are coming along, maybe not a part of GDPR directly yet, but related to it, is the whole notion of algorithmic transparency.
As you build machine learning models, and these machine learning models are driven into working applications, being able to transparently identify whether those models take, in particular, let's say an autonomous action based on particular data and particular variables, and then there are some nasty consequences, like crashing an autonomous vehicle -- the ability, they call it explainable AI, to roll that back and determine who's liable for that event. Does Hortonworks have any capability within your portfolio to enable more transparency into the algorithmic underpinnings of a given decision? Is that something that you enable in your solutions, or that your partner IBM enables through DSX and so forth? Give us a sense whether that's a capability currently that you guys offer, and whether, in terms of your understanding, customers are asking for that yet, or is that too futuristic? >> So I would say that it's a two-part question. >> James: Yeah. >> The first one, yes, there are multiple regulations coming in, like Vilica Financial Markets, there's Mid Fair, the BCBS, etc., and organizations have to comply. You've got the IFRS, which spans to brokers, the insurance, etc., etc. So, yes, a lot of organizations across industries are getting affected by compliance use cases. Where does Hortonworks come into the picture? To be able to be compliant from a data standpoint, A, you need to be able to identify which of those data sources you need to implement a particular use case. B, you need to get them to a certain point whereby you can do analytics on that. And then there's the whole storage and processing and all of that.
But also, as you might have heard at the keynote today, from a cloud perspective it's starting to get more and more complex, because everyone's moving to the cloud, which means, if you look at any large multi-national organization, most of them have a hybrid cloud structure, because they work with two or three cloud vendors, which makes the process even more complex, because now you have multiple clusters, you have on-premise systems, and you have multiple different IT systems that need to talk to each other. Which is where the Hortonworks DataPlane services come into the picture, because it gives you a unified view of your global data assets. >> James: Yes. >> Think of it like a single pane of glass whereby you can do security and governance across all data assets. So from those angles, yes, we definitely enable those use cases, which will help with compliance. >> Making the case to the customer for a big data catalog along the lines of what you guys offer, in making the case, there's a lot of upfront data architectural work that needs to be done to get all your data assets into shape within the context of the catalog. How do they justify making that expense in terms of hiring the people, the data architects and so forth, needed to put it all in shape? I mean, how long does it take before you can really stand up a working data catalog in most companies? >> So again, you've asked two questions. First of all is how do they justify it? Which is where we say that the platform is a means to an end. It's enabling you to deliver use cases. So I look at it in terms of five key value drivers. Either it's a risk reduction, or it's a cost reduction, or it's a cost avoidance. >> James: Okay. >> Or it's a revenue optimization, or it's time to market. Against each one of these value drivers, or multiple of them or a combination of them, each of the use cases that you're delivering on the platform will lead you to benefits around that.
My job, obviously, is to work with the customers and executives to understand what that will be, to quantify the potential impact, which will then form the basis and give my customer champions enough ammunition so that they can go back and justify those investments. >> James: Abhas, we're going to have to cut it short, but I'm going to let you finish your point here, but we have to end this segment, so go ahead. >> That's fine. >> Okay, well, anyway, we have had Abhas Ricky, who is the director of strategy and innovation at Hortonworks. We're here at Dataworks Summit Berlin. And thank you very much. Sorry to cut it short, but we have to move to the next guest. >> No worries, pleasure, thank you very much. >> Take care, have a good one. >> Thanks a lot, yes. (upbeat music)

Published Date : Apr 18 2018


Scott Gnau, Hortonworks | Dataworks Summit EU 2018


 

(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics, here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was in Munich, now it's in Berlin. It's a great show. The host is Hortonworks, and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established themselves about seven years ago as one of the up-and-coming startups commercializing a then brand-new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya. How are you doing? >> Glad to be back and good to see you. It's been awhile. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times, but I remember you from years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your latest financials look pretty good. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks--you did the keynote this morning--it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies, but it's affecting North American companies and others who do business in Europe.
So your keynote this morning, your core theme was that, if you're an enterprise, your business strategy is now equated with your cloud strategy, which is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, the personal data of your customers, is absolutely important, in fact it's imperative and mandatory, and will be in five weeks, or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called the Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR. And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create.
And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. 
You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. >> Interviewer: That's the Data Steward Studio? >> Well, DataPlane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud.
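The requirement Scott keeps returning to -- that classifications like "PII" must travel with the data through its life cycle, and that a copy must not shed its lineage -- can be illustrated with a toy model. Apache Atlas represents this with typed entities and classifications; the classes below are invented stand-ins for the idea, not the Atlas API.

```python
# Toy illustration, with invented names: deriving a new dataset propagates
# its parents' governance tags and records lineage, so a copy to the cloud
# cannot silently lose either.

class Dataset:
    def __init__(self, name, tags=None, parents=None):
        self.name = name
        self.tags = set(tags or [])          # classifications, e.g. {"PII"}
        self.parents = list(parents or [])   # names of source datasets

def derive(name, *sources):
    """Create a derived dataset; tags are inherited from all sources."""
    tags = set().union(*(s.tags for s in sources))
    return Dataset(name, tags=tags, parents=[s.name for s in sources])

def lineage(ds, universe):
    """Walk parent links back to the original sources (the bread crumbs)."""
    out, stack = [], [ds]
    while stack:
        cur = stack.pop()
        out.append(cur.name)
        stack.extend(universe[p] for p in cur.parents)
    return out
```

The point of the sketch is the `derive` step: because the tag set is computed from the sources, a dataset joined from a PII-tagged table is itself PII-tagged by construction, which is exactly the guarantee a catalog-level service has to provide across clusters and clouds.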
All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 
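One of the concrete wins Scott mentions is discovering duplicate assets across environments ("I can save some money if I get rid of this cloud instance, 'cause it's over here already"). In principle that reduces to fingerprinting asset content and grouping identical fingerprints; the sketch below shows the idea and is purely illustrative, not Data Steward Studio's actual algorithm.

```python
# Illustrative duplicate-asset discovery: hash each asset's bytes and
# group locations that hold identical content. Asset paths are invented.

import hashlib
from collections import defaultdict

def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

def find_duplicates(assets):
    """assets: mapping of asset location -> raw bytes.
    Returns groups of locations holding byte-identical content."""
    groups = defaultdict(list)
    for location, content in assets.items():
        groups[fingerprint(content)].append(location)
    return [locs for locs in groups.values() if len(locs) > 1]
```

A real service would fingerprint at the column or partition level and work from catalog metadata rather than raw bytes, but the single-pane-of-glass value is the same: one index over assets that live in different clusters and clouds.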
'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. >> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. 
Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was, hey, I start as this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely to how it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time, in-motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
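Scott's "tokenize it, but keep referential integrity" can be achieved with keyed, deterministic pseudonymization: the same PII value always maps to the same token, so joins and aggregations still work on the tokenized column, while only key holders can map tokens back to identities. A minimal stdlib sketch, assuming HMAC-SHA-256 as the keyed function; the key and sample data are invented, and real systems would manage the key in a vault.

```python
# Deterministic pseudonymization sketch: same input -> same token, so the
# tokenized column still joins correctly. Key handling is deliberately naive.

import hmac
import hashlib

SECRET_KEY = b"demo-key-held-in-a-vault-in-real-life"  # hypothetical

def tokenize(value: str) -> str:
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

# Two orders from the same customer get the same token, preserving
# referential integrity without exposing the email address:
orders = [("alice@example.com", 120), ("bob@example.com", 80), ("alice@example.com", 40)]
tokenized = [(tokenize(email), amount) for email, amount in orders]
```

Using a keyed HMAC rather than a plain hash matters: without the key, an attacker who can guess candidate emails could recompute tokens and re-identify rows.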

Published Date : Apr 18 2018


Keynote Analysis | Dataworks Summit 2018


 

>> Narrator: From Berlin, Germany, it's theCUBE! Covering DataWorks Summit, Europe 2018. (upbeat music) Brought to you by Hortonworks. (upbeat music) >> Hello, and welcome to theCUBE. I'm James Kobielus. I'm the lead analyst for Big Data analytics in the Wikibon team of SiliconANGLE Media, and we're here at DataWorks Summit 2018 in Berlin, Germany. And it's an excellent event, and we are here for two days of hard-hitting interviews with industry experts focused on the hot issues facing customers, enterprises, in Europe and the world over, related to the management of data and analytics. And what's super hot this year, and it will remain hot as an issue, is data privacy and privacy protection. Five weeks from now, a new regulation of the European Union called the General Data Protection Regulation takes effect, and it's a mandate that is effecting any business that is not only based in the EU but that does business in the EU. It's coming fairly quickly, and enterprises on both sides of the Atlantic and really throughout the world are focused on GDPR compliance. So that's a hot issue that was discussed this morning in the keynote, and so what we're going to be doing over the next two days, we're going to be having experts from Hortonworks, the show's host, as well as IBM, Hortonworks is one of their lead partners, as well as a customer, Munich Re, will appear on theCUBE and I'll be interviewing them about not just GDPR but really the trends facing the Big Data industry. Hadoop, of course, Hortonworks got started about seven years ago as one of the solution providers that was focused on commercializing the open source Hadoop code base, and they've come quite a ways. They've had their recent financials were very good. They continue to rock 'n' roll on the growth side and customer acquisitions and deal sizes. So we'll be talking a little bit later to Scott Gnau, their chief technology officer, who did the core keynote this morning. 
He'll be talking not only about how the business is doing but about a new product announcement, the Data Steward Studio that Hortonworks announced overnight. This new solution is directly related to, or useful for, GDPR compliance, and we'll ask Scott to bring us more insight there. But what we'll be doing over the next two days is extracting signal from noise. The Big Data space continues to grow and develop. Hadoop has been around for a number of years now, but in many ways it's been superseded on the agenda, as the priorities of enterprises that are building applications from data shift, by some newer, primarily open source technologies such as Apache Spark, TensorFlow for building deep learning, and so forth. We'll be discussing the trends towards the deepening of the open source data analytics stack with our guests. We'll be talking with a European-based reinsurance company, Munich Re, about the data lake that they have built for their internal operations, and we'll be asking Andres Kohlmaier, their lead of data engineering, to discuss how they're using it, how they're managing their data lake, and possibly to give us some insight about how it will serve them in achieving GDPR compliance and sustaining it going forward. So what we will be doing is looking at trends, not just in compliance, not just in the underlying technologies, but in the applications that Hadoop and Spark and so forth, these technologies, are being used for, and the applications, really the same initiatives, in Europe and worldwide in terms of what enterprises are doing. They're moving away from Big Data environments built primarily on data at rest, that's where Hadoop has been the sweet spot, towards more streaming architectures. And so Hortonworks, as I said the show's host, has been going more deeply towards streaming architectures with its investments in NiFi and so forth. We'll be asking them to give us some insight about where they're going with that.
We'll also be looking at the growth of multi-cloud Big Data environments. What we're seeing is that there's a trend in the marketplace away from predominately premises-based Big Data platforms towards public cloud-based Big Data platforms. And so Hortonworks, they are partners with a number of the public cloud providers, like IBM, which I mentioned. They've also got partnerships with Microsoft Azure, with Amazon Web Services, with Google and so forth. We'll be asking our guests to give us some insight about where they're going in terms of their support for multi-clouds, support for edge computing, analytics, and the internet of things. Big Data increasingly is evolving towards more of a focus on serving applications at the edge, like mobile devices that have autonomous smarts, like for self-driving vehicles. Big Data is critically important for feeding, for modeling and building, the AI needed to power the intelligence in endpoints. Not just self-driving cars but intelligent appliances, conversational user interfaces for mobile devices, for our consumer appliances; like, you know, Amazon's got their Alexa, Apple's got their Siri, and so forth. So we'll be looking at those trends as well, towards pushing more of that intelligence towards the edge, and the power and the role of Big Data and data-driven algorithms, like machine learning, in driving those kinds of applications. So, the Wikibon team that I'm embedded within, we have published just recently our updated forecast for the Big Data analytics market, and we've identified key trends that are revolutionizing and disrupting and changing the market for Big Data analytics. Among the core trends, I mentioned the move towards multi-clouds.
The move towards more public cloud-based Big Data environments in the enterprise: I'll be asking Hortonworks, who of course built their business and their revenue stream primarily on on-premises deployments, to give us a sense of how they plan to evolve as a business as their customers move towards more public cloud-facing deployments. And IBM, of course, will be here in force. Tomorrow, which is a Thursday, we have several representatives from IBM to talk about their initiatives and partnerships with Hortonworks and others in the area of metadata management, and in the area of machine learning and AI development tools and collaboration platforms. We'll also be discussing the push by IBM and Hortonworks to enable greater depths of governance applied to enterprise deployments of Big Data. That means both data governance, an area where Hortonworks and IBM as partners have achieved a lot of traction in terms of recognition among the pace-setters in data governance in multi-cloud, unstructured Big Data environments, and also model governance: the governance, the version control, and so forth of machine learning and AI models. Model governance is a huge push by enterprises who increasingly are doing data science, which is what machine learning is all about. They're taking that competency, that practice, and turning it into more of an industrialized pipeline that builds, trains, and deploys into an operational environment a steady stream of machine-learning models for multiple applications, you know, edge applications, conversational UIs, search engines, eCommerce environments that are driven increasingly by machine learning that's able to process Big Data in real time and deliver next-best actions and so forth, more intelligence in all applications.
So we'll be asking Hortonworks and IBM to net out where they're going with their partnership in terms of enabling a multi-layered governance environment that allows this pipeline, this machine-learning pipeline, this data science pipeline, to be deployed as an operational capability in more organizations. Also, one of the areas where I'll be probing our guests is automation in the machine learning pipeline. That's been a hot theme that Wikibon has seen in our research. A lot of vendors in the data science arena are adding automation capabilities to their machine-learning tools. Automation is critically important for productivity. Data scientists as a discipline are in limited supply. I mean, experienced, trained, seasoned data scientists fetch a high price. There aren't that many of them, so more of the work they do needs to be automated. It can be automated by increasingly mature tools on the market from a growing range of vendors. I'll be asking IBM and Hortonworks to net out where they're going with automation inside their Big Data and machine learning tools and partnerships going forward. So really what we're going to be doing over the next few days is looking at these trends, but it's going to come back down to GDPR as the core envelope that many companies attending this event, DataWorks Summit Berlin, are facing. So I'm James Kobielus with theCUBE. Thank you very much for joining us, and we look forward to starting our interviews in just a little while. Our first up will be Scott Gnau from Hortonworks. Thank you very much. (upbeat music)
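The model governance described above, versioning machine-learning models, recording what data they were trained on, and controlling their promotion into production, can be sketched in a few lines. This is a hypothetical illustration under invented names (the `ModelRegistry` class and its methods are not IBM's or Hortonworks' actual tooling), just to make the idea of an auditable, reversible model pipeline concrete:

```python
import hashlib
import time

class ModelRegistry:
    """Minimal sketch of model governance: every model version carries a
    fingerprint of its training data, its evaluation metrics, and a
    lifecycle stage, so deployments are auditable and reversible."""

    def __init__(self):
        self._versions = {}   # (name, version) -> record
        self._latest = {}     # name -> latest version number

    def register(self, name, weights, training_data, metrics):
        version = self._latest.get(name, 0) + 1
        self._versions[(name, version)] = {
            "version": version,
            # hash of the training set ties the model to its data lineage
            "data_fingerprint": hashlib.sha256(training_data).hexdigest(),
            "metrics": metrics,
            "stage": "staging",          # nothing goes straight to production
            "registered_at": time.time(),
            "weights": weights,
        }
        self._latest[name] = version
        return version

    def promote(self, name, version):
        # Archive whatever is currently serving, then promote the new version.
        for (n, _v), rec in self._versions.items():
            if n == name and rec["stage"] == "production":
                rec["stage"] = "archived"
        self._versions[(name, version)]["stage"] = "production"

    def production_model(self, name):
        for (n, _v), rec in sorted(self._versions.items()):
            if n == name and rec["stage"] == "production":
                return rec
        return None

registry = ModelRegistry()
v1 = registry.register("churn", weights=b"...", training_data=b"jan-data", metrics={"auc": 0.81})
v2 = registry.register("churn", weights=b"...", training_data=b"feb-data", metrics={"auc": 0.84})
registry.promote("churn", v2)
```

A real registry would of course persist this state and enforce approval workflows; the point is only that version control and data lineage are what make a model pipeline governable.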

Published Date : Apr 18 2018


Jean English, NetApp | NetApp Insight Berlin 2017


 

>> Announcer: Live from Berlin, Germany. It's theCube, covering NetApp Insight 2017. Brought to you by, NetApp. >> Welcome back to theCube's live coverage of NetApp Insight 2017, I'm your host Rebecca Knight along with my co-host Peter Burris. We are joined by Jean English. She is the Senior Vice President and Chief Marketing Officer of NetApp, thanks so much for comin' on the show. >> Thank you for having me, we're glad you're here with us to join us at Insight Berlin. >> We're always excited to do anything with NetApp. So, talk a little bit about NetApp's digital transformation. You're now at a year's long transformation from storage, your legacy, to data. Talk a little bit about your positioning in the market. >> Sure, so I think people have previously thought of NetApp as storage, and what we're so focused on now is data. And why data? Because that's what we hear from our customers, our partners, the analysts, is what is really topping their needs right now. If we think about how companies are transforming, they're having to think about digital transformation is topping the list. It's topping the most strategic agendas of most CEOs. But what happens is they have to think about the data. It has become a life blood of their business, and as it seamlessly flows through that business, and what does it mean to either optimize their operations, if they've gotta increase their customer touch points, do they have to create new product services, and even businesses. So we feel like right now that is where our focus is on data, and it's so much a part of our heritage that we look to the future as well. >> One of the things that you're working on now is helping customers use data in new, exciting, innovative, creative ways, can you talk broadly about your approach to that, and how you're drawing inspiration on customers and then empowering them? >> Absolutely, so we really try to think about, what is our purpose? 
And our purpose holds true to our heritage from 25 years ago, we just celebrated our 25 year anniversary this past spring, and it is to empower our customers to change the world with data. Just a few of those we've seen now, especially in hybrid cloud environments: customers have to think about how they're gonna simplify to integrate data across on-prem and cloud environments, to accelerate digital transformation. One example of that is EidosMedia. We love their story, because they're talking about how to get news stories, real time, through a cloud platform, into the hands of journalists that can publish real time live insights. Real time journalism, and so when you think about the speed that has to happen with creating stories, getting 'em published, getting 'em out to news networks, that's data. And it's a good data story.
People are constantly looking for the latest source of truth of data, so dynamic, and because it's so distributed across environments, people are trying to figure out, how do you integrate data, how do you share data, but it's all about simplicity, 'cause they need it to be efficient. They need to make sure that it's protected, so security is top of mind, and data protection is of the utmost importance. They're looking for ways to embrace future technologies, whether that's thinking about different cloud environments, SaaS applications, and then how they create the most open opportunities. A lot of people aren't just putting their data in one cloud; what we're finding is it's a multi-cloud world, and they're looking for a holistic solution to more easily and seamlessly manage their data through those environments.
We're also seeing that you may wanna sync data, so maybe once you put data into the cloud, and you run analytics or even machine learning, how do you then get data back? Because you wanna make sure that you're constantly being able to look holistically at your customers. This notion of one cloud, back to on-prem, to multi-cloud environments, has been critical as we think about customers and where they're going. >> One of the things we're also hearing about at this conference is, this is the day of the data visionary, and this is where people are thinking about how to store data, use data, extract data, find value in the data. The demands on them, the pressures on them, are so intense. How is NetApp helping those people, sort of understanding where they are, not only in their businesses, but also in the trajectories of their careers, and then helping them move forward? >> We've been really thinking about who is really using data to disrupt, and how this disruptive use of data can really drive business results. It's not just about having the data, it's about how are you gonna have an impact on the business. So we start to think about this notion of who is a data thriver? Who's thriving with data versus who's just surviving and in fact, some are even resisting. So we actually partnered with IDC to launch a study on data thrivers to look at who is truly driving new revenue streams, attracting new customers, how they are able to use data as a core part of their business. Not some one-off or side project to help do the digital transformation, but what was gonna drive really good business results. Data as an asset. Data across business and IT. And we see new roles are emerging from this. We're seeing Chief Data Officers, there's Chief Digital Officers, Chief Data Scientists, Chief Transformation Officers. All new roles that have been emerging in the last couple of years, but these data thrivers are seeing tremendous business impact.
>> So, what is it that separates those people, I mean I think that, those really, those companies and those business models, and what are sort of the worst case scenarios for those companies that are just surviving and not necessarily thriving, in this new environment. >> Yeah, I think it's interesting, we're seeing that companies that actually put data at the center of what they do. So we think of it as a data-centric organization, are seeing 6x in what they're seeing in terms of being able to drive real customer acquisition. When we think about what it means to drive operational efficiency, when we think about 2x times in terms of profitability, real bottom line results, compared to people that are simply just surviving with data. What's interesting is that when we start to think about what are the attributes of these people, so business and IT working together in unison. These roles in fact that are emerging are starting to become those catalysts and change agents that are bringing IT and the business more together. We're also seeing that when you think of data as an asset, even to the bottom line, how does data become more critical in terms of what they see in terms of being a differentiated advantage for the company. Also, thinking through quality, quality, quality. You've gotta make sure that the data is of highest quality and it's constantly being cleansed. Then in terms of how do we think of it being used across the business, it's not just about holding data and locking it away behind a firewall. Data more today is so dynamic, distributed and diverse, that you have to let it be utilized and activated across the business. 
And then to think through, it starts not just in terms of what customers are using and seeing from data, what they can actually see in terms of customer touch points and having a better customer experience, but then how do you make sure it even comes back to the development to create new products, create new services, maybe even eliminate waste. Stop doing product lines based on what they're seeing from actual usage. So it's a pretty fascinating space right now, but the data thriver is the new thought we're thinking in terms of getting that out in the market and really sharing more so with our clients, so that they can benchmark themselves as well. >> So, you're a CMO. >> Yes. >> You're telling a story, but you also have operational responsibilities. How would you tell your peers to use data differently? >> Well, I think there's a couple things. I mean, for me data's the life blood of how we think about how we actually create a better customer experience. We're using data constantly to better understand what our customers' needs are, and those customers are evolving. Before, the loyalists that we loved were storage architects and admins; now we're starting to see that people are thinking about how to use more hybrid cloud data services, with CIOs. How are they gonna look at a cloud strategy? With DevOps, how are they gonna create and deploy applications at speed? How are they gonna be able to really think through what they're gonna do to drive more analytics and better workload usage, and efficiencies? Our clients are evolving, and when we think about how do you reach those clients differently, we have to know who they are. We have to use data to understand them. We have to be more personalized. We just relaunched our entire digital experience, so that when we try to look at how do you bring people into something that's more customized, more personalized, what does it mean to be a cloud architect that's thinking about a data backup and protection plan.
What does it mean for someone in DevOps who's thinking about how do I actually create and deploy an application at speed? How do you think about someone that's gonna look at the needs from a CIO, so much differently than before. But, using data, using customization, thinking about an engaging experience, bringing 'em through that experience so that we solve their business challenges. We use data and analytics every day. I think of us as being the new data scientists. People say, is it art or is it science and marketing? I'm like, it's a little bit of the storytelling, absolutely, we have to lead with stories, but the data and the analytics is where we really understand our customers best. So using analytic models, using predictive models, using more ways in which we can actually reach customers in new ways we never have before through social. But bring them into a new conversation. Analytics, analytics, storytelling, and understanding, getting closer to new clients like we never have before, and then thinking through how do we use that full-circle loop of learning to get better and better in how we engage our customers in ways they want to engage with us. >> I wanna switch gears just a second, and I know that you've just been nominated as an International Board Member. You were a Board Member before, of Athena of the Triangle, which is about supporting and inspiring women in the technology industry. As we know that this is the dearth of women, technologists, is a big problem in the US and globally. Can you tell us a little more about the organization and what you're doing? >> So, Athena International is really about, how do you promote women's leadership? 
It's across the world, in fact we just launched some very exciting initiatives in China where I lived for a year, and the President of Athena International is a friend of mine, and she was really looking at how do you foster growth, especially in emerging markets and countries where women's leadership can be so profound in terms of how do you impact the business, government, and market, and really overall global success. Athena is focused on, is technology, but it's also with women in many industries. But really, how do you gain the powerful mentorships, how do you gain powerful access to programs, to having more access to expertise that can help them to think through business models, business cases. How do they grow their business, it might be from financial to career counseling, to mentoring on marketing, but it's really thinking through women's leadership as a whole. >> And is NetApp also working on behalf of those, of that cause too? >> We're really focused on, today in fact we're gonna be hosting the, the annual Women in Technology Summit. So we're so focused on how do we think about developing women in technology, how to think about that across not only our employees, but our partners and our customers, and it's not just about women, this is men and women working together to determine how do we stop the fact that we've got to get more access to women in mentorships and sponsorships, and really really driving how we have leadership as we grow, really grow into our careers, and can drive more business impact. >> Great. Well Jean, thanks so much for coming on theCUBE, >> Thank you. >> It was really fun talking to you. >> Absolutely, thank you both. >> I'm Rebecca Knight for Peter Burris, we will have more from NetApp Insight, here in Berlin, Germany in just a little bit.

Published Date : Nov 16 2017


Mark Bregman, NetApp | NetApp Insight Berlin 2017


 

>> Announcer: Live from Berlin, Germany. It's theCube, covering NetApp Insight 2017. Brought to you by NetApp. >> Welcome back to theCube's live coverage of NetApp Insight here in Berlin, Germany. I'm your host Rebecca Knight, along with my co-host Peter Burris. We are joined by Mark Bregman. He is the CTO of NetApp. Thanks so much for coming on theCube. >> Thanks for taking the time. >> So you have recently been looking into your crystal ball to predict the future, and you have some fun, sometimes counterintuitive predictions about what we're going to be seeing in the next year and the decade to come. Your first prediction: you said data will become self-aware. What do you mean by that? >> Well, the title is kind of provocative. Really, the idea is that data is going to carry with it much more of its metadata. Metadata becomes almost more important than the data in many cases, and we can anticipate sort of architectures in which the data drives the processing, whereas today we always have the data as sort of a pile over here, and then we have a process that we execute against the data. That's been our tradition in the computing world for a long, long time. As data becomes more self-aware, the data, as it passes through, will determine what processes get executed on it. So let me give you a simple analogy from a different field, from the past. In the communications world we used to have circuit-switched systems. There was some central authority that understood the whole network. If you and I wanted to communicate, it would figure out the circuit, set up the circuit, and then we would communicate. And that's sort of similar to traditional processing of data: the process knows everything it wants to do, and it knows where to find the data.
It does that, and it puts it somewhere else. But in the communications world we moved to packets. So now the packet, the data, carries with it the information about what should happen to it, and I no longer have to know everything about the network; nobody has to know everything about the network. I pass it to the nearest neighbor, who says, well, I don't know where it's ultimately going, but I know it's going generally in that direction, and eventually it gets there. Now, why is that better? It's very robust, it's much more scalable, and particularly in a world where the rules might be changing, I don't have to necessarily redo the program; I can change the markup, if you will, the tagging of the data. You can think of different examples. Imagine the data that's sitting in an autonomous vehicle, and there's an accident. Now there are many people who want access to that data: the insurance company, the authorities, the manufacturer. The data has contained within it the knowledge of who can do what with that data, so I don't have to have a separate program that determines whether I can use that data or not. The data says, sorry, you're not allowed to see this; this is private data, you can't see this part of it. Maybe the identifying data, obviously, the insurance company needs, because it has to know who the car owner is, but maybe they don't need to know something else, like where I came from. The authorities might need both; well, he came from a bar. >> So you can imagine that as an example. The implications, Mark, are important. For example, if I wanted to develop an application
that would be enhanced by having access to data, I had to do programming to get to that data, because some other application controlled that data, and that data was defined contextually by that application. And so everything was handled by the application. By moving the metadata into the data, now I can bring that data to my application more easily, with less overhead. And that's crucial, because the value of data accretes; it grows as you can combine it in new and interesting ways. So by putting the metadata into the data, I can envision a world where it becomes much faster, much more facile, to combine data in new ways. >> Exactly. It also is easier to move the processing to the data, because the processing is no longer a monolithic program. It's some large set of microservices, and the data organizes which ones to execute. So I think we'll see, I mean, this is not a near-term prediction. This is not one for next year, because it requires rethinking how we think about data and processing. But I think we'll see it with the emergence of microservices, compositional programming, metadata together with the data; we'll see more functional programs, little programs. >> One quick question before we go on to the next one. It's almost like in the late 1970s it was networks of devices, ARPANET, that became the Internet. And then the web was networks of pages. And then we moved into networks of application services. Do you foresee a day where it's going to be literally networks of data?
>> Yes, and in fact that's a great example, because if you think about what happened in the evolution of the web through what we called Web 2.0, the pages were static data. They came alive in Web 2.0, and there was much less of a distinction between the data and the program in the web layer. So that's what we're saying; we see that emerging even further. >> Your next prediction was about virtual machines becoming rideshare machines. >> Well, this is somewhat complementary to the first one; they all kind of fit together. Here the idea is, you know, if we go back to the earlier days of IT, it wasn't that long ago that if you needed something, you ordered the server, and you installed it; you owned it. And then we got to the model of the public cloud, which is like a rental. By the same analogy, in the past if I wanted a vehicle, I had to buy it. Then the rental car agencies came up, and I said, well, you know, when I go to Berlin I'm not gonna buy a car for three days, I'll rent a car. But I can choose which car I want: do I want the BMW, or do I want, you know, a Volkswagen? That's very similar to the way the cloud works today. I pick what instances I want, and they meet my needs. If I make the right choice, great. And by the way, I pay for it while I have it, not for the work that's getting done, so if I forget to return that instance, I'm still getting charged. But the rideshare is kind of like Uber, and we're starting to see that with things like serverless computing. In that model, I say I want to get this work done, and the infrastructure decides what shows up, in the same way that when I call Uber, I don't get to pick what car shows up; they send me the one that's most convenient for them and me, and I get charged for the work, going from point A to point B, not for the amount of time. There's some differentiation, and so that's more like a rideshare. But as you point out, even in the rideshare world I have some choices.
I can't choose exactly which one. If I want a large SUV, I might get a BMW SUV or I might get a Mercedes SUV; I can't choose that, and I can't choose whether it's silver or black, but I get a higher class. And what we're seeing with the cloud, with these kinds of instances, virtual solutions, is they are also becoming more specialized. It might be that for a particular workload I want some instance that has GPUs in it, or some neural chip, or something else. In much the same way, the rental model would say, go choose the exact one you want. The rideshare model would say, I need to get this work done, and the infrastructure might decide this is best serviced by five instances with GPUs, or, because of availability and cost, maybe it's 25 instances of standard processors, because you don't care about how long it takes. So it's this compromise, and it's really very analogous to the rideshare model. Now, coming back to the earlier discussion, as the units of work get smaller and smaller and become really microservices, I can imagine the data driving that decision, hailing the cab, hailing the rideshare, and driving what needs to be done. So that's why I see them as somewhat complementary. >> And so what's the upshot, though, for the employee and for the company? >> I think there are two things. One is you've got to make the right decision.
You know, if I were to use Uber to commute to Sunnyvale every day, it'd break the bank, and it would be kind of stupid, so for that particular task I own my vehicle. But if I'm gonna go to Tahoe for the weekend and I need an SUV, I'm not gonna buy one, and neither am I going to take an Uber; I'm gonna rent one, because that's the right vehicle. On the other hand, when I'm going from, you know, where I live to the Marina within San Francisco, that's a 15-minute drive; on demand, I take an Uber, and I don't really care. Now, if I have 10 friends, I might pick a big one or a small one, but again, the distinction is there. So I think for companies, they need to understand the implications, and a lot of times, as with many people, they make the wrong initial choice, and then they learn from it. You know, there are people who take Uber everywhere. I had a friend who was commuting to HP every day by Uber from the city, from San Francisco. That just didn't make sense; he kind of knew that. >> The next one is: data will grow faster than the ability to transport it, but that's okay. It doesn't sound okay. >> It doesn't sound okay, and for a long time we've worried about that. We've done compression, and we've done all kinds of things; we've built bigger pipes. But we were fundamentally transporting data between data centers, or more recently between the data center and the cloud, big chunks of data. What this really talks about is, with the emergence of IoT in a broad sense,
telematics, IoT, digital health, many different cases, there's going to be more and more data both generated and ultimately stored at the edge. All of that will not be able to be shipped back to the core, and it's okay not to do that, because there's also processing at the edge. So in an autonomous vehicle, where you may be generating 20 megabytes per hour or more, you're not gonna ship that all back. You're gonna store it, you're gonna do some local processing, and you're gonna send the appropriate summary of it back. But you're also gonna keep it there for a while, because maybe there's an accident, and now I do need all that data. I didn't ship it back from every vehicle, but that one I care about, and now I'm gonna bring it back, or I'm gonna do some different processing than I originally thought I would do. So again, the ability to manage this is going to be important, but it's managed in a different way. It means we need to figure out ways to do overall data lifecycle management all the way from the edge, where historically that was a silo we didn't care about, probably all the way through the archive or through the cloud, where we're doing machine learning, rules generation, and so on. >> But it also suggests that we're going to need to do a better job of discriminating or demarcating different characteristic classes of data. Data at the edge, real-world data that has real-world implications right now, is different from data that summarizes business events, which is different from data that summarizes things like models that might be integrated somewhere else. And we have to do a better job of really understanding the relationships between data, its use, its asset characteristics, etcetera. Would you agree with that? >> Absolutely, and maybe you see the method in my madness now, which is that data will have
Associated with it the metadata that describes that so that I don't misuse it you know think about The video data off of a vehicle I might want to have a sample of that every I don't know 30 seconds, but now if there's really a problem and it may be not an accident Maybe it's a performance problem. You skidded I'd like to go back and see why was there a Physical issue with the vehicle that I need to think about as an engineering problem was it Your driving ability was it a cat jumped in front of the car so But I need to be able to as you pointed out in a systematic way distinguish what data I'm looking at and where it belongs and where it came from The final prediction it concerns the evolution from Big Data to huge data so that is Really driven by the Increasing need we have to do machine learning AI Very large amounts of data being analyzed in near real time to meet new needs for business And there's again a little like many of these things There's a little bit of a feedback loop so that drives us to new architectures for example being able to do in memory analytics But in-memory analytics with all that important data. 
I want to have persistence. Technologies are coming along, like storage class memories, that are allowing us to build persistent storage, persistent memory. We'll have to re-architect the applications, but at the same time, that persistent memory data, I don't want to lose it, so it has to be thought of also as a part of the storage system. Historically we've had systems, the compute system, and there's a pipe, and there's a storage system, and they're separate. They're kind of coming together, and so you're seeing the storage impinge on the compute system. Our announcement of the Plexistor acquisition is how we're getting there. But at the same time, you see what might have been thought of as the memory of the compute system really become an extended part of the storage system, with all the things related to copy management, backup, and so on. So that's really what that's talking about, and, you know, it's being driven by another factor, I think, which is a higher level factor. The first 50 years of the IT industry was all about automating processes that ran the business. They didn't change the business, they made it more efficient: accounting systems, etc. Since probably 2000 there's been a little bit of a shift, because of the web and mobile, to say, oh, I can use this to change the relationship with my customer, customer intimacy. I can use mobile, and I can change the banking business. Maybe you don't ever come to the bank for cash anymore, even to an ATM, because they've changed that. The wave that's starting now, which is driving this, is the realization in many organizations, and I truly believe eventually in all organizations, that they can have new data-driven businesses that are transforming their fundamental view of their business. So an example I would use is, imagine a shoe maker, a shoe manufacturer. Well, for 50 years they made better shoes, they had better distribution, and they could do better inventory management and get better costs, and all of that with IT. In the last seven or ten years, they've started to be able to build a relationship with their client. Maybe they put some sensors in the shoe, and they're doing, you know, Fitbit-like stuff. Mostly, for them, that was about a better client relationship, so they could sell better shoes because they differentiated. Now, the next step is, what happens if they wake up and say, wait a minute, we could take all this data and sell it to the insurance companies, or healthcare companies, or the city planners, because we now know where everyone's walking all the time? That's a completely different business, but that requires new kinds of analytics that we can almost not imagine in the current storage model, so it drives these new architectures. >> And there is one more prediction. >> Okay, and it comes back again, it kind of closes the whole cycle. As we see this intelligence coming to the data, and new processing forms, and so on, we also need a way to change data management to give us real understanding of data through its whole lifecycle. One example would be, how can I ensure that I understand the chain of custody of data? Take the example of an automobile: there's an accident. Well, how do I know that data wasn't altered? Or how can I know who has touched this data along the way? Because I might need an audit trail. And so we see the emergence of a new distributed and immutable management framework. When I say those two words together, you probably think blockchain, which is the right thing to think, but it's not the blockchain we know today. There may be something, it's something like that, but it will be a distributed and immutable ledger that will give us new ways to access and understand our data. >> Once you open up the, once you open up, trying to get the metaphor, once you decide to put the metadata next to the data, then you're going to decide to put a lot more control information in that metadata. >> Exactly, so this is just an extension. >> It kind of closes the loop. >> Exactly. >> Mark, thanks so much for coming on the show and for talking about the future with us. It was really fun to have you on the show. We should come back in a year and see if maybe you're right. >> Exactly, exactly. Thank you. >> I'm Rebecca Knight. We will have more from NetApp Insight just after this.

Published Date : Nov 14 2017


Wrap | NetApp Insight Berlin 2017


 

>> [Announcer] Live from Berlin, Germany, it's The Cube, covering NetApp Insight 2017, brought to you by NetApp. >> We are wrapping up a day of coverage at NetApp Insight on The Cube. I'm Rebecca Knight, along with my cohost, Peter Burris. So, we've had a lot of great interviews here today. We've heard from NetApp executives, customers, partners about this company's transformation, and about what it's doing now to help other companies have a similar transformation. What have been some of your impressions of where NetApp is right now, and what it's saying? >> I think it starts with the observation that NetApp realized a number of years ago that if it was just going to be a commodity storage company, it was gonna have a hard time, and so NetApp itself went through a digital transformation to try to improve its understanding of how customers really engaged with it, and how it could improve its operational profile and financial footprint. And the result of that was a company that, first off, was more competitive, but also that had learned something about digital transformation, and realized the relationship between the products that they were selling, the services that they were providing, the ecosystem they had that they could tap, and the work they'd been doing with customers, and said, what if we took this knowledge and applied it to those things, what would we end up with? And so we now have a company that is still talking about products, but very much it's also talking about what businesses could do day to day differently to effect the type of transformation that NetApp itself has been going through, and it's a compelling story. >> And you're describing this introspection that the company did, as you said, if we can't survive with our old business model, what can we do differently, and now eating its own dog food, but then telling other companies about its story, and how it's made changes. I mean, do you think NetApp is where it should be today?
Are you pleased with the progress you've seen? >> Well, that's one of the great challenges in the tech industry today, is nobody's quite sure where they should be. >> [Rebecca] There are no benchmarks. >> Because nobody's sure what's going on underneath them. So, many years ago, in response to a reporter's question about IBM, they asked, well, what do you think? Is IBM going to be successful at turning the aircraft carrier? And I said, you don't get it. IBM's problem is not that they're trying to turn the aircraft carrier, it's that they're trying to rotate the ocean, so that they could go straight and everybody else's position would change. And that's a lot of what's happening in the technology industry today: as the people are turning, the ocean's being rotated, and there are a couple of companies, like AWS, that seem to have their fingerprint, or their finger, on some of those changes. I'm not sure NetApp has that kind of a presence in the industry, but what is clear is that the direction that NetApp has taken is generating improved financial results, a lot better customer satisfaction, and it's putting them into position to play in the next round, so to speak, of competition in this industry. And in an industry that's changing this fast, that, all by itself, is a pretty good position to be in. >> Well, you know, and you're talking about the changing industry, and then also the changing employment needs that this company has in terms of getting people in their workforce who really understand, not just that data is an asset, which is what we keep hearing today, too, but really understanding how to capture the data, tease out the right insights from the data, and then deploy a strategy based on those insights that actually will create value for the business, whether that's acquiring new customers, or saving money, or earning new lines of business, too.
>> Well, for example, we had a great conversation with Sheila Fitzpatrick about GDPR, this phenomenal conversation. Sheila is in charge of privacy at NetApp, and the decision that she drove was not just to do GDPR, to have NetApp do GDPR here in Europe, but to apply GDPR across the entire company. Now, two years ago, I don't know that a NetApp person would have come onto The Cube and talked about GDPR, but that is a problem, that is a challenge that every business is facing, and bringing somebody on that has made some really consequential decisions for a company like NetApp, to be able to say, here's how other businesses need to think about GDPR, think about data privacy, is a clear example of NetApp trying to establish itself as a thought leader about data, and not just a thought leader about commodity storage. So I think there's a lot of changes that NetApp's gonna go through. They still are talking about ONTAP, they still are talking about HCI, they're talking about all the various flash products that they have, so that's still part of their conversation, but increasingly they're positioning those products not in terms of price performance, but in terms of applications to the business based on the practical realities of data. >> And I also think we've heard a number of executives talk about NetApp having a more consultative relationship with its clients and partners, and really learning from them, how they're doing things, and then sharing the learnings at events like NetApp Insight, here, and just really being on the ground more, working in partnership with these companies, too. >> Data is a physical thing, and I think a lot of people forget that. A lot of people just look at data and say, oh, it's this ephemeral thing, it's out there, and I don't much have to worry about it, but physics is an issue when you're working with data. Adam Steltzner, Dr. Adam, the gentleman from NASA, he talked about the role that data science is playing in NASA's Mars exploration, talked about the need to worry about sparse data, because they have dial-up speeds to send data back from a place like Mars. They're working on those problems, but when you start thinking in those terms, the physical limitations, the physical realities, the physical constraints of data become very real. GDPR is not a physical constraint, but it's a legal constraint, and it might as well be physics. We heard, for example, that there are companies out there that, based on their practices and how they were hacked, would have found themselves facing a $160 billion liability. >> [Rebecca] Yeah. >> Now, that may not be physics, you know, I can only move so much data back from Mars, but that is a very real legal constraint that would have put those companies out of business if GDPR governance rules had been in place. So what's happening today is companies, or enterprises, are looking to work with people who understand the very physical, practical, legal, and intellectual property realities of data, and if NetApp is capable of demonstrating that, and showing how you could turn that into applications, and into infrastructure that works for the business, then that is a great partner for any enterprise. >> Well, do you think that other companies get it? I mean, the sense of where we are today? You use this example of GDPR, and how it really could have sent companies out of business if those rules had been in place, and they had been hacked, or suffered some huge data breach. Do you think that NetApp is setting itself up as the thought leader, and in many ways is the thought leader? Are there companies on the same level?
>> No, they're not, and certainly there are a lot of tech companies that are moving in that direction, and that are comparable with NetApp, working both closely with NetApp and in opposition to NetApp, at least competitively. But the reality is that most enterprises are, how best to put this? Well, what I like to say is, William Gibson, the famous author who coined the term cyberspace, once said, the future's already here, it's just not evenly distributed. So there are pockets of individuals in every company who are very cognizant of these challenges, the physical realities of data, what it means, what role data actually plays, what does it mean to actually call data an asset, what's the implications on the business of looking at data as an asset? That's in place in pockets, but it's not something that's broadly diffused within most businesses. Certainly not our client base, not the Wikibon SiliconANGLE client base, which is certainly not broadly aware of some of these challenges. A lot of things have to happen over the course of the next few years for executives, and rank and file folks, to comprehend the characteristics, or the nature, of these changes, to start to internalize them, to start to act in concert with the possibilities of data, as opposed to in opposition to the impacts of data. >> And those are the people who, we had guests on today who just talked about the data resisters, because there are those in companies, maybe it's just an individual in a company, but that can have a real impact on the company's strategy of moving forward, deploying its data smartly. >> Yeah, absolutely, and we also had the gentleman from The Economist who made the observation that concerns about artificial intelligence's impacts on employment might be a little overblown. >> [Rebecca] Right, right. >> So a lot of those data resisters might be sitting there asking the question, what will be the impact of additional data on my job?
And it's a reasonable question to ask, because, if your business, we also talked about physicians. A radiologist, for example, someone who looks at x-rays, has historically not been a patient-facing person. They would sit in the back and look at the x-rays, they would write up the results, and they would give them to the clinician, who would actually talk to the patient. I, not too long ago, saw this interesting television ad where radiologists presented themselves as being close to the patient. Why? Because radiology is one of those disciplines in medicine that's likely to be strongly impacted by AI, because AI can find those patterns better than, often, a physician can. Now, the clinician may be a little less affected by AI, because the patient is a human being that needs to have their hand held. >> [Rebecca] And their life is on the line. >> Their life is on the line. The healing and treatment is about whether or not the person is able to step up and heal themselves. >> [Rebecca] Right. >> So there's going to be this kind of interesting observation over the next few years. Folks that work with other people will use data to inform. Folks that work with machines, folks that don't work with other people, are likely to find that other machines end up being really, really good at their job. >> [Rebecca] Right. >> Because of the speed of data, and the compactness of data, human beings just cannot respond to data as fast as a machine, but machines still cannot respond to people as well as people can. >> And they don't have empathy. >> And they don't have empathy, so if I were to make a prediction, I would say that, in the future, if your job is more tied to using machines, yeah, you've got a concern, but if your job is tied to working with people, your job is gonna be that much more important, and increasingly, the people that are working with machines are gonna have to find jobs that have them work with other people. >> Right, right. Well, it's been a great day.
It's fun to work with you. This is our first time together on The Cube. It was a great day. >> Well The Cube is a blast. >> The Cube is a blast. It's a constant party. I'm Rebecca Knight for Peter Burris, this has been NetApp Insight 2017 in Berlin. We will see you next time.

Published Date : Nov 14 2017


Alfred Manhart, NetApp & Lars Göbel, DARZ | NetApp Insight Berlin 2017


 

>> Announcer: Live from Berlin, Germany, it's The Cube covering NetApp Insight 2017. Brought to you by NetApp. >> Welcome back to The Cube's live coverage of NetApp Insight here in Berlin, Germany. I'm your host, Rebecca Knight, along with my co-host, Peter Burris. We are joined by Alfred Manhart. He is the Senior Director, Channel and System Integrators EMEA, for NetApp, and Lars Gobel, who is the Head of Strategy and Innovation for DARZ. Thanks so much for joining us. >> Thank you. >> Thank you for the invitation. >> So Manfred, I mean Alfred, before the cameras were rolling, you were talking a little bit about key partnerships and why they are so critical to helping NetApp manage the data and help it flow freely. Can you tell our viewers a little bit more about the partnerships aspect? >> So, of course, partnering with NetApp is a base of our strategy. It's not just an initiative, so partnering is key for us. And what we currently see is that the partner landscape has to change. On the one side, we are trying to help the existing partners to transform to the digital world, to change the world with data, and on the other side we need additional new partners that make the complex, customer-oriented offerings become reality. DARZ is probably an example of this anyhow; they build up these kinds of multiple partnerships to offer the customer-related offering and solution for the end customers. >> Great, great. So tell us how you fit in here, Lars? I mean, the importance of partnerships. >> So, we are in a situation where IT is getting more and more complex. And we also get into the position where the understanding is now clear that no company can internally be the best at every part. So, for example, the Global Innovation Index makes analyses with the outcome that everywhere where partnerships exist, the innovation is much higher.
And today we talk about new business models, we talk about innovation, scalability, flexibility, and for these topics, for the new size of environments, and also for the challenges the customers have, they need the best for every part of the solutions, and we at DARZ, a full IT service provider, try to bring that together. So we offer the complete bandwidth, from co-location housing over private hosting up to public cloud and hybrid cloud scenarios. So we bring together Amazon Web Services and Microsoft Azure to realize one solution for the customer. >> So, every large enterprise is gonna have multiple relationships like the one that they have with you. And while you are helping to bring Amazon and Azure and others under the DARZ umbrella of services, there is gonna have to be something that connects them a little bit more deeply, right? That's probably gonna be data. >> Lars: Yeah. >> So tell us a little bit about that underlying fabric that's going to be required to ensure that data can be rendered in all of these different environments, and sourced from all of these different environments, according to the needs of the business. What do you think? What will NetApp's role in that be? >> That's an interesting one. I think the world from a partnership perspective is even getting more complex, yeah? Instead of making everything a single, one initial shot, more technical, it's more outcome-based, longer-term based. So if you're not thinking that way, about what should be my desired outcome, what my world should look like in a year, in two years from now, you probably choose the wrong partner from the beginning. So this kind of being relevant and being prepared for the future, for all the challenges that are coming up, is very, very important. And data is a short-term issue, and of course you have to consider what you want to do with data long term. That is the challenge, to balance out the short-term benefits with the long-term objective you have.
And that makes the world more complex. >> So what do you look for in a partner? As you said, you could realize too late you chose the wrong partner from the beginning. But what are sort of the key characteristics and attributes that you want? >> Okay, from our perspective, we do two things. On the one side, we concentrate on the existing partners and support them on their way to the new world. Yeah? Not all of them will make it. Yeah? And on the other side, we have an acquisition program in place, where we address the partners that are needed for the future and also expand the ecosystem with partners which we are probably not even aware of: talking about coder partners, alliance partners, cloud partners we currently do not have in our portfolio. So it's both: driving the existing channel ecosystem to the digital world, and acquiring partners that are needed for the future. >> Great. Well, Alfred, Lars, thank you so much for coming on the show. It's been great having you. >> Thank you. >> Thank you very much for inviting us. >> I'm Rebecca Knight for Peter Burris, we will have more from NetApp Insight just after this. (upbeat music)

Published Date : Nov 14 2017


Matt Watts, NetApp & Kenneth Cukier, The Economist | NetApp Insight Berlin 2017


 

>> Narrator: Live from Berlin, Germany, it's theCUBE. Covering NetApp Insight 2017. Brought to you by NetApp. (techno music) >> Welcome back to theCUBE's live coverage of NetApp Insight here in Berlin, Germany. I'm your host, Rebecca Knight, along with my cohost Peter Burris. We have two guests for this segment. We have Matt Watts, who is a data strategist and director of technology at NetApp, and Kenneth Cukier, a senior editor at The Economist, and author of the best-selling book Big Data, and author of a soon-to-be best-selling book on AI. Welcome. >> Thank you. >> Thank you so much for coming on the show. >> Pleasure to be here. >> So, we keep hearing NetApp saying this is the day of the data visionary. I'd love to hear both of you talk about what a data visionary is, and why this is a necessary role in today's companies. >> Okay, so I think if you look at the generations that we've been through, in the late nineties and early 2000s it was all about infrastructure, with a little bit of application and some data associated to it. And then as we rolled forward to the next decade, the infrastructure discussion became less. It became more about the applications and increasingly more about the data. And if we look at the current decade that we're in right now, the infrastructure discussions have become less, and less, and less. We're still talking about applications, but the focus is on data. And what we haven't seen so much of during that time is the roles changing. We still have a lot of infrastructure people doing infrastructure roles, a lot of application people doing application roles. But the real value in this explosion of data that we're seeing is in the data. And it's time now that companies really look to put data visionaries, people like that, in place to understand how do we exploit it, how do we use it, what should we gather, what could we do with the information that we do gather.
And so I think the timing is just right now for people to really be considering that. >> Yeah, I would build on what Matt just said. Functionally, in the business and the enterprise, we have the user of data, and we have the professional who collected the data. And sometimes we had a statistician who would analyze it, but would pass it along to the user, who is an executive, who is an MBA, who is the person who thinks with data and is going to present it to the board or make a decision based on it. But that person isn't a specialist on data. That person probably doesn't, maybe doesn't even know math. And that person is thinking about the broader issues related to the company, the strategic imperatives. Maybe he speaks some languages, maybe he's a very good salesperson. There's no one in the middle, at least up until now, who can actually play that role of taking the data from the level of the bits and the bytes, in the weeds at the level of the infrastructure, teasing out the value, and then translating it into the business strategy that can actually move the company along. Now, sometimes those people are going to actually move up the hierarchy themselves and become the executive. But they need not. Right now, there's so much data that's untapped, you can still have this function of a person who bridges the world of being in the weeds with the infrastructure and with the data itself, and the larger, broader executive suite that needs to actually use that data. We've never had that function before, but we need to have it now. >> So, let me test you guys. Test something on you guys. What I like to say is, we're in the middle of a significant break in the history of computing. For the first 50 years or so it was known process, unknown technology. And so we threw all our time and attention at understanding the technology. >> Matt: Yeah. >> We knew accounting, we knew HR, we even knew supply chain, because case law allowed us to decide where a title was when. >> Matt: Yep.
>> But today, we're unknown process, known technology. It's going to look like the cloud. Now, the details have always got to be worked out, but increasingly, we don't know the process. And so we're on a road map of discovery that is provided by data. Do you guys agree with that? >> So I would agree, but I'd make a nuance, which is, I think that's a very nice way of conceptualizing it, and I don't disagree. But I would actually say that at the frontier, the technology is still unknown as well. The algorithms are changing, and the use cases, which you're pointing out, the processes, are still unknown, and I think that's a really important way to think about it, because suddenly a lot of possibility opens up when you admit that the processes are unknown, because it's not going to look like the way it looked in the past. But I think for most people the technology's unknown, because the frontier is changing so quickly. What we're doing with image recognition and voice recognition today is so different than it was just three years ago, with deep learning and reinforcement learning. >> Well, it's going to require armies of people to understand that. >> Well, tell me about it. This is the full-- >> Is it? >> For the most part, yes, it's a full employment act for data scientists today, and I don't see that changing for a generation. So, everyone says, oh, what are we going to teach our kids? Well, teach them math, teach them stats, teach them some coding. There's going to be a huge need. All you have to do is look at the society, look at the world, and think about what share of it is actually done well, optimized for outcomes that we all agree with. I would say it's in single percents. Probably between 1% and 5% of the world is optimized. One small example: medical science. We collect a lot of data in medicine. Do we use it? No. It's the biggest scandal going on in the world.
If patients and citizens really understood the degree to which medical science is still trial and error, based on the gumption of the human mind of a doctor and a nurse rather than the data that they actually already collect but don't reuse, there would be Congressional hearings every day. There would be revolutions in the street, because here it is: the duty of care of medical practitioners is simply not being upheld. Yeah, I'd take exception to that. Just, not to spend too much time on this, but at the end of the day, the fundamental role of the doctor is to reduce the uncertainty and the fear and the consequences of the patient. >> Kenneth: By any means necessary, and they are not doing that. Hold on. You're absolutely right that the process of diagnosing and the process of treatment from a technical standpoint would be better. But there's still the human aspect of actually taking care of somebody. Yeah, I think that's true, and I think there is something of the hand of the healer, but I think we're practicing a form of medicine today that looks closer to black magic than it does to science. Bring me the data scientist. >> Peter: Alright. And I think an interesting kind of parallel to that is when you jump on a plane, how often do you think the pilot actually lands that plane? He doesn't. No. Thank you. So, you still need somebody there. Yeah. But you still need somebody as the oversight, as that kind of check, to make a judgment call. So I'm going to unify your story: my father was a cardiologist who was also a flight surgeon in the Air Force in the U.S., and was one of the few people that was empowered by the airline pilots association to determine whether or not someone was fit to fly. >> Matt: Right. And so my dad used to say that he is more worried about the health of a bus driver than he is of an airline pilot. That's great. So, in other words, we've been gazumped by someone whose father was both a doctor and a pilot. You can't do better than that.
So it turns out that we do want Sully on the Hudson when things go awry. But in most cases I think we need this blend of the data on one side and the human on the other. The idea that, just because we're going into the world of artificial intelligence and machine learning, jobs will be eradicated left and right, I think, is a simplification. I think the nuance that's much more real is that we're going to live in a hybrid world in which we're going to have human beings using data in much more impressive ways than they've ever done before. So, talk about that. I mean, I think you have made this compelling case that we have this huge need for data and this explosion of data, plus the human judgment that is needed to either diagnose an illness or decide whether or not someone is fit to fly a plane. So then where are we going in terms of this data visionary, and in terms of, say, more of a need for AI? Yeah. Well, if you take a look at medicine, what we would have is, the diagnosis would probably be done, say for a pathology exam, by the algorithm. But then the healthcare coach, the doctor, will intervene and will have to interpret it: first, what it means; then translate it for the patient; and then discuss with the patient the trade-offs in terms of their lifestyle choices. For some people, surgery is the right answer. For others, you might not want to do that. And it's always different with all of the patients: in terms of their age, in terms of whether they have children or not, whether they want to risk the potential of complications. It's never so obvious. Just as we do that, or will do that, in medicine, we're going to do that in business as well. Because we're going to take data that we never had about decisions: should we go into this market or that market? Should we take a risk and gamble with this product a little bit further, even though we're not having a lot of sales, because the profit margins are so good on it?
There's no algorithm that can tell you that. And in fact you really want the intellectual ambition and the thirst for risk-taking of the human being that defies the data with an instinct that says, I think it's the right thing to do. And even if we're going to have failures with that, and we will, we'll have out-performance. And that's what we want as well. Because society advances by individual passions, not by whatever the spreadsheet says. Okay. Well, there is this issue of agency, right? So at the end of the day a human being can get fired, a machine cannot. A machine, in the U.S. anyway, software is covered under the legal strictures of copyright, which means it's a speech act. So, what do you do in circumstances where you need to point a finger at something for making a stupid mistake? You keep coming back to the human being. So there is going to be an interesting interplay over the next few years of how this is going to play out. So how is this working, or what's the impact on NetApp as you work with your customers on this stuff? So I think you've got the AI, ML, that's one kind of discussion. And that can lead you into all sorts of rat holes or other discussions around, well, how do we make decisions, how do we trust it to make decisions; there's a whole aspect that you have to discuss around that. I think if you just bring it back to businesses in general, all the businesses that we look at are looking at new ways of creating new opportunities, new business models, and they're all collecting data. I mean, we know the story about General Electric. It used to sell jet engines and now it's much more about what can we do with the data that we collect from the jet engines. So that's finding a new business model. And then you've got a human role in that as well, which is: well, is there a business model there? We can gather all of this information. We can collect it, we can refine it, we can sort it, but is there actually a new business model there?
And I think it's those kinds of things that are inspiring us as a company to say, well, we could uncover something incredible here. If we could unlock that data, we could make sure it's where it needs to be when it needs to be there. If you have the resources to bring to bear to be able to extract value from it, you might find a new business model. And I think that's the aspect that is of real interest to us going forward, and it kind of inspires a lot of what we're doing. Great. Kenneth, Matt, thank you so much for coming on the show. It was a really fun conversation. Thank you. Thank you for having us. We will have more from NetApp Insight just after this. (techno music)

Published Date : Nov 14 2017



Sheila FitzPatrick, NetApp & Paul Stringfellow, Gardner Systems | NetApp Insight Berlin 2017


 

>> Announcer: Live from Berlin, Germany, it's theCUBE, covering NetApp Insight 2017. Brought to you by NetApp. (upbeat music) >> Welcome back to theCUBE's live coverage of NetApp Insight 2017, here in Berlin, Germany. I'm your host, Rebecca Knight, along with my co-host, Peter Burris. We are joined by Sheila FitzPatrick, she is the Chief Privacy Officer of NetApp, and Paul Stringfellow, who is a Technical Director at Gardner Systems. Sheila, Paul, thanks so much for joining us. >> Thank you. >> Thank you for inviting us. >> So, I want to talk about data privacy. The General Data Protection Regulation, the EU's forthcoming privacy law, GDPR, is going to take effect in May of next year. It represents a huge, fundamental change in the way that companies use data. Can you just set the scene for our viewers and explain what these changes mean? >> Sure, happy to. As you said, GDPR is the newest regulation; it will replace the current EU directive and goes into effect May 25th of 2018. It has some fundamental changes that are massively different from any other data privacy laws you've ever seen. First and foremost, it is a legal, compliance and business issue as opposed to a technology issue. It's also the first extra-territorial regulation, meaning it will apply to any organization anywhere in the world, regardless of whether or not they have a presence in Europe. But if they provide goods and services to an EU resident, or they have a website that EU residents would go to to enter data, they are going to have to comply with GDPR, and that is a massive change for companies. Not to mention the sanctions: the sanctions can be equal to 20 million Euros or 4% of a company's annual global turnover, pretty phenomenal sanctions. There are a lot of fundamental changes, but those are probably the biggest right there. >> What are some of the biggest challenges that companies are...
I mean, you talked about the threat of sanctions and just the massive implications of what companies need to do to prepare? >> To really prepare, as I'm talking to customers, unfortunately a lot of companies are just thinking about security. And they're thinking, well, as long as we have encryption, as long as we have tokenization, as long as we're locking down that data, we're going to be okay. I'm saying, no. It first and foremost starts with building that legal compliance program. What does your data privacy program look like? What personal data are you collecting? Why are you collecting it? Do you have the legal right to collect it? Part of GDPR requires unambiguous, explicit, freely-given consent. Companies can no longer force or imply consent. A lot of times when you go on to websites the terms and conditions are so impossible to understand that people just tick the box (laughs). Well, under GDPR, that will no longer be valid, because it has to be very transparent, very easily understandable, very readable. And people have to know what organizations are doing with their data. And it puts ownership and more control of data back into the hands of the data subject, as opposed to the organizations that are collecting data. So those are some of the fundamental changes. For the Cloud environment, for instance, for a lot of big hyperscalers, GDPR now puts obligations on data processors, which is very different from the current regulation. So that's going to be a fundamental change of business for a lot of organizations. >> Now, is it just customers or is it customers and employees as well? >> It's customers, employees, suppliers, it's any personal data that an organization collects, regardless of the relationship. >> So what does it mean? Does it mean that I'm renting your data? Does it mean that I, 'cause you now own it, it's not me owning it. >> I own it, that's right.
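[Editor's note] The consent and data-minimization requirements discussed here can be sketched in a few lines of code. This is a minimal illustration only: the field names, purposes, and "required fields" mapping below are invented for the example, not taken from the GDPR text or from any NetApp tooling.

```python
from dataclasses import dataclass

# Hypothetical consent record; field names are illustrative, not a GDPR standard.
@dataclass
class Consent:
    purpose: str        # the specific, declared purpose
    explicit: bool      # actively given, not a pre-ticked box
    freely_given: bool  # not forced or bundled with unrelated terms

def valid_consent(consent: Consent) -> bool:
    """Implied or forced consent is not valid under the rules described above."""
    return consent.explicit and consent.freely_given

# Minimal data each (hypothetical) purpose legitimately requires.
REQUIRED_FIELDS = {
    "payroll": {"name", "bank_account", "salary"},
    "newsletter": {"email"},
}

def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields the declared purpose actually needs."""
    allowed = REQUIRED_FIELDS.get(purpose, set())
    return {k: v for k, v in record.items() if k in allowed}

employee = {"name": "A. N. Other", "bank_account": "DE89 0000",
            "salary": 50000, "gym_membership": True, "email": "a@example.org"}
# 'gym_membership' is a nice-to-have, not a must-have, so payroll never sees it.
payroll_view = minimize(employee, "payroll")
```

The point of the sketch is the direction of the default: a purpose grants access to a named minimum set of fields, rather than the organization keeping everything and deciding later.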
>> What are some of the implications of how folks are going to monetize some of these resources? >> So what it actually means is, as an organization that's collecting data, you have to have a legal and valid business reason for needing that data. So part of GDPR requires what's called data minimization. You should only be collecting the minimal amount of data you need in order to provide the service you're going to provide, or manage the relationship you're going to manage. And you are never, as an organization, the owner of that data; you're the data steward. I am giving you permission to use my data for a very specific reason. You can't take liberties with that data. You can't do what I call scope-creep, which is, once you have the data, "Oh, I can do whatever I want with that data." No, you can't. Unless I have consented to it, you cannot use that data. And so that is going to be a major change for organizations to deal with, and it doesn't matter if it's your employee data, your customer data, your partner data, your alternative worker data, your supplier data. Whoever's data you have, you'd better be transparent about that data. >> Sheila, you haven't once mentioned technology. Paul, what does this mean from a technology perspective? >> I suppose it's my job to mention technology? As Sheila will tell you, the GDPR should not be driven by IT, because it's not an IT problem; it's absolutely a legal and compliance issue. However, I think there's a technology problem in there. So for lots of the things that Sheila is talking about, in terms of understanding your data, in terms of being able to find data, being able to remove data when you no longer need to use it, that's absolutely a technology problem. And I think, actually, maybe something you won't hear said very often: I'm a real fan of GDPR, I think it's long overdue. It's probably because Sheila's been beating me round the head for the last 12 months >> I have. >> about it.
But I think it's one of those things that's long overdue for all of us within enterprises, within business, who hold and look after data. Because what we've done, traditionally, is just collect tons and tons of data, and we bought storage because storage was relatively cheap, we're moving things to the Cloud. And we've got absolutely no control, no management, no understanding of what the data is, where it is, who has access to it. Does anybody even access it? I'm paying for it, does anybody even use it? And I think, for me, even if GDPR weren't a regulatory thing that we had to do, it's a set of really good practices that, as organizations, we should be looking to follow anyway. And technology plays a small part in that: it will enable organizations to understand the data better, it will enable those organizations to be able to find information as and when they need it. When somebody makes a subject access request, how are you going to find that data without appropriate technology? And I think, first and foremost, it's something that is forcing organizations to look at the way they culturally look after data within their business. This is no longer about, "Let me just keep things forever and I won't worry about it." This is a cultural shift that says data is actually an asset in your business. And as Sheila actually mentioned before, and something I'll pinch in future, the data is not mine; I'm just the custodian of that data while you allow me to be so. So I should treat that like anything else I'm looking after on your behalf. So I think it's those kinds of fundamental shifts that will drive technology adoption, no doubt, to allow you to do that, but actually, it's much more of a cultural shift in the way that we think of data and the way that we manage data in our businesses.
So what will be different in the way that companies do business and the way that they treat their customer data, and their customer's privacy? And their employee's privacy, too, as you pointed out? >> Well, and part of the difference is going to be that need for transparency. So companies are going to have to be very upfront about what they're doing with the data, as Paul said. You know, why are they collecting that data, and they need to think differently about the need for data. Instead of collecting massive amounts of data that you really don't need, they need to take a step back and say, "This is the type of relationship "I'm trying to manage." Whether it's an employment relationship, whether it's a customer relationship, whether it's a partner relationship. What is the minimum amount of information I need in order to manage that relationship? So if I have an employee, for instance, I don't need to know what my employee does on their day off. Maybe that's a nice thing to know because I think well, maybe we can offer them a membership to a gym because they like to work out? That's not a must-have, that's a nice-to-have. And GDPR is going to force must-haves. In order to manage the employment relationship I have to be able to pay you, I have to be able to give you a job, I have to be able to provide benefits, I have to be able to provide performance evaluations and other requirements, but if it's not legally required, I don't need that data. And so it's going to change the way companies think about developing programs, policies, even technology. As they start to think about how they're developing new technology, what data do they need to make this technology work? And technology has actually driven the need for more privacy laws. If you think about IoT, artificial intelligence, Cloud. >> Mobile. >> Absolutely. Great technology, but from a privacy perspective, the privacy was never a part of the planning process. 
>> In fact, in many respects it was the exact opposite. There were a whole bunch of business models. I mean, if you think about it, in the technology industry there are two fundamental business models. There's the ad-based business model, which is, "Give us all your data and we'll figure out a way to monetize it." >> Absolutely. >> And there's a transaction-based business model, which says, "We'll provide you a service and you pay us, and we promise to do something and only something with your data." >> Absolutely. >> It's the difference between the way Google and Facebook work, and, say, Apple and Microsoft work. So how is this going to impact these business models and ways of thinking about engaging customers, at least where GDPR is the governing model? >> Well, it is going to force a fundamental change in their business model. So the companies that you mentioned, whose entire business model is based on the collection and aggregation of data, and in some cases the selling of personal data. >> Some might say screwing you. >> Some might definitely say that; especially if you're a privacy attorney, you might say that. They offer fabulous services and people willingly give up their privacy. That's part of the problem: they're ticking the box to say, "I want to use Facebook, I want to use Twitter, I want to use LinkedIn, because these are great technologies." But it's the scope-creep. It's what you're doing behind the scenes, where I don't know how you're using my data. So transparency is going to become more and more critical in the business model, and that's going to be a cultural, as Paul said, a cultural shift for companies whose entire business model is based on personal data. They're struggling, because they're the companies that, no matter what they do, are going to have to change. They can't just make a simple change to a policy or procedure; they have to change their entire business model to meet the GDPR obligations.
>> And I think, like Sheila says there, obviously GDPR is very much around private data, but the conversation we're having with our customers has a much wider scope than that: it is all of the data that you own. And it's important; I think organizations need to stop being fast and loose with the information that they hold. Because not only is there the private information about people like me and you, which we don't want leaked to somebody who might look to exploit it for some other reason, but there's also business-confidential information: that might be price lists, it might be your customer list. And at the moment, I think in lots of organizations we have a culture where people, from top to bottom in an organization, don't necessarily understand that. So they might be doing something where, well, we had a case in the UK recently where some records, security arrangements for Heathrow Airport, were found on a bus. Somebody copied them to a USB stick, no encryption, and thought it was okay to take home and leave in the back of, well, they probably didn't think it was okay to leave it in the back of the taxi, but they certainly thought it was okay to take that information home. And you look at that and think, well, what other business asset that that organization held would they have treated with such disdain, almost to say, "I just don't care, this is just ones and zeroes, why would I care about it?" It's that shift that I think we're starting to see. And I think it's a shift that organizations should have made a long time ago. We talk to customers, and you hear of events like this all the time: data is the new gold, data is the new precious material of your choice. >> Which it really isn't. It really isn't, and here's why I say that: because this is the important thing and leads to the next question I was going to ask you.
Every asset that's ever been conceived follows the basic laws of economic scarcity. Take gold: you can apply it to one purpose, you can make connectors for a chip, or you can use it as a basis for making jewelry or some other purpose. But data is fungible in so many ways. You can connect it, and in many respects, as we talked about a little bit earlier, the act of making it private is, in many respects, the act of turning it into an asset. So one of the things I want to ask you about, if you think about it, is that there will still be a lot of net new ways to capture data that's associated with a product or service in a relationship. So we're not saying that GDPR is going to restrict the role that data plays; it's just going to make it more specific. We're still going to see more IoT, we're still going to see more mobile services, as long as the data that's being collected is in service to the relationship or the product that's being offered. >> Yeah, you're absolutely right. I mean, one of the things that I always say is that GDPR's intent is not to stop organizations from collecting data; data is your greatest asset, and you need data to manage any kind of relationship. But you're absolutely right that what it's going to do is force transparency. So instead of doing things behind the scenes where nobody has any idea what you're doing with my data, companies are going to have to be extremely transparent about it and think about how it's being used. You talked about data monetization: healthcare data today is ten times more valuable than financial data. It is the data that all hackers want. And the reason is that you can take even aggregate and statistical information, say from trial clinics, information that you think there's no way to tie back to a person, and by adding just little elements to it, you have now turned that data into greater value and you can now connect it back to a person.
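[Editor's note] The re-identification risk just described, where "anonymous" aggregate data plus a few added elements ties back to a person, is essentially a linkage join on quasi-identifiers. The sketch below illustrates the mechanism; every record, name, and field in it is invented for the example.

```python
# "Anonymized" clinical records: names removed, but quasi-identifiers
# (zip code, birth year, sex) remain.
clinical = [
    {"zip": "10115", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "10117", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
]

# A second, public dataset with names attached, e.g. a voter roll or profile.
public = [
    {"name": "Erika Mustermann", "zip": "10115", "birth_year": 1980, "sex": "F"},
]

def reidentify(clinical_rows, public_rows):
    """Join on quasi-identifiers; a unique match ties a diagnosis to a name."""
    matches = []
    for c in clinical_rows:
        hits = [p for p in public_rows
                if (p["zip"], p["birth_year"], p["sex"])
                == (c["zip"], c["birth_year"], c["sex"])]
        if len(hits) == 1:  # combination is unique, so the link is confident
            matches.append({"name": hits[0]["name"],
                            "diagnosis": c["diagnosis"]})
    return matches

linked = reidentify(clinical, public)
```

Nothing in the clinical dataset looks like "personal data" on its own; the join is what turns it back into a record about a named human being, which is exactly why GDPR treats identifiers that can be tied back to a person as personal data.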
So data that you think does not have value, the more we add to it and the more, sort of, profiling we do, the more valuable that data is going to become. >> But it's even more than that, right? Because not only are you connecting it back to a person, you're connecting it back to a human being. Whereas financial data is highly stylized: it's defined, it's this defined transaction, and there's nothing necessarily real about it other than that it's the convention we use to, for example, do accounting. But healthcare data is real. It ties back to: what am I doing, what drugs am I taking, why am I taking them, when am I visiting somebody? This is real, real data that provides deep visibility into the human being: who they are, what they face, and any number of other issues. >> Well, if you think about GDPR, too, they expanded the definition of personal data under GDPR. So it now includes data like biometric and genetic information that is heavily used in the healthcare industry. It also includes location data, IP information, unique identifiers. So a lot of companies say, "Well, we don't collect personal data, but we have the unique identifiers." Well, if you can go through any kind of process to tie that back to a person, that's now personal data. So GDPR is actually the first entry into the digital age, as opposed to the old-fashioned processing. You can now take different aspects of data and combine them to identify a human being, as you say. >> So, I've got one more question. This is something of a paradox, sorry for jumping in, but I'm fascinated by this subject. Something of a paradox. Because the act of making data private, at least to the corporation, is an act of creating an asset, and because the rules of GDPR are so much more specific and well thought through than most rules regarding data, does it mean that companies that follow GDPR are likely, in the long run, to be better at understanding, taking advantage of, and utilizing their data assets?
That's the paradox. Most people say, "I need all the data." Well, GDPR says, "Maybe you need to be more specific about how you handle your data assets." What do you think, is this going to create advantages for certain kinds of companies? I think it absolutely is going to create advantages, in two ways. One, I see organizations that comply with GDPR as having a competitive advantage, because number one, it comes down to trust. If I'm going to do business with Company A or Company B, I'm going to do business with the company that actually takes my personal data seriously. But looking at it from your point of view, absolutely. As companies become more savvy when it comes to data privacy compliance, not just GDPR but data privacy laws around the world, they're also going to see more of the value in the data and be more transparent about it. And that's also going to allow them to use the data for other purposes, because they're going to get very creative in showing how having your data actually benefits you as an individual. So they're going to have better ways of saying, "By having your data, I can offer you these services." >> GDPR may be a catalyst for increased data maturity. >> Absolutely. >> Well, I want to ask you about the cultural shift. We've been talking so much about it from the corporate standpoint, but will it actually force a cultural shift from the customer standpoint, too? I mean, this idea of forcing transparency and having the customer understand: why do you need this from me, what do you want? I mean, famously, Europeans are more private than Americans. >> Oh, much so. As you've said, "Just click accept, okay, fine, tell me what I need to know, or how can I use this website?" >> Well, the thing is that it's not necessarily from a consumer point of view, but I do think it's from a personal point of view for everybody.
So whether you work inside an organization that keeps data, you're starting to understand just how valuable that data might be. And just to pick up on something you were saying before, I think one of the other areas where this has business benefit is that better and increased management and maturity around how we look after our data has huge impact. Because it has huge impact on the cost of storing it: if we want to use Cloud services, why am I putting things there that nobody looks at? And then there's maintaining this kind of cultural shift that says, "If I'm going to have data in my organization, I'm no longer going to have it on a USB stick and leave it in the back of a cab when it's got security information for a major global airport on it. I'm going to think about that, because I'm now starting to understand." And there's this big drive of people starting to understand how the information that others keep about you has a potentially bigger impact. We've seen data breach after data breach after data breach; you can't look at the news any day of the week without some other data breach. And that's partly because, a bit like health and safety legislation, GDPR is there because you can't trust all those organizations to be mature enough with the way they look after our data. So legislation and regulation have come along and said, "Well, actually, this stuff's really important to me and you as individuals, so stop being fast and loose with it, stop leaving it in the back of taxis, stop letting it leak out of your organization because nobody cares." And that's driving a two-way thing here: we're having to think more about that because, actually, we're not trusting the organizations who are looking after our data.
But, as Sheila said, if you become an organization that has a reputation for being good with the way you lock down and look after data, that will give you a competitive edge. Alongside that, I'm being much more mature, much more controlled and efficient with how I look after my data. That has a big impact on how I deliver technology within a company. Which is why I'm enthusiastic about GDPR: I think it's forcing lots and lots of long-overdue shifts in the way that we, as people, look after data, architect technology, and start to think about the kinds of solutions and the kinds of things that we do in the way that we deliver IT into business and enterprise across the globe. >> I think one of the things, too, and Paul brought it up, is he mentioned security several times. And, as Paul knows, one of my pet peeves is when companies say, "We have world-class security, therefore we're compliant with GDPR." And I go, "Really? So you're basically locking down data you're not legally allowed to have? That's what you're telling me." >> Like you said earlier, it's not just about having encryption everywhere. >> Exactly, and it's funny how many companies say, "Well, we're compliant with GDPR because we encrypt the data." And I go, "Well, if you're not legally allowed to have that data, that's not going to help you at all." And, unfortunately, I think that's what a lot of companies think: that as long as we're looking at the security side of the house, we're good. And they're missing the whole boat on GDPR. >> It's got to be secure. >> It's got to be secure. >> But-- >> You've got to legally have it first. >> Exactly. The chicken and the egg.
>> But what's always an issue with security, around data and the stuff that Sheila talked about quite a lot, is that even if you have all the great security in the world, if the right person with the right access to the right data has all the things that they should have, that doesn't mean they can't steal that data, lose that data, or do something with that data that they shouldn't be doing, just because we've got it secured. So we need to have policies and procedures in place that allow us to manage that better, a culture that understands the risk of doing those kinds of things, and maybe, alongside that, technologies that identify unusual use of data are important within that. >> Well, Paul, Sheila, thank you so much for coming on the show, it's been a fascinating conversation. >> Thank you very much, appreciate it. >> Yeah, thanks for having us on, appreciate it. >> I'm Rebecca Knight for Peter Burris, we will have more from NetApp Insight here in Berlin in just a little bit. (upbeat music)

Published Date : Nov 14 2017


Ruairí McBride, Arrow ECS & Brian McCloskey, NetApp | NetApp Insight Berlin 2017


 

>> Narrator: Live from Berlin, Germany, it's theCUBE, covering NetApp Insight 2017, brought to you by NetApp. Welcome back to theCUBE's live coverage of NetApp Insight 2017, we're here in Berlin, Germany. I'm your host, Rebecca Knight, along with my cohost Peter Burris. We have two guests on the program now: Ruairí McBride, who is the technical account manager at Arrow, and Brian McCloskey, who is the vice president worldwide for hyperconverged infrastructure at NetApp. Brian, Ruairí, thanks so much for coming on the show. >> Thanks. >> Let me start with you, Brian. Tell our viewers a little bit about the value that HCI delivers to customers, especially in terms of simplifying the data. >> In a nutshell, what NetApp HCI does is it takes what would normally be hours and hours to implement a solution, and hundreds of inputs, generally over 400 inputs, and it simplifies it down to under 30 inputs in an installation that will be done within 45 minutes. Traditionally, HCI solutions have similar implementation characteristics, but you lose some of the enterprise flexibility and scale that customers of NetApp have come to expect over the years. What we've done is we've provided that simplicity while allowing customers to have the enterprise capabilities and flexibility that they've grown accustomed to. >> Is this something that you are talking with customers about? In terms of the simplicity, what were you hearing from customers? >> Most customers these days are challenged: everybody has to find a way to do more with less, or to do minimally a lot more with the same. If you think of NetApp, we've always been wonderful about giving customers a great production experience. When you buy a typical NetApp product, you're gonna own it for three, four or five years and it will continue.
NetApp has always been great for that three-, four- and five-year time frame, and what we've done with HCI is really simplify the beginning part of that curve: how do you get it, from the time it lands on your dock, to implemented and usable by your users in a short manner. That's what HCI has brought to the NetApp portfolio; that's incremental to what was there before. >> One of the advantages to third parties that work closely with NetApp is that by having a simpler approach of doing things, you can do more of them, but on the other hand, you want to ensure that you're also focused on the value add. In the field, when you're sitting down with a customer and working with them to ensure that they get the value that they want from these products, how do you effect that balance, as the product becomes simpler and the customer is now able to focus on things other than configuration? >> Being able to get to doing something with your data is the key. You need a low bar of entry, which a lot of the software and hardware providers are trying to deliver today. I think HCI just pulls all of that together, which is great. We're hearing from third-party vendors that it's great that, from day one, they've been integrated into the overall portfolio message, and I think customers are just gonna be pretty excited with what they can do from zero with this hardware. >> When you think about ultimately how they're gonna spend their time, what are they going to be doing instead of all this configuration work? What is Arrow gonna be doing now that you're not doing that value-added configuration work? >> Hopefully, we'll be helping to realize the full potential of what they bought; rather than spending a lot of time trying to make the hardware work, they're concentrating more on delivering a service or an application back to the business that's gonna generate some revenue.
At Arrow we're talking a lot to people about IoT; it's gonna be the next wave of information that people are gonna have to deal with, and having a stable product that can support that and provide valuable information back to the business is gonna be key. >> Brian, HCI, as you noted, dramatically reduces the time to get to value, not only now, but it also sustains that level of simplicity over the life of the utilization of the product. How does it fit into the rest of the NetApp product set, the rest of the NetApp portfolio? What does it make better, what makes it better, in addition to just the HCI product? >> NetApp has a really robust portfolio of offerings that we, at a high level, categorize into our next-generation offerings, which are SolidFire, FlexPod SolidFire, StorageGRID and hyperconverged, and then the traditional NetApp ONTAP-based offerings. The glue between the whole portfolio is the data fabric, and HCI is very tightly integrated into the data fabric. One of the innovations we are delivering is SnapMirror integration of the HCI platform into the traditional ONTAP family of products. You can seamlessly move data from our hyperconverged system to a traditional ONTAP-based system, and it also gives you seamless mobility to either your own private cloud or to public cloud platforms. As a company with a wide portfolio, it gives us the ability to be consultative with our partners and our customers. We feel customers are best served on NetApp and we want them to use NetApp, and if an ONTAP-based system is a better solution for them than hyperconverged, then that's absolutely what we will recommend for them.
Into your earlier question about the partners: one of the interesting things with HCI is that it's the first time, as NetApp, we're delivering an integrated system with compute and with a hypervisor. It comes preconfigured with VMware, and it's a wonderful opportunity for our partners to add incremental value through the sales cycle to what they've brought to NetApp in the past. Because as NetApp, we're really storage experts, whereas our partners have a much wider and deeper understanding of the whole ecosystem than we do. It's been interesting for us to have discussions with partners, 'cause we're learning a lot, because we're now deeply involved at higher levels of the stack than we have been. >> I'm really interested in that, because you say that you have this consultative relationship with these customers. How are you able to learn from them, their best practices, and then do you transfer what you've learned to other partners and other customers? >> From the customer, we try and disseminate the learning as much as we can, but we're a huge organization with many account teams. It all starts with what the customer wants to accomplish: minimally, they need a solution that's gonna plug in and do what they expect it to do today. The more important part is what their vision is for where they wanna be three years down the road, five years down the road, 10 years down the road. It's that vision piece that tends to drive more towards one part of the portfolio than the other. >> Take us through how this works. You walk into an account, presumably Arrow ECS has a customer. The Arrow ECS customer says, "Well, we have an issue that's going to require some specialized capabilities in how we use our data." You can look at a lot of different options, but you immediately think NetApp. What is it that leads you to NetApp HCI versus ONTAP, versus SolidFire? Is there an immediate characteristic that makes you say, "That's HCI"?
>> I would say that the driving factor was the fact that they wanted something that's simple and easy to manage. They want to get a Mongo database up and running, or they've got some other application that really depends on their business; the underlying hardware just needs to function. Brian was saying that it's got Element OS sitting underneath it, which is in its 10th iteration, and you've got VMware version six, which is the most adopted virtualization platform out there. These are two best-of-breed partnerships coming together, and people are happy with that, and can move and manage it from a single pane of glass from day one, right the way through to when they need to transition to a new platform, which is seamless for them. That's great from any application point of view, because you don't wanna worry about the health of things; you wanna be able to give an application back to the business. We talked about education; this event is geared towards bringing customers together with NetApp and understanding the messaging around HCI, which is great. >> What are the things that you keep hearing from customers, this need for data simplicity, this need for huge time-saving products and services? What do you think, if you can think three to five years down the road, what will the next generation of concerns be, and how are you, I'm gonna use the word that we're hearing a lot, future-proofing what you're doing now to serve those customers' needs of the future? >> Three to five years down the road... I can't predict three to five years out very reliably. >> But you can predict that they're gonna have more data, they're going to merge it in new and unseen ways, and they need to do it more cheaply. >> The future-proofing really comes in from the data fabric.
With the integration into the data fabric, you could have information that started on a NetApp system that was announced eight years ago seamlessly move into a SolidFire or flash array, which seamlessly moves to a hyperconverged system, which seamlessly moves to your private cloud, which eventually moves off to a public cloud, and you can bring it back into any tier. Wherever you want that data in six, seven, eight years, the data fabric will extend to it. Within each individual product there are investment-protection technologies, but it's the data fabric that should make customers feel comfortable that, no matter where they're gonna end up, taking their first step with NetApp is a step in the right direction. >> The value-added ecosystem that NetApp and others use, and Arrow ECS has a big play around that, has historically been tied back into hardware assets. How does it feel to be moving more into worrying about your customers' data assets? >> I think it's an exciting time to be bringing those things together. At the end of the day, it's what the customer wants: a solution that integrates seamlessly, whether that be from the rack right the way up to the application. They want something that they can get on their phone, they want something they can get on their tablet, they want the same experience regardless of whether they're in an airplane or right next to the data center. The demand on data is huge and will only get bigger over the next five years. I was looking at a cover of Forbes magazine from a number of years ago about Nokia and how could anybody ever catch them, and where are they now? I think you need to be able to spot the changes and adapt quickly, and to steal one of the comments from the keynote yesterday, moving from a survivor to a thriver with your data is gonna be key to those companies.
>> In talking about the demands on data growing, it's also true that the demands on data professionals are growing too. How is that changing the way you recruit and retain top talent? >> For us, as NetApp, if you were to look at what we wanted in a CV five years ago, we wanted people that understood storage: people that knew about volumes, that knew about data layouts, that knew how to maximize performance by physical placement of data. Now what we're looking for is people that really understand the whole stack and can talk to customers about their application needs and their business problems, can talk to developers. Because what we've done is we've taken those people that were good in all those other things I mentioned, and when you ask them what they loved about this product, none of them ever came back and said, "I love the first week I spent installing it." We've taken that away and we've let them do more interesting work. A challenge for us, as a collective society, is to make sure we bring people forward from an education and skills-enablement perspective, so they're capable of rising to that next level of demand, but we're taking a lot of the busy work out. >> Making sure that they have the skills to be able to take what they're seeing in the data and then take action. >> We want our customers to look at NetApp as a data expert that can work with them on their business problem, not a storage expert that can explain how an array works. >> Brian, Ruairí, thank you so much for coming on the show, it's been a great conversation. >> Thank you. >> Thank you very much. >> You are watching theCUBE, we will have more from NetApp Insight. I'm Rebecca Knight for Peter Burris, in just a little bit.

Published Date : Nov 14 2017


Brett Roscoe, NetApp & Laura Dubois, IDC | NetApp Insight Berlin 2017


 

>> Announcer: Live from Berlin, Germany, it's theCUBE! Covering NetApp Insight 2017. Brought to you by NetApp. (rippling music) Welcome back to theCUBE's live coverage of NetApp Insight. I'm Rebecca Knight, your host, along with my cohost Peter Burris. We are joined by Brett Roscoe, who is the Vice President for Solutions and Service Marketing at NetApp, and Laura Dubois, who is a Group Vice President at IDC. Thanks so much for coming on the show. Yeah, thanks for having us. Thank you for having us. So, NetApp and IDC partnered together and worked on this big research project, as you were calling it, a thought leadership project, to really tease out what the companies that are thriving and being successful with their data strategies are doing, and what separates those from those that are merely just surviving. Do you want to just set the scene for our viewers and explain why you embarked on this? Well, you know, it's interesting. NetApp has embarked on its own journey, right, its own transformation. If you look at where the company's been over the past few years, in terms of moving from a traditional storage company to a truly software-focused, cloud-focused, data-focused company, right? And that means a whole different set of capabilities that we provide to our customers. Our customers are looking at data in a different way. So what we did was look at that and say: we know that we're going through a transformation, so we know our customers are going through a journey themselves. And whatever their business model is, it's being disrupted by this digital economy. And we wanted a way to work with IDC and really help our customers understand what that journey might look like, where they might be on that path, and what are the tools and what are the engagement models for us to help them along that journey?
So that was really the goal, was really, it's engagement with our customers, it's looking and being curious about where they are on their journey on digital, and how do they move forward in that, in doing all kinds of new things like new customer opportunities and new business and cost optimization, all that kind of stuff. So that's really what got us interested in the project to begin with. Yeah, and I would just add to that. Revenue's at risk of disruption across pretty much every industry, and what's different is the amount of revenue that's at risk within one industry to the next. And all of this revenue that's at risk, is really as a consequence of new kinds of business models, new kinds of products and services that are getting launched new ways of engaging with customers. And these are some of the things that we see thrivers doing and outperforming merely just survivors, or even just data resisters. And so we want to understand the characteristics of data thrivers, and what are they doing that's uniquely different, what are their attributes versus companies that are just surviving. So let's tease that out a little bit. What are these data thrivers doing differently? What are some of the best practices that have emerged from this study? Well I mean, I think if you look at there's a lot of great information that came out of the study for us in terms of what they're doing. I think in a nutshell, it's really they put a focus on their data and they look at it as an asset to their business. Which means a lot of different things in terms of how is the data able to drive opportunities for them. I mean, there's so many companies now that are getting insights from their data, and they're able to push that back to their customer. I mean, NetApp is a perfect example of that. We actually do that with our customers. 
All the telemetry data we collect from our own systems, we provide back to our customers so they can plan and optimize their own environments. So I think data has certainly validated our theory, our message of where we're going with data, but beyond the data focus there are lots of other attributes, like the focus on hiring chief data officers within the company; there's certainly lots of other attributes, Laura, that you can comment on. Yeah, I mean, we see new roles emerging around data, right, and so we see the rise of the data management office. We see the emergence of a Chief Data Officer, we see data architects, certainly data scientists, and this data role that's increasingly integrated into the traditional IT organization and enterprise architecture. And so enterprise architecture and these data roles being very, very closely aligned is one example, I would say, of a best practice among the thriver organizations, as is having these data champions, if you will, or data visionaries. And certainly there's a lot that needs to be done to have successful execution: a data strategy in the first place, but then successful execution around that data. And there's a lot of challenges that exist around data as well. The survey highlighted that obviously data's distributed, it's dynamic and it's diverse; it's not only in your private cloud but in the public cloud, and I think on average 34% of data is in a public cloud. So how to deal with these challenges is, I think, also one of the things that you guys wanted to highlight. Yeah, and I think the other big revelation was that the thrivers, one of the aspects, so not only their data focus, but also they're making business decisions with their data. They tend to use that data in terms of their operations and how they drive their business. They tend to look for new ways to engage with their customers through a digital or data-driven experience.
Look at the number of mobile apps coming out of consumer, really B to C kind of businesses. So there's more and more digital focus, there's more and more data focus, and there's business decisions made around that data. So, I want to push you guys on this a little bit. 'Cause we've always used data in business, so that's not new. There's always been increasing amounts of data being used. So while the volume's certainly new, it's very interesting, it's by itself not that new. What is new about this? What is really new about it that's catalyzing this change right now? Have you got some insights into that? Well, I would just say if you look at some of the largest companies that are no longer here, so you've got Blockbuster, you've got Borders Books and Music, you've got RadioShack, look at what Amazon has done to the retail industry. You look at what Uber is doing to the transportation industry. Look at every single industry, there's disruption. And there's the success of this new innovative company, and I think that's why now. Yes, data has always been an important attribute of any kind of business operation. As more data gets digital, combine that with innovation and APIs that allow you to, and the public cloud, allow you to use that as a launch pad for innovation. I think those are some of the things about why now. I mean, that would be my take, I don't know-- Yeah, I think there's a couple things. Number one, I think yes, businesses have been storing data for years and using data for years, but what you're seeing is new ways to use the data. There's analytics now, it is so easy to run analytics compared to what it was just years ago, that you can now use data that you've been storing for years and run historical patterns on that, and figure out trends and new ways to do business. I think the other piece that is very interesting is the machine learning, the artificial intelligence, right? So much of the industry now, I mean, look at the automotive industry. 
They are collecting more information than I bet they ever thought they would, because the autonomous driving effort, all of that, is all about collecting information, doing analytics on information, and creating AI capabilities within their products. So there's a whole new business that's all new, there's whole new revenue streams that are coming up as a result of leveraging insights from data. So let me run something by ya, 'cause I was looking for something different. It used to be that the data we were working was what I call stylized data. You can't go out here in Berlin and wander the streets and find Accounting. It doesn't exist, it's human-made, it's contrived. HR is contrived. We have historically built these systems based on transactions, highly stylized types of data. There's only so much you can do with it. But because of technology, mobile, IOT, others, we now are utilizing real world data. So we're collecting an entirely new class of data that has a dramatic impact in how we think about business and operations. Does that comport with what the study said, that study respondents focusing on new types of data as opposed to just traditional sources of data? We certainly looked at correlations of what data thrivers are doing by different types of data. I would say, in terms of the new types of data that are emerging, you've got time series data, stream data, that's increasingly important. You've got machine-generated data from sensors. And I would say that one thing that the thrivers do better than merely just survivors, is have processes and procedures in place to action the data. To collect it and analyze it, as Brett pointed out, is accessible, and it's easy. But what's not easy to is to action results out of that data to drive change and business processes, to drive change in how things are brought to market, for example. So, those are things that data thrivers are doing that maybe data survivors aren't. I don't know if you have anything to add to that. 
Yeah, no, I think that's exactly right. I think, yes, traditional data, but it's interesting because even those traditional data sets that have been sitting there for years have untapped value. >> Peter: Wikibon's new types of data. That's right. But we've also been doing data warehousing and analytics for a long time. So it seems as though, I would guess, that the companies that are leading, many that you mentioned, are capturing data differently, they're using analytics and turning data into value differently, and then they are taking action based on that data differently. And I'm wondering if, across the continuum that you guys have identified, of thrivers all the way down to survivors, and you mentioned one other, data-- >> Laura: Resisters. Resisters, anyways. So there's some continuum of data companies. Do they fall into that pattern, where I'm good at capturing data, I'm good at generating analytics, but I'm not good at taking action on it? Is that what a data resister is? So a data resister is sort of the one extreme: companies that don't have well-aligned processes, where they're doing digital transformation on a very ad hoc basis and it's not repeatable. They're somewhat resistant to change. They're really not embracing that there's disruption going on, and that data can be a source of enablement to do the disrupting rather than be disrupted. So they're kind of resisting those fundamental constructs, I would say. They typically tend to be very siloed. Their IT's in a very siloed architecture where they're not looking for ways to take advantage of new opportunities across the data they're collecting. So they're either not as good at creating business value out of the data they have access to. Yes, that's right, that's right. And then I think the whole thing with thrivers is that they are purposeful.
They set a high-level business objective that says we're going to leverage data and we're going to use digital to help drive our business forward. We are going to look to disrupt our own business before somebody disrupts it for us. So how do you help those data resisters? What's your message to them, particularly if they may not even operate with the belief that data is this asset? I mean, that's the whole premise of the study. I think the data that comes out says, like, you know, hey, data thrivers, you're two times more likely to have two times more profitability; there's lots of great statistics that we pulled out of this to say thrivers have a lot more going for them. There is a direct correlation that says if you take a high business value of your data, and a high business value of the digital transformation, then you are going to be more profitable, you're going to generate more revenue, and you're going to be more relevant in the next 10 to 20 years. And we want to use that to say, okay, where are you on this journey? We're actually giving them tools to measure themselves by taking assessments. They can take an assessment of their own situation and say, okay, we are a survivor. Okay, how do we move closer to being a thriver? And that's where NetApp would love to come in and engage and say let us show you best practices, let us show you tools and capabilities that we can bring to bear in your environment to help you go a little bit further on that journey, or help you on a path that's going to lead you to being a data thriver. Yeah, that's right, I agree with that. (laughs) What is the thing that keeps you up at night for the data resister, though, in the sense of someone who is not even capturing and storing the data, but really has no strategy to take whatever insights the data might be giving them to create value? I don't know, that's a hard question. I don't know, what keeps you up at night?
Well, I think if I were looking at a data resister, I think the stats, the data's against them. I mean, right? If you look at a Fortune 500 company in the 1950s, their average lifespan was something like 40 years. And by the year 2020, the average lifespan of an S&P 500 company is going to be seven years, and that's because of disruption. Now, historically that may have been industrial disruption, but now it's digital disruption, and that right there is, if you're feeling like you're just a survivor, that ought to keep a survivor up at night. If I can ask too. It's, for example, one of the reasons why so many executives say you have to hire millennials, because there's this presumption that millennials have a more natural affinity with data, than older people like me. Now, there's not necessarily a lot of stats that definitely prove that, but I think that's one of the, the misperceptions, or one of the perceptions, that I have to get more young people in because they'll be more likely to help me move forward in an empirical style of management than some older people who are used to a very, very different type of management practice. But still there are a lot of things that companies, I would presume, would need to be able to do to move from one who's resisting these kinds of changes to actually taking advantage of it. Can I ask one more question? Is it that, did the research discover that data is the cause of some of these, or just is correlated with success? In other words, you take a company like Amazon, who did not have to build stores like traditional retailers, didn't have to carry that financial burden, didn't have to worry so much about those things, so that may be starting to change, interestingly enough. Is that, so they found a way to use data to alter that business, but they also didn't have to deal with the financial structure of a lot of the companies they were competing with. 
They were able to say our business is data, whereas others had said our business is serving the customer with these places in place. So, which is it? Do you think it's a combination of cause and effect, or is it just that it's correlated? Hmm. I would say it's probably both. We do see a correlation, but I would say the study included companies whose business was data, as well as companies that were across a variety of industries where they're just leveraging data in new ways. I would say there's probably some aspects of both of that, but that wasn't like a central tenet of the study per se, but maybe that will be phase two. Maybe we'll mine the data and try and find some insights there. Yeah, there's a lot more information that we can glean from this data. We think this'll be an ongoing effort for us to kind of be a thought leader in this area. I mean, the data proved that there was 11% of those 800 respondents that are thrivers, which means most people are not in that place yet. So I think it's going to be a journey for everyone. Yes, I agree that some companies may have some laws of physics or some previous disruptions, like brick and mortar versus online retail, but it doesn't mean traditional companies can't use technology. I mean, you look at, in the white paper, we used examples like General Electric and John Deere. These are very traditional companies that are using technology to collect data to provide insights into how customers are using their products. So that's kind of the thought leadership that any company has to have, is how do I leverage digital capabilities, online capabilities, to my advantage and keep being disruptive in the digital age? I think that's kind of the message that we want them to hear. Right, and I would just add to that. It's not only their data, but it's third-party data. So it's enriching their data, say in the case of Starbucks. So Starbucks is a company that certainly has many physical assets.
They're taking their customer data, they're taking partner data, whether that be music data, or content from the New York Times, and they're combining that all to provide a customer experience on their mobile app that gives them an experience on the digital platform that they might have experienced in the physical store. So when they go to order their coffee in their mobile pay app, they don't have to wait in line for their coffee, it's already paid for and ready when they go to pick it up. But while they're in their app, they can listen to music or they can read the New York Times. So there's a company that is using their own data plus third-party data to really provide a more enriched experience for their customers, and that's a traditional, physical company. And they're learning about their customers through that process too. Exactly, exactly, right. Are there any industries that you think are struggling more with this than others? Or is it really a company-specific thing? Well, the research shows that companies in every industry are facing disruption, and the research shows that companies in every industry are reacting to that disruption. There are some industries that tend to have, obviously by industry they might have more thrivers or more resisters, but nothing I can per se call out by industry. I think retail is the one that you can point to and say there's an industry that's really struggling to keep up with the disruption, where the large players like Amazon and others have really leveraged digital well in advance of their thought process. So I think the white paper actually breaks down the data by industry, so you can kind of look at that, I think that will provide some details. But I think there is no industry immune, we'll just put it that way. And the whole concept of industry is undergoing change as well. That's true, that is true, everything's been disrupted.
Great, well, Brett and Laura thank you so much for coming on our show. We had a great conversation. Thank you. Enjoy your time. You're watching theCUBE, we'll have more from NetApp Insight after this. (rippling music)

Published Date : Nov 14 2017


Deepak Visweswaraiah, NetApp | NetApp Insight Berlin 2017


 

(upbeat electronic music) >> Announcer: Live, from Berlin, Germany, it's theCUBE. Covering NetApp Insight 2017. Brought to you by NetApp. Welcome back to theCUBE's live coverage of NetApp Insight here in Berlin, Germany. I'm your host, Rebecca Knight, along with my co-host Peter Burris. We are joined by Deepak Visweswaraiah. He is the senior vice president for data fabric manageability at NetApp. Thanks so much for coming on the show, Deepak. Thank you. So let's talk about the data fabric, and why modern IT needs it to do what it needs to do. For acceleration. I think anyone attending the conference, I thought the keynote that happened yesterday, Kenneth Cukier from The Economist, actually talked about how data actually is growing. And then how much of that is becoming more and more important to companies. Not only just from an ability to be able to actually handle data, but how they make their decisions based on the amount of data that they have today. The fact that we have that technology, and we have the mindset to be able to actually handle that data, I think gives that unique power to customers who actually have that data. And within their capacity. So, if you look at it in terms of the amount of data growing and what companies are trying to do with that, the fact is that data is not all in one place, it's not all in one format, it's not all just sitting in some place. Right, in terms of the fact that we call it, you know, data being diverse, data being dynamic and then what have you. So, this data, for any CIO, if you talk to an IT organization and ask them in terms of do you even really know where all your data lives, they probably, you know, 80% of the time they don't know where it all is. And they do not know who is accessing what data. Do they actually really have the access or the right people accessing the right data? And then what have you.
So, being able to look at all of this data in different silos that is there, to be able to have visibility across these, to be able to actually handle the diversity of that data, whether it is structured, unstructured, comes from, you know, the edges of the network, whether it is streaming, and different types of, you know, media for that matter, whether it is video, audio, what have you. With that kind of diversity in the data, and the fact that it lives in multiple places, how do you handle all of that in a seamless fashion? Having the ability to view all of that and making decisions on leveraging the value of that data. So, number one, is really to be able to handle that diversity. What you need is a data fabric that can actually see multiple end points and kind of bring that together in one way and one form, with one view for a customer. That's the number one thing, if you will. The second thing is in terms of being able to take this data and do something that's valuable in terms of their decision making. How do I decide to do something with it? I think one of the examples you might have seen today, for example, is that we have 36 billion data points coming from our own customer base, that we bring back to NetApp, and help our customers to understand. In the universe of the storage end points, with all the data collected, we can proactively tell them what may be going wrong and what they can actually do better. And then how can they do this. This is really what that decision making capability is, to be able to analyze. It's about being able to provide that data for analytics to happen. And that analytics may happen whether it happens in the cloud, whether it happens where the data is, it shouldn't really matter, and it's our responsibility to provide or serve that data in the most optimized way to the applications that are analyzing that data.
And that analysis actually helps make a significant amount of decisions that the customers are actually looking to make. The third is, with all of this, there is underlying infrastructure that provides the capability to handle this large amount of data, and also that diversity that I talked about. How do you provide that capability for our customers, to be able to go from today's infrastructure in their data center, to be able to have and handle a hybrid way of doing things in terms of their infrastructure that they use within their data center, whether they might actually have infrastructure in the cloud, and leveraging the cloud economics to be able to do what they do best, and, or have service providers and colocators, in terms of having infrastructure that may be there. The ability to be able to seamlessly view all of that, providing that technology to be able to modernize their data center or in the cloud seamlessly. To be able to handle that with our technology is really the primary purpose of data fabric. And then that's what I believe we provide to our customers. So, people talk about data as an asset. And folks talk about what you need to ensure the data becomes an asset. When we talk about materials, we talk about inventory, we talk about supply chain, which says there's a linear progression. One of the things that I find fascinating about the term fabric, even though there's a technical connotation to it, is it does suggest that in fact what businesses need to do is literally weave a data tapestry that supports what the business is going to do. Because you cannot tell with any certainty, it's certainly not a linear progression, but data is going to be connected in a lot of different ways. >> Deepak: Yeah. To achieve the goals of the business. Tell us a little bit about the processes, the underlying technologies, and how that informs the way businesses are starting to think about how data does connect? >> Deepak: Can you repeat the last part?
How data connects, how businesses are connecting data from multiple sources? And turning it into a real tapestry for the business. Yeah, so as you said, data comes in from various different sources, for that matter. In terms of, we use mobile devices so much more in the modern era, you actually have data coming in from these kinds of sources, or for example, in terms of, let's say, IoT, in terms of sensors that are all over the place, in terms of how that data actually comes along. Now, let's say, in terms of, if there is a customer or if there is an organization that is looking at this kind of data that is coming from multiple different sources all coming in to play, the one thing is just the sheer magnitude of the data. What typically we have seen is that there is infrastructure at the edge, even if you take the example of internet of things. You try and process the data at the edge as much as you can, and bring back only what is aggregated and what is required back to, you know, your data center or a cloud infrastructure or what have you. At the same time, just that data is not good enough, because you have to connect that data with the internal data that you have about-- Okay, who is this data coming from and what kind of data, what is that meta-data that connects my customers to the data that is coming in? I can give you a couple of examples. In terms of, let's say, there is an organization that provides weather data to farmers in the corners of a country that is densely populated, but you really can never get in with data center infrastructure to those kinds of remote areas. They are at the edge, where you have these sensors, in terms of being able to sample the weather data. And sample also the data of the ground itself, in terms of being able to, the ultimate goal is to be able to help the farmer in terms of when is the right time to water his field. When is the right time to be able to sow the seeds.
When is the right time for him to really cut the crops, when is the most optimized time. So, when this data actually comes back from each of these locations, it's all about being able to understand where this data is coming from, from the location, and being able to connect that to the weather data that is actually coming from the satellites, and relating that and collating that to be able to determine and tell a farmer on his mobile device, to be able to say okay, here is the right time, and if you don't actually cut the crops in the next week, you may actually lose the window, because of the weather patterns that they see and what have you. That's an example of what I could talk about as far as how you connect that data that is coming in from various sources. And a great example, I think, was at the keynote yesterday, about a Stanford professor talking about the race track. It's really about that race track, and not just about any race track, where the cars are actually making those laps, to be able to understand and predict correctly in terms of when to make that pit stop in a race. You really need the data from that particular race track, because it has characteristics that have an impact on the wear and tear of the tires. For example. That's really all about being able to correlate that data. So it's having the understanding of the greater context, but the specific context too. >> Deepak: Absolutely, absolutely. Great. You also talked about, you talked about the technology that's necessary, but you also mentioned the right mindset. Can you unpack that a little bit for our viewers? The mindset I talked about earlier was really more in terms of, if you think about some time before, we couldn't have attacked some of the problems that we can today. It's really having the mindset of, from the data, I can do things that I could never do before.
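The edge pattern Deepak describes (sample sensors locally, ship only aggregates back to the data center or cloud) can be sketched roughly as follows. This is a minimal illustration, not code from any NetApp product; the sensor name and sample values are hypothetical.

```python
from statistics import mean

def aggregate_readings(readings):
    """Reduce a window of raw sensor samples to a compact summary.

    Only this summary is sent upstream; the raw samples stay at the edge,
    which is what keeps the backhaul traffic small.
    """
    values = [r["value"] for r in readings]
    return {
        "sensor": readings[0]["sensor"],
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 2),
    }

# A window of hypothetical soil-moisture samples collected at the edge.
window = [
    {"sensor": "soil-moisture-7", "value": v}
    for v in [31.0, 30.5, 29.8, 28.9, 28.4]
]

summary = aggregate_readings(window)
print(summary)  # five raw samples collapse into one record to transmit
```

In a real deployment the summary record, not the raw stream, is what gets correlated upstream with external feeds such as the satellite weather data in Deepak's farming example.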
We could solve, we can solve things in the nature of being able to impact lives, if you will. One of our customers is Mercy Technology. They have built an outcare platform that has a number of healthcare providers coming together. They were actually able to make a significant impact, where they could actually determine that 40% of the patients coming into their facilities really were prevented from coming back in with a sepsis kind of diagnosis. They reduced that sepsis happening in 40% of the cases. Which is a significant, significant impact, if you will, for the human. Just having that mindset in terms of you have all the data and you can actually change the world with that data, and you can actually find solutions to problems that you could never have before, because you have the technology and you have that data. Which was never there before. So you can actually make those kinds of improvements. It's all about extracting those insights. >> Deepak: Absolutely. Thank you so much for coming on the show, Deepak. It was a pleasure having you. Thank you for having me. Thank you very much. I'm Rebecca Knight, for Peter Burris, we will have more from NetApp Insight in just a little bit. (dramatic electronic music)

Published Date : Nov 14 2017


Manfred Buchmann & Mark Carlton | NetApp Insight Berlin 2017


 

>> Announcer: From Berlin, Germany, it's the Cube. Covering NetApp Insight 2017, brought to you by NetApp. Welcome back to the Cube's live coverage of NetApp Insight here in Berlin, Germany, I'm your host Rebecca Knight along with my cohost Peter Burris. We are joined by Manfred Buchmann, he is the VP of systems engineering EMEA for NetApp, and Mark Carlton, who is an independent IT consultant. Manfred, Mark, thanks so much for coming on the show. Thank you. Thank you for having us. So Manfred, I want to start with you, you're a company veteran, you've been with NetApp for a long time, let's talk about the data management innovations that make IT modernization possible. It's a big question. That's a great question, you know, as a veteran talking about AI and the future and data management, things make it capable, but just coming off the general session, it takes something like our object store and think about, I put an object, a picture from you, I just put it into the storage and you know, it gets handed over into Amazon analytics and Amazon analytics, oh, you are smiling. And think about this without any coding, just a few things to plug it together and it works, and if you take it further it works at scale, so it's not only your face, it's the two thousand, four thousand, ten thousand faces here. You just put it in in parallel at scale, Amazon at scale does the analytics on top, and you get the results back. Just as a plug-in architecture, this data management at scale is this innovation. Is this the next gen data centers, all of them. But it's not magic, something allows that to happen. So what are those kind of two or three technologies that are so crucial to ensuring that that change in system actually is possible? I will put it pretty simply: the core technology we provide connects the on-premises data center with the public cloud and makes this whole thing happen seamlessly. And makes it happen for all different protocols.
You have it in the SAN space and then iSCSI in the cloud, you have it on files on premise, move the file over, and you have it with an object, and with an object we even go further, we integrate it into a message bus. Maybe it's too technical, but a message bus is just: I got an event, and I tell someone else this event is coming, to do something, and that's what we do with the picture analyzers. I got an event, which is, I get the picture, and with this event, I tell Amazon please do something with the picture and I give you the picture to analyze. So it's a fabric, there's object storage and there's AI and related technologies that allow you to do something as long as the data is ready for that to be done. Yeah, and we even move the data with it, basically that's what we do. And if you think about it, it's unbelievable magic. Mark I want to ask you, you are, you're an independent IT consultant, you've been following NetApp for a long time, you have your own blog, what are some of the biggest trends that you're seeing, what are some of the biggest concerns you hear from customers? Really from customers it's more around what steps to take, the market's changing as we can see, with data sprawling and it's spreading so fast, it's growing so fast. What we were storing a few years ago, a few years ago when I first started someone talked about a terabyte and you thought that's a big system, or you got 50 terabytes and you were huge. Now we're talking about 500 terabytes, 100 terabytes, and the difference is what sort of data that is. Is it stored in the right place?
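The put-an-object, get-an-event pattern Manfred describes can be sketched with a toy in-memory bus. The bucket, queue, and `analyze` stand-in below are illustrative only, not NetApp or AWS APIs; the point is that the analyzer reacts to events, which is what lets the pattern fan out to thousands of objects in parallel.

```python
from queue import Queue

# A toy message bus: putting an object emits an event that subscribers consume.
events = Queue()

def put_object(bucket, key, data):
    """Store the object, then publish an event describing it."""
    bucket[key] = data
    events.put({"event": "ObjectCreated", "key": key})

def analyze(image_bytes):
    # Stand-in for the external analytics service (e.g. face recognition).
    return {"faces": 1, "bytes": len(image_bytes)}

bucket = {}
put_object(bucket, "photos/visitor-001.jpg", b"\xff\xd8fake-jpeg")

# The consumer is decoupled from the upload call; it only sees events.
results = {}
while not events.empty():
    evt = events.get()
    if evt["event"] == "ObjectCreated":
        results[evt["key"]] = analyze(bucket[evt["key"]])

print(results)
```

In the real version the queue would be a managed notification service and `analyze` an external AI endpoint, but the plumbing, store then notify then process, is the same.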
And I think that's one of the biggest challenges, is knowing what data you have, how to use it and how to get the most out of the data, and in the right place. So we talked about the on prem, off prem, whether it be in the cloud, whether it be an object, and I think that's key from where we're moving with the data fabric within NetApp and how NetApp's creating their data management suite as such for ONTAP, for the SolidFire suite, and how they're joining the products up so it makes it seamless that we can move this data about from these different platforms. And I think one of the biggest things, biggest thing for me, especially when I'm talking to customers, is it's the strategy of what you can do with data. It's the, it's there's no complications, as Manfred said, it's as if it's magic, it's that type of thing, it will go, you can do whatever you want with it. And I think from a customer point of view, because they don't have to make that choice and say that's what I want to do today, they've got scale, they've got flexibility, they can control where their data sits, they can move it back and forth, and they sprawled out into AWS this year and then Google, and with a cloud that size and being able to use those three different cloud platforms, even IBM cloud and how they can plug into theirs. It's, it's really starting to open those doors and really argue the point around the challenges. You've got a lot of answers to a lot of different things. So how do you help customers make sense of all of this, I mean as you said, there are a lot of options, they can go a lot of different ways, they know that they need to use their data as an asset, they need to deploy it, find that value, what's your advice? You know, let me just also take a step back, we talk about we get more and more data. We talk about connecting the different clouds, but at the same time we also talked about basics: I move from flash into storage-class memory and I make everything faster.
If you think about more data, to process more data in the same time everything needs to go faster, and I give you a simple example, or just challenge you: how many of you are sitting in front of a business application in your company, and you press the enter button and it takes, takes a minute, takes another, and you go, uh, sorry. Thinking about it. Why does it take so long? As a veteran, in the old days, what we said is basically, we press the enter button and we said we need to go for a coffee and come back, and after the coffee the transaction is done. Now at one stage we talked about microseconds and milliseconds and all these things, but put it into relation: take a transaction, I press the enter button and it would have taken, let me say, 10 minutes until I got a result out of it. And this was in times when storage response times were 10 milliseconds. Take this to where response time is now one millisecond, and you do the same amount of data, you press the enter button and it's not 10 minutes, it's a minute. Now you say the next generation technology we showed, it's even a thousand times faster. You go now from a minute to a thousandth of a minute, a millisecond, you know what a millisecond means for you? You press the enter button, the result is there. And now you think you get more and more data, petabytes of data, how can I make sure and process it as fast as possible? So that's one characteristic you look into, and I believe the future for AI and all these things is how fast can you process, maybe we get a measurement which is called petabytes per second or petabytes per millisecond, can you process to get information out of it. And then at the same time you said which solution, which choices?
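Manfred's back-of-the-envelope arithmetic can be made concrete. If a transaction is dominated by storage waits and issues a fixed number of I/Os, its wall-clock time scales linearly with the per-I/O response time. The I/O count below is hypothetical, chosen only so the numbers line up with his coffee-break example.

```python
def transaction_time(io_count, response_time_s):
    """Wall-clock time of a transaction dominated by storage waits."""
    return io_count * response_time_s

# Hypothetical: one batch transaction that performs 60,000 serial I/Os.
IOS_PER_TRANSACTION = 60_000

for label, rt in [("disk, 10 ms per I/O", 0.010),
                  ("flash, 1 ms per I/O", 0.001),
                  ("next-gen, 1 us per I/O", 0.000_001)]:
    t = transaction_time(IOS_PER_TRANSACTION, rt)
    print(f"{label:>24}: {t:9.2f} s")
```

With these assumed numbers the 10 ms era gives 600 seconds (the 10-minute coffee break), 1 ms flash gives 60 seconds (his "it's a minute"), and a further thousand-fold improvement gives well under a second: press enter, result is there.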
I believe in the current world, as it's so fast moving, all the solutions evolve at a high speed, so at a certain time you just make a decision, I just go with this one. And even if you go with the public cloud, you choose the public cloud, one is price, but also choose it on capabilities. If you go to the IBM side, what IBM Watson is doing in terms of AI, incredible, and that's what we use for Active IQ on the support side. So it's not only the system, the speed of the system, where do you deploy the data, but at the same time I give you all the information: what are you doing with your data on the support side? You're connecting this, and customers will choose, like we do it internally, the best solution, and what we give them, we give them the choice, we give them reference architectures, how it works with this one, how it works with this one, we may give them some kind of guidance, but to be frank, and as a veteran, and sometimes as the guys know me, I'm straightforward, the decision is something the customer needs to make, or the partner with the customer together, because you have the knowledge basically on the implementation side. I'm the best one in this one, I know how it works, I know how I can do it, but that's a choice which is more the customer's, together with their implementation partners. Great, well Manfred, Mark, thanks so much for coming on the Cube, this was great, great having you on. Thank you very much. I'm Rebecca Knight, for Peter Burris, we will have more from NetApp Insight just after this.

Published Date : Nov 14 2017
