Piotr Mierzejewski, IBM | Dataworks Summit EU 2018
>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018 brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski, your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments, so Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms and they can create a recommendation engine, let's say in a two, three days' time. This is because of the explosion of open-source in that space. You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial, when you have to expose this in a uniform fashion to actually various business units. Now all this has to actually work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now when you deploy a model, as the data scientist is going to deploy the model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus of our concern, of enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why the partnership between Hortonworks and IBM makes sense, well, you're looking at the number one industry-leading big data platform from Hortonworks. 
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product, is the merger between the two, ability to integrate them tightly together gives your data scientists secure access to data, ability to leverage the Spark that runs inside a Hortonworks cluster, ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, you can actually work on your Python models and your analytics within DSX and then push it remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you know you can pull it in, but truly what you want to do when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, this partnership embarked, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but there is not much materializing after such an announcement. This is not true in the case of DSX and HDP. We have had, just recently we have had a release of DSX 1.2 which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin or, say, Jupyter notebooks. Now, how about TensorFlow? What are you going to do when actually, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, you can scale the platform horizontally and vertically, and train your deep learning workloads, and actually remove the sidecar out. So you can add it to the cluster and remove it at will. Now, DSX also actually not only satisfies the needs of your programmer data scientists, that actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. 
As of DSX 1.2, you can actually, we have embedded, integrated, the SPSS Modeler, redesigned, rebranded, this is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. (laughs) >> But I see it's the same phenomenon, you bring the same capability to greatly expand the range of data professionals who can do, in this case, do machine learning hopefully as well as professional, dedicated data scientists. >> Certainly, now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization. From the executive, that actually gives you the business use case, to your data engineers that actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet, you have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message bus and are actually streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With the 1.2, again, I'm going to be coming back to this DSX 1.2, with the most recent release we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now what's going to get really exciting in the next two months, DSX will actually bring in natural language processing and text analysis and sentiment analysis via Watson Explorer. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extendable! An extendable collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure. 
On Google Cloud, definitely; we can deploy on SoftLayer and we're very good at that, however in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, with Docker and its equivalents, as they became industry standards? Bring in RStudio, Jupyter, the Zeppelin notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them and make the deployments of web services, applications, the models, and those are actually full releases, I'm not only talking about the model, I'm talking about the scripts that can go with that, the ability to actually pull the data in and allow the models to be re-trained, evaluated and actually re-deployed without taking them down. Now that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientist but pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks, for having us here. We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning in business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)
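A technical aside on the push-down pattern Piotr describes above, where Python and Spark analytics are sent to run on the HDP cluster via YARN rather than pulling terabytes back to the workbench: the sketch below is a minimal, hypothetical PySpark training job written in that style. The HDFS paths, column names, and executor settings are placeholders, and in a DSX and HDP setup the job would typically be submitted through DSX's remote Spark integration (for example via Apache Livy or spark-submit) rather than by hard-coding the YARN master as shown here.

```python
# Minimal sketch: train a model where the data lives, on the cluster's YARN
# resource manager, instead of pulling the data down to a local notebook.
# Paths, column names, and resource settings below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (
    SparkSession.builder
    .appName("dsx-pushdown-training")
    .master("yarn")                              # run executors on the HDP cluster
    .config("spark.executor.instances", "4")
    .getOrCreate()
)

# Read directly from the cluster's HDFS (hypothetical data set).
df = spark.read.parquet("hdfs:///data/churn/features.parquet")

assembler = VectorAssembler(
    inputCols=["tenure_months", "monthly_spend", "support_calls"],  # placeholder features
    outputCol="features",
)
lr = LogisticRegression(featuresCol="features", labelCol="churned")

model = Pipeline(stages=[assembler, lr]).fit(df)

# Persist the trained pipeline back to HDFS so it can be evaluated,
# re-trained, and re-deployed without taking anything down.
model.write().overwrite().save("hdfs:///models/churn_lr")

spark.stop()
```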
SUMMARY :
Piotr Mierzejewski of IBM joins James Kobielus to discuss Data Science Experience (DSX) and IBM's partnership with Hortonworks. Arguing that there is no machine learning without data and that enterprise machine learning is hard, he describes DSX Local's certified integration with HDP, pushing Python and Spark analytics out to YARN on the cluster, support for open-source tooling such as RStudio, Zeppelin, and Jupyter, GPU nodes for TensorFlow training, and the SPSS Modeler and decision optimization capabilities added in DSX 1.2. The upcoming 1.2.1 release, expected at the end of June, adds natural language processing, text analysis, and sentiment analysis via Watson Explorer for both developers and business analysts.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Piotr Mierzejewski | PERSON | 0.99+ |
James Kobielus | PERSON | 0.99+ |
James | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Piotr | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
30 seconds | QUANTITY | 0.99+ |
Berlin | LOCATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Python | TITLE | 0.99+ |
Spark | TITLE | 0.99+ |
two | QUANTITY | 0.99+ |
First | QUANTITY | 0.99+ |
Scala | TITLE | 0.99+ |
Berlin, Germany | LOCATION | 0.99+ |
350,000 employees | QUANTITY | 0.99+ |
DSX | ORGANIZATION | 0.99+ |
Mac | COMMERCIAL_ITEM | 0.99+ |
two things | QUANTITY | 0.99+ |
RStudio | TITLE | 0.99+ |
DSX | TITLE | 0.99+ |
DSX 1.2 | TITLE | 0.98+ |
both developers | QUANTITY | 0.98+ |
second | QUANTITY | 0.98+ |
GDPR | TITLE | 0.98+ |
Watson Explorer | TITLE | 0.98+ |
Dataworks Summit 2018 | EVENT | 0.98+ |
first line | QUANTITY | 0.98+ |
Dataworks Summit Europe 2018 | EVENT | 0.98+ |
SiliconANGLE Media | ORGANIZATION | 0.97+ |
end of June | DATE | 0.97+ |
TensorFlow | TITLE | 0.97+ |
thousands of libraries | QUANTITY | 0.96+ |
R | TITLE | 0.96+ |
Jupyter | ORGANIZATION | 0.96+ |
1.2.1 | OTHER | 0.96+ |
two excellent days | QUANTITY | 0.95+ |
Dataworks Summit | EVENT | 0.94+ |
Dataworks Summit EU 2018 | EVENT | 0.94+ |
SPSS | TITLE | 0.94+ |
one | QUANTITY | 0.94+ |
Azure | TITLE | 0.92+ |
one kind | QUANTITY | 0.92+ |
theCUBE | ORGANIZATION | 0.92+ |
HDP | ORGANIZATION | 0.91+ |
Mandy Chessell, IBM | Dataworks Summit EU 2018
>> Announcer: From Berlin, Germany, it's the Cube covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Well hello welcome to the Cube I'm James Kobielus. I'm the lead analyst for big data analytics within the Wikibon team of SiliconANGLE Media. I'm hosting the Cube this week at Dataworks Summit 2018 in Berlin, Germany. It's been an excellent event. Hortonworks, the host, had... We've completed two days of keynotes. They made an announcement of the Data Steward Studio as the latest of their offerings and demonstrated it this morning, to address GDPR compliance, which of course is hot and heavy is coming down on enterprises both in the EU and around the world including in the U.S. and the May 25th deadline is fast approaching. One of Hortonworks' prime partners is IBM. And today on this Cube segment we have Mandy Chessell. Mandy is a distinguished engineer at IBM who did an excellent keynote yesterday all about metadata and metadata management. Mandy, great to have you. >> Hi and thank you. >> So I wonder if you can just reprise or summarize the main take aways from your keynote yesterday on metadata and it's role in GDPR compliance, so forth and the broader strategies that enterprise customers have regarding managing their data in this new multi-cloud world where Hadoop and open source platforms are critically important for storing and processing data. So Mandy go ahead. >> So, metadata's not new. I mean it's basically information about data. And a lot of companies are trying to build a data catalog which is not a catalog of, you know, actually containing their data, it's a catalog that describes their data. >> James: Is it different with index or a glossary. How's the catalog different from-- >> Yeah, so catalog actually includes both. So it is a list of all the data sets plus a links to glossary definitions of what those data items mean within the data sets, plus information about the lineage of the data. It includes information about who's using it, what they're using it for, how it should be governed. >> James: It's like a governance repository. >> So governance is part of it. So the governance part is really saying, "This is how you're allowed to use it, "this is how the data's classified," "these are the automated actions that are going to happen "on the data as it's used "within the operational environment." >> James: Yeah. >> So there's that aspect to it, but there is the collaboration side. Hey I've been using this data set it's great. Or, actually this data set is full of errors, we can't use it. So you've got feedback to data set owners as well as, exchange and collaboration between data scientists working with the data. So it's really, it is a central resource for an organization that has a strong data strategy, is interested in becoming a data-driven organization as such, so, you know, this becomes their major catalog over their data assets, and how they're using it. So when a regulator comes in and says, "can you show up, show me that you're "managing personal data?" The data catalog will have the information about where personal data's located, what type of infrastructure it's sitting on, how it's being used by different services. So they can really show that they know what they're doing and then from that they can show how to processes are used in the metadata in order to use the data appropriately day to day. 
>> So Apache Atlas, so it's basically a catalog, if I understand correctly, at least for IBM and Hortonworks, it's Hadoop, it's Apache Atlas, and Apache Atlas is essentially an open-source metadata code base. >> Mandy: Yes, yes. >> So explain what Atlas is in this context. >> So yes, Atlas is a collection of code, but it supports a server, a graph-based metadata server. It also supports-- >> James: A graph-based >> Both: Metadata server >> Yes >> James: I'm sorry, so explain what you mean by graph-based in this context. >> Okay, so it runs using the JanusGraph graph repository. And this is very good for metadata 'cause if you think about what it is, it's connecting dots. It's basically saying this data set means this value and needs to be classified in this way and this-- >> James: Like a semantic knowledge graph >> It is, yes actually. And on top of it we impose a type system that describes the different types of things you need to control and manage in a data catalog, but the graph, the Atlas component gives you that graph-based, sorry, graph-based repository underneath, but on top we've built what we call the open metadata and governance libraries. They run inside Atlas so when you run Atlas you will have all the open metadata interfaces, but you can also take those libraries and connect them and load them actually into another vendor's product. And what they're doing is allowing metadata to be exchanged between repositories of different types. And this becomes incredibly important as an organization increases their maturity and their use of data because you can't just have knowledge about data in a single server, it just doesn't scale. You need to get that knowledge into every runtime environment, into the data tools that people are using across the organization. And so it needs to be distributed. >> Mandy I'm wondering, the whole notion of what you catalog in that repository, does it include, or does Apache Atlas support adding metadata relevant to data derivative assets like machine learning models-- >> Mandy: Absolutely. >> So forth. >> Mandy: Absolutely, so we have base types in the upper metadata layer, but also it's a very flexible and extensible type system. So, if you've got a specialist machine learning model that needs additional information stored about it, that can easily be added to the runtime environment. And then it will be managed through the open metadata protocols as if it was part of the native type system. >> Because of course, as an analyst, one of my core areas is artificial intelligence, and one of the hot themes in artificial, well, there's a broad umbrella called AI safety. >> Mandy: Yeah. >> And one of the core subsets of that is something called explicable AI, being able to identify the lineage of a given algorithmic decision back to what machine learning models, fed from what data. >> Mandy: Yeah. >> To know what action was taken, like when, let's say, a self-driving vehicle hits a human being, for legal, you know, discovery, whatever. So what I'm getting at, what I'm working through to, is the extent to which the Hortonworks, IBM big data catalog running Atlas can be a foundation for explicable AI either now or in the future. We see a lot of enterprises, me as an analyst at least, I see lots of enterprises that are exploring this topic, but it's not to the point where it's in production, explicable AI, but clearly companies like IBM are exploring building a stack or an architecture for doing this kind of thing in a standardized way. What are your thoughts there? 
Is IBM working on bringing, say, Atlas and the overall big data catalog into that kind of a use case? >> Yes, yeah, so if you think about what's required, you need to understand the data that was used to train the AI, how and what data's been fed to it since it was deployed because that's going to change its behavior, and then also a view of how that data's going to change in the future so you can start to anticipate issues that might arise from the model's changing behavior. And this is where the data catalog can actually associate and maintain information about the data that's being used with the algorithm. You can also associate the checking mechanism that's constantly monitoring the profile of the data so you can see where the data is changing over time, that will obviously affect the behavior of the machine learning model. So it's really about providing not just information about the model itself, but also the data that's feeding it, how those characteristics are changing over time, so that you know the model is continuing to work into the future. >> So tell us about the IBM, Hortonworks partnership on metadata and so forth. >> Mandy: Okay. >> How is that evolving? So, you know, your partnership is fairly tight. You clearly, you've got ODPI, you've got the work that you're doing related to the big data catalog. What can we expect to see in the near future in terms of initiatives building on all of that for governance of big data in the multi-cloud environment? >> Yeah so Hortonworks started the Apache Atlas project a couple of years ago with a number of their customers. And they built a base repository and a set of APIs that allow it to work in the Hadoop environment. We came along last year, formed our partnership. That partnership includes this open metadata and governance layer. So since then we've worked with ING as well, and ING bring the, sort of, user perspective, this is the organization's use of the data. And so between the three of us we are basically transforming Apache Atlas from a Hadoop-focused metadata repository to an enterprise-focused metadata repository, plus enabling other vendors to connect into the open metadata ecosystem. So we're standardizing types, standardizing format, the format of metadata, there's a protocol for exchanging metadata between repositories. And this is all coming from that three-way partnership where you've got a consuming organization, you've got a company who's used to building enterprise middleware, and you've got Hortonworks with their knowledge of open source development in their Hadoop environment. >> Quick one out of left field: as you develop this architecture, clearly you're leveraging Hadoop HDFS for storage. Are you looking at, at least evaluating, maybe using blockchain for more distributed management of the metadata in these heterogeneous environments in the multi-cloud, or not? >> So Atlas itself does run on HDFS, but doesn't need to run on HDFS, it's got other storage environments so that we can run it outside of Hadoop. When it comes to blockchain, so blockchain is for sharing data between partners, small amounts of data that basically express agreements, so it's like a ledger. There are some aspects that we could use for metadata management. It's more that we actually need to put metadata management into blockchain. So the agreements and contracts that are stored in blockchain are only meaningful if we understand the data that's there, what its quality is, where it came from, what it means. 
And so actually there's a very interesting distributed metadata question that comes with the blockchain technology. And I think that's an important area of research. >> Well Mandy, we're at the end of our time. Thank you very much. We could go on and on. You're a true expert and it's great to have you on the Cube. >> Thank you for inviting me. >> So this is James Kobielus with Mandy Chessell of IBM. We are here this week in Berlin at Dataworks Summit 2018. It's a great event and we have some more interviews coming up so thank you very much for tuning in. (electronic music)
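A brief aside on the Atlas type system and catalog Mandy describes: the snippet below sketches, under stated assumptions, how a client could register a data set entity and tag it with a classification through Atlas's V2 REST API, the kind of metadata that governance tooling such as Data Steward Studio builds on. The host, port, credentials, classification name, and attribute set are illustrative; required attributes vary by Atlas version and type definitions, so treat this as the shape of the call rather than a drop-in script.

```python
# Hedged sketch: define a classification ("tag") and register an HDFS data set
# entity carrying it via the Apache Atlas V2 REST API. The endpoint host,
# credentials, and the "PII" classification are assumptions for illustration.
import requests

ATLAS = "http://atlas-host:21000/api/atlas/v2"   # hypothetical Atlas endpoint
AUTH = ("admin", "admin")                        # replace with real credentials
HEADERS = {"Content-Type": "application/json"}

# 1. Create the classification type if it does not already exist
#    (Atlas returns a conflict if it does, which is fine for this sketch).
classification_defs = {
    "classificationDefs": [
        {"name": "PII", "description": "Personal data in scope for GDPR"}
    ]
}
requests.post(f"{ATLAS}/types/typedefs", json=classification_defs,
              auth=AUTH, headers=HEADERS)

# 2. Register a data set entity and attach the classification, so lineage
#    and governance views can locate and classify it.
entity = {
    "entity": {
        "typeName": "hdfs_path",                 # a built-in Hadoop model type
        "attributes": {
            "qualifiedName": "hdfs:///data/customers@prod_cluster",
            "name": "customers",
            "path": "hdfs:///data/customers",
        },
        "classifications": [{"typeName": "PII"}],
    }
}
resp = requests.post(f"{ATLAS}/entity", json=entity, auth=AUTH, headers=HEADERS)
print(resp.status_code, resp.json())
```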
SUMMARY :
Mandy Chessell, distinguished engineer at IBM, talks with James Kobielus about metadata management and GDPR readiness. She describes building a data catalog on Apache Atlas, a graph-based metadata server running on JanusGraph with a flexible type system and open metadata and governance libraries, so organizations can show regulators where personal data sits, how it is governed, and how the data feeding machine learning models changes over time. She also explains the three-way partnership among IBM, Hortonworks, and ING to turn Atlas into an enterprise metadata repository and to standardize how metadata is exchanged between repositories, and touches on the open question of metadata for blockchain.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
James Kobielus | PERSON | 0.99+ |
Mandy Chessell | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
ING | ORGANIZATION | 0.99+ |
James | PERSON | 0.99+ |
three | QUANTITY | 0.99+ |
Berlin | LOCATION | 0.99+ |
Mandy | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
May 25th | DATE | 0.99+ |
last year | DATE | 0.99+ |
U.S. | LOCATION | 0.99+ |
two days | QUANTITY | 0.99+ |
Atlas | TITLE | 0.99+ |
yesterday | DATE | 0.99+ |
Berlin, Germany | LOCATION | 0.99+ |
SiliconANGLE Media | ORGANIZATION | 0.99+ |
Data Steward Studio | ORGANIZATION | 0.99+ |
both | QUANTITY | 0.99+ |
Both | QUANTITY | 0.98+ |
EU | LOCATION | 0.98+ |
GDPR | TITLE | 0.98+ |
One | QUANTITY | 0.98+ |
one | QUANTITY | 0.98+ |
Dataworks Summit 2018 | EVENT | 0.97+ |
Dataworks Summit EU 2018 | EVENT | 0.96+ |
this week | DATE | 0.94+ |
single server | QUANTITY | 0.94+ |
Hadoop | TITLE | 0.94+ |
today | DATE | 0.93+ |
this morning | DATE | 0.93+ |
three-way partnership | QUANTITY | 0.93+ |
Wikibon | ORGANIZATION | 0.91+ |
Hortonworks' | ORGANIZATION | 0.9+ |
Atlas | ORGANIZATION | 0.89+ |
Dataworks Summit Europe 2018 | EVENT | 0.89+ |
couple of years ago | DATE | 0.87+ |
Apache Atlas | TITLE | 0.86+ |
Cube | COMMERCIAL_ITEM | 0.83+ |
Apache | ORGANIZATION | 0.82+ |
JanusGraph | TITLE | 0.79+ |
hot themes | QUANTITY | 0.68+ |
Hadoop HDFS | TITLE | 0.63+ |
Dave McDonnell, IBM | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's theCUBE (relaxing music) covering DataWorks Summit Europe 2018. (relaxing music) Brought to you by Hortonworks. (quieting music) >> Well, hello and welcome to theCUBE. We're here at DataWorks Summit 2018 in Berlin, Germany, and it's been a great show. Who we have now is we have IBM. Specifically we have Dave McDonnell of IBM, and we're going to be talkin' with him for the next 10 minutes or so about... Dave, you explain. You are in storage for IBM, and IBM of course is a partner of Hortonworks who are of course the host of this show. So Dave, have you been introduced, give us your capacity or roll at IBM. Discuss the partnership of Hortonworks, and really what's your perspective on the market for storage systems for Big Data right now and going forward? And what kind of work loads and what kind of requirements are customers coming to you with for storage systems now? >> Okay, sure, so I lead alliances for the storage business unit, and Hortonworks, we actually partner with Hortonworks not just in our storage business unit but also with our analytics counterparts, our power counterparts, and we're in discussions with many others, right? Our partner organization services and so forth. So the nature of our relationship is quite broad compared to many of our others. We're working with them in the analytics space, so these are a lot of these Big Data Data Lakes, BDDNA a lot of people will use as an acronym. These are the types of work loads that customers are using us both for. >> Mm-hmm. >> And it's not new anymore, you know, by now they're well past their first half dozen applications. We've got customers running hundreds of applications. These are production applications now, so it's all about, "How can I be more efficient? "How can I grow this? "How can I get the best performance and scalability "and ease of management to deploy these "in a way that's manageable?" 'cause if I have 400 production applications, that's not off in any corner anymore. So that's how I'd describe it in a nutshell. >> One of the trends that we're seeing at Wikibon, of course I'm the lead analyst for Big Data Analytics at Wikibon under SiliconANGLE Media, we're seeing a trend in the marketplace towards I wouldn't call them appliances, but what I would call them is workload optimized hardware software platforms so they can combine storage with compute and are optimized for AI and machine learning and so forth. Is that something that you're hearing from customers, that they require those built-out, AI optimized storage systems, or is that far in the future or? Give me a sense for whether IBM is doing anything in that area and whether that's on your horizon. >> If you were to define all of IBM in five words or less, you would say "artificial intelligence and cloud computing," so this is something' >> Yeah. that gets a lot of thought in Mindshare. So absolutely we hear about it a lot. It's a very broad market with a lot of diverse requirements. So we hear people asking for the Converged infrastructure, for Appliance solutions. There's of course Hyper Converged. We actually have, either directly or with partners, answers to all of those. Now we do think one of the things that customers want to do is they're going to scale and grow in these environments is to take a software-defined strategy so they're not limited, they're not limited by hardware blocks. You know, they don't want to have to buy processing power and spend all that money on it when really all they need is more data. 
>> Yeah. >> There's pros and cons to the different (mumbles). >> You have power AI systems, I know that, so that's where they're probably heading, yeah. >> Yes, yes, yes. So of course, we have packages that we've modeled in AI. They feed off of some of the Hortonworks data lakes that we're building. Of course we see a lot of people putting these on new pieces of infrastructure because they don't want to put this on their production applications, so they're extracting data from maybe a Hortonworks data lake number one, Hortonworks data lake number two, some of the EDWs, some external data, and putting that into the AI infrastructure. >> As customers move their cloud infrastructures towards more edge facing environments, or edge applications, how are storage requirements change or evolving in terms of in the move to edge computing. Can you give us a sense for any sort of trends you're seeing in that area? >> Well, if we're going to the world of AI and cognitive applications, all that data that I mighta thrown in the cloud five years ago I now, I'm educated enough 'cause I've been paying bills for a few years on just how expensive it is, and if I'm going to be bringing that data back, some of which I don't even know I'm going to be bringing back, it gets extremely expensive. So we see a pendulum shift coming back where now a lot of data is going to be on host, ah sorry, on premise, but it's not going to stay there. They need the flexibility to move it here, there, or everywhere. So if it's going to come back, how can we bring customers some of that flexibility that they liked about the cloud, the speed, the ease of deployment, even a consumption based model? These are very big changes on a traditional storage manufacturer like ourselves, right? So that's requiring a lot of development in software, it's requiring a lot of development in our business model, and one of the biggest thing you hear us talk about this year is IBM Cloud Private, which does exactly that, >> Right. and it gives them somethin' they can work with that's flexible, it's agile, and allows you to take containerized based applications and move them back and forth as you please. >> Yeah. So containerized applications. So if you can define it for our audience, what is a containerized application? You talk about Docker and orchestrate it through Kubernetes and so forth. So you mentioned Cloud Private. Can you bring us up to speed on what exactly Cloud Private is and in terms of the storage requirements or storage architecture within that portfolio? >> Oh yes, absolutely. So this is a set of infrastructure that's optimized for on-premise deployment that gives you multi-cloud access, not just IBM Cloud, Amazon Web Services, Microsoft Azure, et cetera, and then it also gives you multiple architectural choices basically wrapped by software to allow you to move those containers around and put them where you want them at the right time at the right place given the business requirement at that hour. >> Now is the data storager persisted in the container itself? I know that's fairly difficult to do in a Docker environment. How do ya handle persistence of data for containerized applications within your architecture? >> Okay, some of those are going to be application specific. It's the question of designing the right data management layer depending on the application. So we have software intelligence, some of it from open source, some of which we add on top of open source to bring some of the enterprise resilience and performance needed. 
And of course, you have to be very careful if the biggest trend in the world is unstructured data. Well, okay fine, it's a lot of sensor data. That's still fairly easy to move around. But once we get into things like medical images, lots of video, you know, HD video, 4K video, those are the things which you have to give a lot of thought to how to do that. And that's why we have lots of new partners that we work with the help us with edge cloud, which gives that on premise-like performance in really a cloud-like set up. >> Here's a question out of left field, and you may not have the answer, but I would like to hear your thoughts on this. How has Blockchain, and IBM's been making significant investments in blockchain technology database technology, how is blockchain changing the face of the storage industry in terms of customers' requirements for a storage systems to manage data in distributed blockchains? Is that something you're hearing coming from customers as a requirement? I'm just tryin' to get a sense for whether that's, you know, is it moving customers towards more flash, towards more distributed edge-oriented or edge deployed storage systems? >> Okay, so yes, yes, and yes. >> Okay. So all of a sudden, if you're doing things like a blockchain application, things become even more important than they are today. >> Yeah. >> Okay, so you can't lose a transaction. You can't have a storage going down. So there's a lot more care and thought into the resiliency of the infrastructure. If I'm, you know, buying a diamond from you, I can't accept the excuse that my $100,000 diamond, maybe that's a little optimistic, my $10,000 diamond or yours, you know, the transaction's corrupted because the data's not proper. >> Right. >> Or if I want my privacy, I need to be assured that there's good data governance around that transaction, and that that will be protected for a good 10, 20, and 30 years. So it's elevating the importance of all the infrastructure to a whole different level. >> Switching our focus slightly, so we're here at DataWorks Summit in Berlin. Where are the largest growth markets right now for cloud storage systems? Is it Apache, is it the North America, or where are the growth markets in terms of regions, in terms of vertical industries right now in the marketplace for enterprise grade storage systems for big data in the cloud? >> That's a great question, 'cause we certainly have these conversations globally. I'd say the place where we're seeing the most activity would be the Americas, we see it in China. We have a lot of interesting engagements and people reaching out to us. I would say by market, you can also point to financial services in more than those two regions. Financial services, healthcare, retail, these are probably the top verticals. I think it's probably safe to assume, and we can the federal governments also have a lot of stringent requirements and, you know, requirements, new applications around the space as well. >> Right. GDPR, how is that impacting your customers' storage requirements. The requirement for GDPR compliance, is that moving the needle in terms of their requirement for consolidated storage of the data that they need to maintain? I mean obviously there's a security, but there's just the sheer amount of, there's a leading to consolidation or centralization of storage, of customer data, that would seem to make it easier to control and monitor usage of the data. Is it making a difference at all? >> It's making a big difference. 
Not many people encrypt data today, so there's a whole new level of interest in encryption at many different levels, data at rest, data in motion. There's new levels of focus and attention on performance, on the ability for customers to get their arms around disparate islands of data, because now GDPR is not only a legal requirement that requires you to be able to have it, but you've also got timelines which you're expected to act on a request from a customer to have your data removed. And most of those will have a baseline of 30 days. So you can't fool around now. It's not just a nice to have. It's an actual core part of a business requirement that if you don't have a good strategy for, you could be spending tens of millions of dollars in liability if you're not ready for it. >> Well Dave, thank you very much. We're at the end of our time. This has been Dave McDonnell of IBM talking about system storage and of course a big Hortonworks partner. We are here on day two of the DataWorks Summit, and I'm James Kobielus of Wikibon SiliconANGLE Media, and have a good day. (upbeat music)
SUMMARY :
Dave McDonnell, who leads storage alliances at IBM, discusses the Hortonworks partnership and what customers running big data and AI workloads now expect from storage. Topics include software-defined strategies alongside converged and hyper-converged options, feeding AI infrastructure from Hortonworks data lakes and enterprise data warehouses, the pendulum swing of data back on premise and IBM Cloud Private's support for containerized applications across clouds, blockchain raising the bar for resiliency and governance, growth in the Americas and China led by financial services, healthcare, and retail, and GDPR driving new interest in encryption and data governance.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave McDonnell | PERSON | 0.99+ |
James Kobielus | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
AWS | ORGANIZATION | 0.99+ |
Amazon Web Services | ORGANIZATION | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
John Kreisa, Hortonworks | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's theCUBE. Covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. >> Hello, welcome to theCUBE. We're here at Dataworks Summit 2018 in Berlin, Germany. I'm James Kobielus. I'm the lead analyst for Big Data Analytics, within the Wikibon team of SiliconAngle Media. Our guest is John Kreisa. He's the VP for Marketing at Hortonworks, of course, the host company of Dataworks Summit. John, it's great to have you. >> Thank you Jim, it's great to be here. >> We go long back, so you know it's always great to reconnect with you guys at Hortonworks. You guys are on a roll, it's been seven years I think since you guys were founded. I remember the founding of Hortonworks. I remember when it splashed in the Wall Street Journal. It was like oh wow, this big data thing, this Hadoop thing is actually, it's a market, it's a segment and you guys have built it. You know, you and your competitors, your partners, your ecosystem continues to grow. You guys went IPO a few years ago. Your latest numbers are pretty good. You're continuing to grow in revenues, in customer acquisitions, your deal sizes are growing. So Hortonworks remains on a roll. So, I'd like you to talk right now, John, and give us a sense of where Hortonworks is at in terms of engaging with the marketplace, in terms of trends that you're seeing, in terms of how you're addressing them. But talk about first of all the Dataworks Summit. How many attendees do you have from how many countries? Just give us sort of the layout of this show. >> I don't have all of the final counts yet. >> This is year six of the show? >> This is year six in Europe, absolutely, thank you. So it's great, we've moved it around different locations. Great venue, great host city here in Berlin. Super excited about it, I know we have representatives from more than 51 countries. If you think about that, drawing from a really broad set of countries, well beyond, as you know, because you've interviewed some of the folks beyond just Europe. We've had them from South America, U.S., Africa, and Asia as well, so really a broad swath of the open-source and big data community, which is great. The final attendance is going to be 1,250 to 1,300 range. The final numbers, but a great sized conference. The energy level's been really great, the sessions have been, you know, oversubscribed, standing room only in many of the popular sessions. So the community's strong, I think that's the thing that we really see here and that we're really continuing to invest in. It's something that Hortonworks was founded around. You referenced the founding, and driving the community forward and investing is something that has been part of our mantra since we started and it remains that way today. >> Right. So first of all what is Hortonworks? Now how does Hortonworks position itself? Clearly Hadoop is your foundation, but you, just like Cloudera, MapR, you guys have all continued to evolve to address a broader range of use-cases with a deeper stack of technology with fairly extensive partner ecosystems. So what kind of a beast is Hortonworks? It's an elephant, but what kind of an elephant is it? >> We're an elephant or riding on the elephant I'd say, so we're a global data management company. That's what we're helping organizations do. Really the end-to-end lifecycle of their data, helping them manage it regardless of where it is, whether it's on-premise or in the cloud, really through hybrid data architectures. 
That's really how we've seen the market evolve is, we started off in terms of our strategy with the platform based on Hadoop, as you said, to store, process, and analyze data at scale. The kind of fundamental use-case for Hadoop. Then as the company emerged, as the market kind of continued to evolve, we moved to and saw the opportunity really, capturing data from the edge. As IOT and kind of edge-use cases emerged it made sense for us to add to the platform and create the Hortonworks DataFlow. >> James: Apache NiFi >> Apache NiFi, exactly, HDF underneath, with associated additional open-source projects in there. Kafka and some streaming and things like that. So that was now move data, capture data in motion, move it back and put it into the platform for those large data applications that organizations are building on the core platform. It's also the next evolution, seeing great attach rates with that, the really strong interest in the Apache NiFi, you know, the meetup here for NiFi was oversubscribed, so really really strong interest in that. And then, the markets continued to evolve with cloud and cloud architectures, customers wanting to deploy in the cloud. You know, you saw we had that poll yesterday in the general session about cloud with really interesting results, but we saw that there was really companies wanting to deploy in a hybrid way. Some of them wanted to move specific workloads to the cloud. >> Multi-cloud, public, private. >> Exactly right, and multi-data center. >> The majority of your customer deployments are on prem. >> They are. >> Rob Bearden, your CEO, I think he said in a recent article on SiliconAngle that two-thirds of your deployments are on prem. Is that percentage going down over time? Are more of your customers shifting toward a public cloud orientation? Does Hortonworks worry about that? You've got partnerships, clearly, with the likes of IBM, AWS, and Microsoft Dasher and so forth, so do you guys see that as an opportunity, as a worrisome trend? >> No, we see it very much as an opportunity. And that's because we do have customers who are wanting to put more workloads and run things in the cloud, however, there's still almost always a component that's going to be on premise. And that creates a challenge for organizations. How do they manage the security and governance and really the overall operations of those deployments as they're in the cloud and on premise. And, to your point, multi-cloud. And so you get some complexity in there around that deployment and particularly with the regulations, we talked about GDPR earlier today. >> Oh, by the way, the Data Steward Studio demo today was really, really good. It showed that, first of all, you cover the entire range of core requirements for compliance. So that was actually the primary announcement at this show; Scott Gnau announced that. You demoed it today, I think you guys are off on a good start, yeah. We've gotten really, and thank you for that, we've gotten really good feedback on our DataPlane Services strategy, right, it provides that single pane of glass. >> I should say to our viewers that Data Steward Studio is the second of the services under the DataPlane, the Hortonworks DataPlane Services Portfolio. >> That's right, that's exactly right. >> Go ahead, keep going. >> So, you know, we see that as an opportunity. 
We think we're very strongly positioned in the market, being the first to bring that kind of solution to the customers and our large customers that we've been talking about and who have been starting to use DataPlane have been very, very positive. I mean they see it as something that is going to help them really kind of maintain control over these deployments as they start to spread around, as they grow their uses of the thing. >> And it's built to operate across the multi-cloud, I know this as well in terms of executing the consent or withdrawal of consent that the data subject makes through what is essentially a consent portal. >> That's right, that's right. >> That was actually a very compelling demonstration in that regard. >> It was good, and they worked very hard on it. And I was speaking to an analyst yesterday, and they were saying that they're seeing an increasing number of the customers, enterprises, wanting to have a multi-cloud strategy. They don't want to get locked into any one public cloud vendor, so, what they want is somebody who can help them maintain that common security and governance across their different deployments, and they see DataPlane Services is the way that's going to help them do that. >> So John, how is Hortonworks, what's your road map, how do you see the company in your go to market evolving over the coming years in terms of geographies, in terms of your focuses? Focus, in terms of the use-cases and workloads that the Hortonworks portfolio addresses. How is that shifting? You mentioned the Edge. AI, machine learning, deep learning. You are a reseller of IBM Data Science Experience. >> DSX, that's right. >> So, let's just focus on that. Do you see more customers turning to Hortonworks and IBM for a complete end-to-end pipeline for the ingest, for the preparation, modeling, training and so forth? And deployment of operationalized AI? Is that something you see going forward as an evolution path for your capabilities? >> I'd say yes, long-term, or even in the short-term. So, they have to get their data house in order, if you will, before they get to some of those other things, so we're still, Hortonworks strategy has always been focused on the platform aspect, right? The data-at-rest platform, data-in-motion platform, and now a platform for managing common security and governance across those different deployments. Building on that is the data science, machine learning, and AI opportunity, but our strategy there, as opposed to trying to trying to do it ourselves, is to partner, so we've got the strong partnership with IBM, resell their DSX product. And also other partnerships around to deliver those other capabilities, like machine learning and AI, from our partner ecosystem, which you referenced. We have over 2,300 partners, so a very, very strong ecosystem. And so, we're going to stick to our strategy of the platforms enabling that, which will subsequently enable data science, machine learning, and AI on top. And then, if you want me to talk about our strategy in terms of growth, so we already operate globally. We've got offices in I think 19 different countries. So we're really covering the globe in terms of the demand for Hortonworks products and beginning implements. >> Where's the fastest growing market in terms of regions for Hortonworks? >> Yeah, I mean, international generally is our fastest growing region, faster than the U.S. 
But we're seeing very strong growth in APAC, actually, so India, Asian countries, Singapore, and then up and through to Japan. There's a lot of growth out in the Asian region. And, you know, they're sort of moving directly to digital transformation projects at really large scale. Big banks, telcos, from a workload standpoint I'd say the patterns are very similar to what we've seen. I've been at Hortonworks for six and a half years, as it turns out, and the patterns we saw initially in terms of adoption in the U.S. became the patterns we saw in terms of adoption in Europe and now those patterns of adoption are the same in Asia. So, once a company realizes they need to either drive out operational costs or build new data applications, the patterns tend to be the same whether it's retail, financial services, telco, manufacturing. You can sort of replicate those as they move forward. >> So going forward, how is Hortonworks evolving as a company in terms of, for example with GDPR, Data Steward, data governance as a strong focus going forward, are you shifting your model in terms of your target customer away from the data engineers, the Hadoop cluster managers who are still very much the center of it, towards more data governance, towards more business analyst level of focus. Do you see Hortonworks shifting in that direction in terms of your focus, go to market, your message and everything? >> I would say it's not a shifting as much as an expansion, so we definitely are continuing to invest in the core platform, in Hadoop, and you would have heard of some of the changes that are coming in the core Hadoop 3.0 and 3.1 platform here. Alan and others can talk about those details, and in Apache NiFi. But, to your point, as we bring and have brought Data Steward Studio and DataPlane Services online, that allows us to address a different user within the organization, so it's really an expansion. We're not de-investing in any other things. It's really here's another way in a natural evolution of the way that we're helping organizations solve data problems. >> That's great, well thank you. This has been John Kreisa, he's the VP for marketing at Hortonworks. I'm James Kobielus of Wikibon SiliconAngle Media here at Dataworks Summit 2018 in Berlin. And it's been great, John, and thank you very much for coming on theCUBE. >> Great, thanks for your time. (techno music)
Pankaj Sodhi, Accenture | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's theCUBE. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well hello, welcome to theCUBE. I am James Kobielus. I'm the lead analyst within the Wikibon team at SiliconANGLE Media, focused on big data analytics. And big data analytics is what DataWorks Summit is all about. We are at DataWorks Summit 2018 in Berlin, Germany. We are on day two, and I have, as my special guest here, Pankaj Sodhi, who is the big data practice lead with Accenture. He's based in London, and he's here to discuss really what he's seeing in terms of what his clients are doing with Big DSO. Hello, welcome Pankaj, how's it going? >> Thank you Jim, very pleased to be here. >> Great, great, so what are you seeing in terms of customers' adoption of Hadoop and so forth, big data platforms, for what kind of use cases are you seeing? GDPR is coming down very quickly, and we saw this poll this morning that John Kreisa, of Hortonworks, did from the stage, and it's a little bit worrisome if you're an enterprise data administrator. Really, any enterprise, period, because it sounds like not everybody in this audience, in fact a sizeable portion, is not entirely ready to comply with GDPR on day one, which is May 25th. What are you seeing, in terms of customer readiness, for this new regulation? >> So Jim, I'll answer the question in two ways. One was, just in terms of, you know, the adoption of Hadoop, and then, you know, get into GDPR. So in regards to Hadoop adoption, I think I would place clients in three different categories. The first ones are the ones that have been quite successful in terms of adoption of Hadoop. And what they've done there is taken a very use case driven approach to actually build up the capabilities to deploy these use cases. And they've taken an additive approach. Deployed hybrid architectures, and then taken the time. >> Jim: Hybrid public, private cloud? >> Cloud as well, but often sort of, on premise. Hybrid being, for example, with an EDW and product type AA. In that scenario, they've taken the time to actually work out some of the technical complexities and nuances of deploying these pipelines in production. Consequently, what they're in a good position to do now, is to leverage the best of cloud computing, open source technology, while also getting the investment protection that they have from the on-premise deployments as well. So they're in a fairly good position. Another set of customers have done successful pilots looking at either optimization use cases. >> Jim: How so, Hadoop? >> Yes, leveraging Hadoop. Either again from a cost optimization play or potentially advanced analytics capabilities. And they're in the process of going to production, and starting to work out, from a footprint perspective, what elements of the future pipelines are going to be on prem, potentially with Hadoop, or on cloud with Hadoop. >> When you say the pipeline in this context, what are you referring to? When I think of pipeline, in fact in our coverage of pipeline, it refers to an end to end life cycle for development and deployment and management of big data. >> Pankaj: Absolutely >> And analytics, so that's what you're saying. >> So all the way from ingestion to curation to consuming the data, through multiple different access points, so that's the full pipeline.
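The full pipeline Pankaj describes, ingest, curate, consume, executed where the data resides, can be sketched as a single PySpark job submitted to YARN on the cluster. The paths, column names, and curation rules below are illustrative assumptions rather than anything prescribed in the interview.

```python
# pipeline.py -- illustrative ingest -> curate -> publish job (assumed paths and columns)
from pyspark.sql import SparkSession, functions as F

def main():
    spark = SparkSession.builder.appName("claims-pipeline").getOrCreate()

    # Ingest: read raw landing-zone data as-is (schema-on-read).
    raw = spark.read.json("hdfs:///data/raw/claims/")

    # Curate: minimally clean and standardize only the fields the use case needs.
    curated = (
        raw.select("claim_id", "customer_id", "amount", "event_ts")
           .where(F.col("claim_id").isNotNull())
           .withColumn("amount", F.col("amount").cast("double"))
    )

    # Publish: write a consumable, columnar dataset for analysts and notebooks.
    curated.write.mode("overwrite").parquet("hdfs:///data/curated/claims/")

    spark.stop()

if __name__ == "__main__":
    main()
```

Submitted with spark-submit --master yarn pipeline.py, the work runs next to the data instead of pulling terabytes back to a laptop, which is the point about leveraging YARN to distribute the processing.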
And I think what the organizations that have been successful have done is not just looked at the technology aspect, which is just Hadoop in this case, but looked at a mix of architecture, delivery approaches, governance, and skills. So I'd like to bring this to life by looking at advanced analytics as a use case. So rather than take the approach of let's ingest all data in a data lake, it's been driven by a use case mapped to a set of valuable data sets that can be ingested. But what's interesting then is the delivery approach has been to bring together diverse skill sets. For example, data engineers, data scientists, data ops and visualization folks, and then use them to actually challenge the architecture and delivery approach. I think this is where the key ingredient for success comes in, which is, for me, that the modern sort of Hadoop pipeline needs to be iteratively built and deployed, rather than linear and monolithic. So this notion of, I have raw data, let me come up with a minimally curated data set. And then look at how I can do feature engineering and build an analytical model. If that works, and I need to enhance, get additional data attributes, I then enhance the pipeline. So this is already starting to challenge organizations' architecture approaches, and how you also deploy into production. And I think that's been one of the key differences between organizations that have embarked on the journey, ingested the data, but not had a path to production. So I think that's one aspect. >> How are the data stewards of the world, or are they, challenging the architecture, now that GDPR is coming down fast and furious? We're seeing, for example, Hortonworks' Data Steward Studio. Are you seeing the data governors, the data stewards of the world, coming and sitting around the virtual table, challenging this architecture further to evolve? >> I think. >> To enable privacy by default and so forth? >> I think again, you know, the organizations that have been successful have already been looking at privacy by design before GDPR came along. Now one of the reasons a lot of the data lake implementations haven't been as successful is that the business hasn't had the ability to actually curate the data sets, work out what the definitions are, what the curation levels are. So therefore, what we see with business glossaries, and sort of data architectures, from a GDPR perspective, we see this as an opportunity rather than a threat. So to actually make the data usable in the data lakes, we often talk to clients about this concept of the data marketplace. So in the data marketplace, what you need to have is well curated data sets. The proper definitions as well, for a business glossary or a data catalog, underpinned by the right user access model, and available, for example, through search or APIs. So, GDPR actually is-- >> So it's not a public marketplace, this is an architectural concept. >> Yes. >> It could be inside, completely inside, the private data center, but it's reusable data, it's both through APIs, and standard glossaries and metadata and so forth, is that correct? >> Correct, so the data marketplace is reusable, both internally, for example, to unlock access to data scientists who might want to use the data set and then put that into a data lab. It can also be extended, from an API perspective, for a third-party data marketplace for exchanging data with consumers or third parties as organizations look at data monetization as well. And therefore, I think the role of data stewards is changing around a bit.
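The data marketplace Pankaj outlines, curated data sets with business-glossary style definitions, a curation level, an access model, and search or API access, might be modeled roughly as below. The fields and the tiny search function are assumptions for illustration, not the schema of any particular catalog product.

```python
# Toy data-marketplace entry and search; field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class MarketplaceEntry:
    name: str                     # e.g. "curated_claims"
    description: str              # business-glossary style definition
    curation_level: str           # "raw" | "curated" | "enriched"
    owner: str                    # accountable data steward
    allowed_roles: list = field(default_factory=list)   # access model
    tags: list = field(default_factory=list)            # e.g. ["claims", "pii"]

def search(catalog, term, role):
    """Return entries matching a search term that the caller's role may access."""
    term = term.lower()
    return [
        e for e in catalog
        if role in e.allowed_roles
        and (term in e.name.lower()
             or term in e.description.lower()
             or term in [t.lower() for t in e.tags])
    ]

catalog = [
    MarketplaceEntry(
        name="curated_claims",
        description="Claims submitted since 2016, one row per claim, amounts in EUR.",
        curation_level="curated",
        owner="claims-steward@example.com",
        allowed_roles=["data_scientist", "analyst"],
        tags=["claims", "pii"],
    )
]

print(search(catalog, "claims", role="data_scientist"))
```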
Rather than looking at it from a compliance perspective, it's about how can we make data usable to the analysts and the data scientists. So actually focusing on getting the right definitions upfront, and as we curate and publish data, and as we enrich it, what's the next definition that comes of that? And actually have that available before we publish the data. >> That's a fascinating concept. So, the notion of a data steward or a data curator. It sort of sounds like you're blending them. Where the data curator, their job, part of it, very much of it, involves identifying the relevance of data and the potential reusability and attractiveness of that data for various downstream uses and possibly being a player in the ongoing identification of the monetize-ability of data elements, both internally and externally in the (mumbles). Am I describing it correctly? >> Pankaj: I think you are, yes. >> Jim: Okay. >> I think it's an interesting implication for the CDO function, because, rather than see the function being looked at as a policy-- >> Jim: The chief data officer. >> Yes, chief data officer functions. So rather than imposition of policies and standards, it's about actually trying to unlock business value. So rather than look at it from a compliance perspective, which is very important, but actually flip it around and look at it from a business value perspective. >> Jim: Hmm. >> So for example, if you're able to tag and classify data, and then apply the right kind of protection against it, it actually helps the data scientists to use that data for their models. While that's actually following GDPR guidelines. So it's a win-win from that perspective. >> So, in many ways, the core requirement for GDPR compliance, which is to discover and inventory and essentially tag all of your data, at a fine-grained level, can be the greatest thing that ever happened to data monetization. In other words, it's the foundation of data reuse and monetization, unlocking the true value to your business of the data. So it needn't be an overhead burden, it can be the foundation for a new business model. >> Absolutely, because I think if you talk about organizations becoming data driven, you have to look at what does the data asset actually mean. >> Jim: Yes. >> So to me, that's a curated data set with the right level of description, again underpinned by the right authority over privacy and the ability to use the data. So I think GDPR is going to be a very good enabler, so again the small minority of organizations that have been successful have done this. They've had business glossaries, data catalogs, but now with GDPR, that's almost, I think, going to force the issue. Which I think is a very positive outcome. >> Now Pankaj, do you see any of your customers taking this concept of curation and so forth to the next step, in terms of there's data assets but then there's data-derived assets, like machine learning models and so forth. Data scientists build and train and deploy these models and algorithms, that's the core of their job. >> Man: Mhmm. >> And model governance is a hot, hot topic we see all over. You've got to have tight controls, not just on the data, but on the models, 'cause they're core business IP. Do you see this architecture evolving among your customers so that they'll also increasingly be required to want to essentially catalog the models and identify and curate them for re-usability, possibly monetization opportunities? Is that something that any of your customers are doing or exploring?
>> Some of our customers are looking at that as well. So again, initially, it's exactly an extension of the marketplace. So while one aspect of the marketplace is data sets, which you can then combine to run the models, the other aspect is models that you can also search for and prescribe data. >> Jim: Yeah, like pre-trained models. >> Correct. >> Can be golden if they're pre-trained and the core domain for which they're trained doesn't change all that often, they can have a great aftermarket value conceivably if you want to resell that. >> Absolutely, and I think this is also a key enabler for the way data scientists and data engineers expect to operate. So this notion of IDEs, of collaborative notebooks and so forth, and being able to sort of share the outputs of models. And to be able to share that with other folks in the team who can then maybe tweak it for a different algorithm, is a huge, I think, productivity enabler, and we've seen. >> Jim: Yes. >> Quite a few of our technology partners working towards enabling these data scientists to move very quickly from a model they may have initially developed on a laptop, to actually then deploying the (mumbles). How can you do that very quickly, and reduce the time from an idea or hypothesis to production? >> (mumbles) Modularization of machine learning and deep learning, I'm seeing a lot of that among data scientists in the business world. Well thank you, Pankaj, we're out of time right now. This has been a very engaging and fascinating discussion. And we thank you very much for coming on theCUBE. This has been Pankaj Sodhi of Accenture. We're here at DataWorks Summit 2018 in Berlin, Germany. It's been a great show, and we have more expert guests that we'll be interviewing later in the day. Thank you very much, Pankaj. >> Thank you very much, Jim.
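Extending the marketplace to the data-derived assets discussed above, pre-trained models alongside data sets, could look like the sketch below, where each model entry records the marketplace data sets it was trained on so that lineage questions can be answered. Field names and the registry structure are illustrative assumptions, not a specific product's schema.

```python
# Toy model-registry entry alongside the data marketplace; fields are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    name: str                     # e.g. "churn_propensity"
    version: str                  # semantic or date-based version
    trained_on: list = field(default_factory=list)   # marketplace datasets used for training
    metrics: dict = field(default_factory=dict)      # e.g. {"auc": 0.81}
    owner: str = ""               # accountable data scientist or team
    approved_uses: list = field(default_factory=list)

registry = [
    ModelEntry(
        name="churn_propensity",
        version="1.3.0",
        trained_on=["curated_claims", "customer_master"],
        metrics={"auc": 0.81},
        owner="ds-team@example.com",
        approved_uses=["retention_campaigns"],
    )
]

def models_built_from(dataset_name):
    """Lineage question a steward can now answer: which models used this dataset?"""
    return [m.name for m in registry if dataset_name in m.trained_on]

print(models_built_from("curated_claims"))   # ['churn_propensity']
```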
Andreas Kohlmaier, Munich Re | Dataworks Summit EU 2018
>> Narrator: From Berlin, Germany, it's The Cube. Covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Well, hello. Welcome to The Cube. I'm James Kobielus. I'm the Lead Analyst for Big Data Analytics in the Wikibon team of SiliconANGLE Media. We are here at DataWorks Summit 2018 in Berlin. Of course, it's hosted by Hortonworks. We are in day one of two days of interviews with executives, with developers, with customers. And this morning in the opening keynote, one of the speakers was a customer of Hortonworks from Munich Re, the reinsurance company based of course in Munich, Germany. Andreas Kohlmaier, who's the head of Data Engineering, I believe. It was an excellent discussion; you've built out a data lake. And the first thing I'd like to ask you, Andreas, is right now it's five weeks until GDPR, the general data protection regulation, goes into full force on May 25th. And of course it applies in the EU; anybody who does business in the EU, including companies based elsewhere, such as in the US, needs to start complying with GDPR in terms of protecting personal data. Give us a sense for how Munich Re is approaching the deadline, your level of readiness to comply with GDPR, and how your investment in your data lake serves as a foundation for that compliance. >> Absolutely. So thanks for the question. GDPR, of course, is the hot topic across all European organizations. And we're actually pretty well prepared. We compiled all the processes and the necessary regulations and in fact we are now selling this also as a service product to our customers. This has been an interesting side effect because we have lots of other insurance companies and we started to think about why not offer this as a service to other insurance companies to help them prepare for GDPR. This is actually proving to be one of the exciting, interesting things that can happen around GDPR. >> Maybe that would be your new line of business. You make more money doing that then. >> I'm not sure! (crosstalk) >> Well that's excellent! So you've learned a lot of lessons. So already so you're ready for May 25th? You have, okay, that's great. You're probably far ahead of, I know, a lot of U.S.-based firms. We're, you know, in our country and in other countries, we're still getting our heads around all the steps that are needed, so you know many companies outside the EU may call on you guys for some consulting support. That's great! So give us a sense for your data lake. You discussed it this morning but can you give us a sense for the business justification for building it out? How you've rolled it out? What stage it's in? Who's using it for what? >> So absolutely. So one of the key things for us at Munich Re is the issue about complexity or data diversity, as it was also called this morning. So we have so many different areas that we are doing business in and we have lots of experts in the different areas. And those people, they are really very knowledgeable in the area, and now they also get access to new sources of information. So to give you a sense we have people for example that are really familiar with weather and climate change, also with satellites. We have captains for ships and pilots for aircraft. So we have lots of expertise in all the different areas. Why? Because we are taking those risks in our books.
So we sometimes have people that have 20 years plus of experience in the area and then they move to the insurer to actually bring their expertise in the field also to the risk management side. And all those people, they now get an additional source of input which is the data that is now more or less readily available everywhere. So first of all, we are getting new data with the submissions and the risks that we are taking and there are also interesting open data sources to connect to so that those experts can actually bring their knowledge and their analytics to a new level by adding the layer of data and analytics to their existing knowledge. And this allows us, first of all, to understand the risks even better, to put a better price tag on that, and also to take up new risks that have not been possible to cover before. One of the things that's also in the media, I think, is that we are also now covering the Hyperloop once it's going to be built. So those kinds of new things are only possible with data analytics. >> So you're a Hortonworks customer. Give us a sense for how you're using or deploying Hortonworks Data Platform or DataPlane Service and whatnot inside of your data lake. It sounds like it's a big data catalog, is that a correct characterization? >> So one of the things that is key to us is actually finding the right information and connecting those different experts to each other. So this is why the data catalog plays a central role. Here we have selected Alation as a catalog tool to connect the different experts in the group. The data lake at the moment is an on-prem installation. We are thinking about moving parts of that workload to the cloud to actually save operation costs. >> On top of HDP. >> Yeah, so Alation is actually, as far as I know technically, it's a separate server that indexes the Hive tables on HDP. >> So essentially the catalog itself provides visualization and correlation across disparate data sources that you're managing in Hadoop. >> Yeah, so the catalog actually is a great way of connecting the experts together. So that's, you know, okay, if we have people in one part of the group that are very knowledgeable about weather and they have great data about weather, then we'd like to connect them for example to the guys doing crop insurance for India so that they can use the weather data to improve the models for example for crop insurance in Asia. And there the data catalog helps us to connect those experts because you can first of all find the data sources and you can also see who is the expert on the data. You can then also call them up or ask them a question in the tool. So it's essentially a great way to share knowledge and to connect the different experts of the group. >> Okay, so it's also surfacing up human expertise. Okay, is it also serving as a way to find training datasets possibly to use to build machine learning models to do more complex analyses? Is that something that you're doing now or plan to do in the future? >> Yes, so we are of course doing some machine learning and also deep learning projects. We also just started a Center of Excellence for artificial intelligence to see okay how we can use deep learning and machine learning also to find different ways of pricing insurance risks for example, and of course for all those cases data is key and we really need people to get access to the right data. >> I have to ask you. One of the things I'm seeing, you mentioned Center of Excellence for AI.
I'm seeing more companies consider, maybe not do it, consider establishing an office of the chief AI officer, like, reporting to the CEO. I'm not sure that that's a great idea for a lot of businesses, but since an insurance company lives and dies by data and calculations and so forth, is that something that Munich Re is doing or considering, a C-suite level officer of that sort responsible for this AI competency, or no? >> Could be in the future. >> Okay. >> We sort of just started with the AI Center of Excellence. That is now reporting to our Chief Data Officer so it's not yet C-suite. >> Okay. >> Is the Center of Excellence for AI, is it simply like a training institute to provide some basic skill building or is there something more there? Do you do development? >> Actually they are trying out and developing ways on how we can use AI and deep learning for insurance. One of the core things of course is also about understanding natural language to structure the information that we are getting in PDFs and in documents, but really also about using deep learning as a new way to build tariffs for the insurance industry. So that's one of the core things, to find and create new tariffs. And we're also experimenting, haven't found the product yet there, whether or not we can use deep learning to create better tariffs. That could also then be one of the services, again, we are providing to our customers, the insurance companies, and they build that into their products. Something like, yeah, the algorithm is powered by Munich Re. >> Now your users of your data lake, these are expert quantitative analysts, right, for the most part? So you mentioned using natural language understanding AI capabilities. Is that something that you have a need to do in high volume as a reinsurance company? Take lots of source documents and be able to, as it were, identify the content in high volume, and importantly, you know, not OCR but rather actually build a semantic graph of what's going on inside the document? >> I'm going to give you an example of the things that we are doing with natural language processing. And this one is about the energy business in the US. So we are actually taking up or seeing most of the risks that are related to oil and gas in the U.S. So all the refineries, all the larger stations, and the petroleum tanks. They are all in our books and for each and every one of them we get a nice report on risks there with a couple of hundred pages. And inside these reports there's also some paragraph written in where actually the refinery or the plant gets its supplies from and where it ships its products to. And hence we are seeing all those documents. That's in the scale of a couple of thousands so it's not really huge, but all together a couple of hundred thousand pages. We use NLP and AI on those documents to extract the supply chain information out of it so in that way we can stitch together a more or less complete picture of the supply chain for oil and gas in the U.S., which helps us again to better understand that risk because supply chain breakdown is one of the major risks in the world nowadays. >> Andreas, this has been great! We can keep on going on. I'm totally fascinated by your use of AI but also your use of a data lake and I'm impressed by your ability to get your, as a company, get your, as we say in the U.S., get your GDPR ducks in a row, and that's great. So it's been great to have you on The Cube. We are here at DataWorks Summit in Berlin. (techno music)
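A heavily simplified sketch of the kind of extraction Andreas describes follows: pulling supplier and customer relations out of risk-report text so a supply chain graph can be stitched together. Real engineering reports are far messier and the production system presumably uses proper NLP models; the regular expressions, site names, and sample sentence here are toy assumptions.

```python
# Toy supply-chain relation extraction; patterns and sample text are assumptions.
import re
from collections import defaultdict

SUPPLIER = re.compile(r"receives (?:crude|feedstock|supplies) from ([A-Z][\w\s]+?)(?:,|\.)")
CUSTOMER = re.compile(r"ships (?:products?|output) to ([A-Z][\w\s]+?)(?:,|\.)")

def extract_edges(site, text):
    edges = []
    for m in SUPPLIER.finditer(text):
        edges.append((m.group(1).strip(), site))    # supplier -> site
    for m in CUSTOMER.finditer(text):
        edges.append((site, m.group(1).strip()))    # site -> customer
    return edges

report = ("The refinery receives crude from Port Arthur Terminal, "
          "and ships products to Gulf Coast Distribution.")

graph = defaultdict(list)
for src, dst in extract_edges("Baytown Refinery", report):
    graph[src].append(dst)

print(dict(graph))
```

Run over a few hundred thousand pages, edges like these accumulate into the more or less complete supply chain picture Andreas mentions.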
Scott Gnau, Hortonworks | Dataworks Summit EU 2018
(upbeat music) >> Announcer: From Berlin, Germany, it's The Cube, covering DataWorks Summit Europe 2018. Brought to you by Hortonworks. >> Hi, welcome to The Cube, we're separating the signal from the noise and tuning into the trends in data and analytics. Here at DataWorks Summit 2018 in Berlin, Germany. This is the sixth year, I believe, that DataWorks has been held in Europe. Last year I believe it was in Munich, now it's in Berlin. It's a great show. The host is Hortonworks and our first interviewee today is Scott Gnau, who is the chief technology officer of Hortonworks. Of course Hortonworks established themselves about seven years ago as one of the up-and-coming startups commercializing a then brand-new technology called Hadoop and MapReduce. They've moved well beyond that in terms of their go-to-market strategy, their product portfolio, their partnerships. So Scott, this morning, it's great to have ya'. How are you doing? >> Glad to be back and good to see you. It's been a while. >> You know, yes, I mean, you're an industry veteran. We've both been around the block a few times but I remember you years ago. You were at Teradata and I was at another analyst firm. And now you're with Hortonworks. And Hortonworks is really on a roll. I know you're not Rob Bearden, so I'm not going to go into the financials, but your financials look pretty good, your latest. You're growing, your deal sizes are growing. Your customer base is continuing to deepen. So you guys are on a roll. So we're here in Europe, we're here in Berlin in particular. It's five weeks-- you did the keynote this morning-- it's five weeks until GDPR. The sword of Damocles, the GDPR sword of Damocles. It's not just affecting European-based companies, but it's affecting North American companies and others who do business in Europe. So your keynote this morning, your core theme was that, if you're an enterprise, your business strategy is equated with your cloud strategy now, is really equated with your data strategy. And you got to a lot of that. It was a really good discussion. And where GDPR comes into the picture is the fact that protecting data, personal data of your customers, is absolutely important, in fact it's imperative and mandatory, and will be in five weeks, or you'll face a significant penalty if you're not managing that data and providing customers with the right to have it erased, or the right to withdraw consent to have it profiled, and so forth. So enterprises all over the world, especially in Europe, are racing as fast as they can to get compliant with GDPR by the May 25th deadline. So, one of the things you discussed this morning, you had an announcement overnight that Hortonworks has released a new solution in technical preview called the Data Steward Studio. And I'm wondering if you can tie that announcement to GDPR? It seems like data stewardship would have a strong value for your customers. >> Yeah, there's definitely a big tie-in. GDPR is certainly creating a milestone, kind of a trigger, for people to really think about their data assets. But it's certainly even larger than that, because when you even think about driving digitization of a business, driving new business models and connecting data and finding new use cases, it's all about finding the data you have, understanding what it is, where it came from, what's the lineage of it, who had access to it, what did they do to it? These are all governance kinds of things, which are also now mandated by laws like GDPR.
And so it's all really coming together in the context of the new modern data architecture era that we live in, where a lot of data that we have access to, we didn't create. And so it was created outside the firewall by a device, by some application running with some customer, and so capturing and interpreting and governing that data is very different than taking derivative transactions from an ERP system, which are already adjudicated and understood, and governing that kind of a data structure. And so this is a need that's driven from many different perspectives, it's driven from the new architecture, the way IoT devices are connecting and just creating a data bomb, that's one thing. It's driven by business use cases, just saying what are the assets that I have access to, and how can I try to determine patterns between those assets where I didn't even create some of them, so how do I adjudicate that? >> Discovering and cataloging your data-- >> Discovering it, cataloging it, actually even... When I even think about data, just think the files on my laptop, that I created, and I don't remember what half of them are. So creating the metadata, creating that trail of bread crumbs that lets you piece together what's there, what's the relevance of it, and how, then, you might use it for some correlation. And then you get in, obviously, to the regulatory piece that says sure, if I'm a new customer and I ask to be forgotten, the only way that you can guarantee to forget me is to know where all of my data is. >> If you remember that they are your customer in the first place and you know where all that data is, if you're even aware that it exists, that's the first and foremost thing for an enterprise to be able to assess their degree of exposure to GDPR. >> So, right. It's like a whole new use case. It's a microcosm of all of these really big things that are going on. And so what we've been trying to do is really leverage our expertise in metadata management using the Apache Atlas project. >> Interviewer: You and IBM have done some major work-- >> We work with IBM and the community on Apache Atlas. You know, metadata tagging is not the most interesting topic for some people, but in the context that I just described, it's kind of important. And so I think one of the areas where we can really add value for the industry is leveraging our lowest common denominator, open source, open community kind of development to really create a standard infrastructure, a standard open infrastructure for metadata tagging, into which all of these use cases can now plug. Whether it's I want to discover data and create metadata about the data based on patterns that I see in the data, or I've inherited data and I want to ensure that the metadata stay with that data through its life cycle, so that I can guarantee the lineage of the data, and be compliant with GDPR-- >> And in fact, tomorrow we will have Mandy Chessell from IBM, a key Hortonworks partner, discussing the open metadata framework you're describing and what you're doing. >> And that was part of this morning's keynote close also. It all really flowed nicely together. Anyway, it is really a perfect storm. So what we've done is we've said, let's leverage this lowest common denominator, standard metadata tagging, Apache Atlas, and uplevel it, and not have it be part of a cluster, but actually have it be a cloud service that can be in force across multiple data stores, whether they're in the cloud or whether they're on prem. 
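As a rough illustration of the bread-crumb trail Scott describes, the sketch below asks Apache Atlas for every entity carrying a given classification, say one used to mark personal data, which is the natural starting point for a right-to-be-forgotten request. The endpoint path, payload shape, classification name, and credentials are recalled-from-memory assumptions about the Atlas v2 REST basic-search API and should be checked against the documentation for the version in use.

```python
# Hedged sketch: find all entities tagged with a personal-data classification in Atlas.
# Endpoint and payload are assumptions based on the Atlas v2 basic-search API.
import requests

ATLAS = "http://atlas.example.com:21000/api/atlas/v2"
AUTH = ("admin", "admin")  # placeholder credentials

def entities_with_classification(tag, type_name="hive_table"):
    body = {"classification": tag, "typeName": type_name, "limit": 100}
    resp = requests.post(f"{ATLAS}/search/basic", json=body, auth=AUTH, timeout=30)
    resp.raise_for_status()
    return resp.json().get("entities", [])

# Every table tagged as holding personal data: the candidate set for an erasure request.
for ent in entities_with_classification("PERSONAL_DATA"):
    attrs = ent.get("attributes", {})
    print(ent.get("guid"), attrs.get("qualifiedName"))
```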
>> Interviewer: That's the Data Steward Studio? >> Well, Data Plane and Data Steward Studio really enable those things to come together. >> So the Data Steward Studio is the second service >> Like an app. >> under the Hortonworks DataPlane service. >> Yeah, so the whole idea is to be able to tie those things together, and when you think about it in today's hybrid world, and this is where I really started, where your data strategy is your cloud strategy, they can't be separate, because if they're separate, just think about what would happen. So I've copied a bunch of data out to the cloud. All memory of any lineage is gone. Or I've got to go set up manually another set of lineage that may not be the same as the lineage it came with. And so being able to provide that common service across footprint, whether it's multiple data centers, whether it's multiple clouds, or both, is a really huge value, because now you can sit back and through that single pane, see all of your data assets and understand how they interact. That obviously has the ability then to provide value like with Data Steward Studio, to discover assets, maybe to discover assets and discover duplicate assets, where, hey, I can save some money if I get rid of this cloud instance, 'cause it's over here already. Or to be compliant and say yeah, I've got these assets here, here, and here, I am now compelled to do whatever: delete, protect, encrypt. I can now go do that and keep a record through the metadata that I did it. >> Yes, in fact that is very much at the heart of compliance, you got to know what assets there are out there. And so it seems to me that Hortonworks is increasingly... the H-word rarely comes up these days. >> Scott: Not Hortonworks, you're talking about Hadoop. >> Hadoop rarely comes up these days. When the industry talks about you guys, it's known that's your core, that's your base, that's where HDP and so forth, great product, great distro. In fact, in your partnership with IBM, a year or more ago, I think it was IBM standardized on HDP in lieu of their distro, 'cause it's so well-established, so mature. But going forward, you guys in many ways, Hortonworks, you have positioned yourselves now. Wikibon sees you as being the premier solution provider of big data governance solutions specifically focused on multi-cloud, on structured data, and so forth. So the announcement today of the Data Steward Studio very much builds on that capability you already have there. So going forward, can you give us a sense to your roadmap in terms of building out DataPlane's service? 'Cause this is the second of these services under the DataPlane umbrella. Give us a sense for how you'll continue to deepen your governance portfolio in DataPlane. >> Really the way to think about it, there are a couple of things that you touched on that I think are really critical, certainly for me, and for us at Hortonworks to continue to repeat, just to make sure the message got there. Number one, Hadoop is definitely at the core of what we've done, and was kind of the secret sauce. Some very different stuff in the technology, also the fact that it's open source and community, all those kinds of things. But that really created a foundation that allowed us to build the whole beginning of big data data management. And we added and expanded to the traditional Hadoop stack by adding Data in Motion. And so what we've done is-- >> Interviewer: NiFi, I believe, you made a major investment. 
>> Yeah, so we made a large investment in Apache NiFi, as well as Storm and Kafka as kind of a group of technologies. And the whole idea behind doing that was to expand our footprint so that we would enable our customers to manage their data through its entire lifecycle, from being created at the edge, all the way through streaming technologies, to landing, to analytics, and then even analytics being pushed back out to the edge. So it's really about having that common management infrastructure for the lifecycle of all the data, including Hadoop and many other things. And then in that, obviously as we discuss whether it be regulation, whether it be, frankly, future functionality, there's an opportunity to uplevel those services from an overall security and governance perspective. And just like Hadoop kind of upended traditional thinking... and what I mean by that was not the economics of it, specifically, but just the fact that you could land data without describing it. That seemed so unimportant at one time, and now it's like the key thing that drives the difference. Think about sensors that are sending in data that reconfigure firmware, and those streams change. Being able to acquire data and then assess the data is a big deal. So the same thing applies, then, to how we apply governance. I said this morning, traditional governance was hey, I started this employee, I have access to this file, this file, this file, and nothing else. I don't know what else is out there. I only have access to what my job title describes. And that's traditional data governance. In the new world, that doesn't work. Data scientists need access to all of the data. Now, that doesn't mean we need to give away PII. We can encrypt it, we can tokenize it, but we keep referential integrity. We keep the integrity of the original structures, and those who have a need to actually see the PII can get the token and see the PII. But it's governance thought inversely as it's been thought about for 30 years. >> It's so great you've worked governance into an increasingly streaming, real-time in motion data environment. Scott, this has been great. It's been great to have you on The Cube. You're an alum of The Cube. I think we've had you at least two or three times over the last few years. >> It feels like 35. Nah, it's pretty fun.. >> Yeah, you've been great. So we are here at Dataworks Summit in Berlin. (upbeat music)
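Scott's point about encrypting or tokenizing PII while keeping referential integrity can be illustrated with a keyed, deterministic token: the same customer ID always maps to the same token, so joins and correlations still line up, while only holders of the key, or of a guarded lookup, can recover the original value. This is a sketch of the idea, not Hortonworks' actual masking implementation; in production the key would live in a KMS and re-identification would sit behind authorization and an audit log.

```python
# Illustrative deterministic tokenization that preserves referential integrity.
# Not a production design: the key belongs in a KMS, and re-identification
# should be gated by authorization checks and audited.
import hmac, hashlib

SECRET_KEY = b"demo-key-held-by-the-governance-team"   # assumption: stored securely elsewhere

def tokenize(value: str) -> str:
    # Same input -> same token, so joins across datasets still line up.
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

claims = [("C-1001", 1200.0), ("C-1002", 80.0), ("C-1001", 45.5)]
policies = [("C-1001", "motor"), ("C-1002", "home")]

masked_claims = [(tokenize(cid), amt) for cid, amt in claims]
masked_policies = [(tokenize(cid), prod) for cid, prod in policies]

# A data scientist can still aggregate and join on the token without seeing the raw ID.
by_customer = {}
for cid, amt in masked_claims:
    by_customer[cid] = by_customer.get(cid, 0.0) + amt
print(by_customer)
print(masked_policies)
```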
Rob Bearden, Hortonworks | DataWorks Summit 2018
>> Live from San Jose in the heart of Silicon Valley, it's theCUBE covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks Summit here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host, James Kobielus. We're joined by Rob Bearden. He is the CEO of Hortonworks. So thanks so much for coming on theCUBE again, Rob. >> Thank you for having us. >> So you just got off of the keynote on the main stage. The big theme is really about modern data architecture. So we're going to have this modern data architecture. What is it all about? How do you think about it? What's your approach? And how do you walk customers through this process? >> Well, there's a lot of moving parts in enabling a modern data architecture. One of the first steps in what we're trying to do is to unlock the siloed transactional applications, and to get that data into a central architecture so you can get real time insights around the inclusive dataset. But what we're really trying to accomplish then within that modern data architecture is to bring all types of data, whether it be real time streaming data, whether it be sensor data, IoT data, whether it be data that's coming from a connected core across the network, and to be able to bring all that data together in real time, and give the enterprise the ability to be able to take best in class action so that you get a very prescriptive outcome of what you want. So if we bring that data under management from point of origination and out on the edge, and then have the platforms that move that through its entire lifecycle, and that's our HDF platform, it gives the customer the ability to, after they capture it at the edge, move it, and then have the ability to process it as an event happens, a condition changes, various conditions come together, have the ability to process and take the exact action that you want to see performed against that, and then bring it to rest, and that's where our HDP platform comes into play, where then all that data can be aggregated so you can have a holistic insight, and have real time interactions on that data. But then it becomes about deploying those datasets and workloads on the tier that's most economically and architecturally pragmatic. So if that's on-prem, we make sure that we are architected for that on-prem deployment or private cloud or even across multiple public clouds simultaneously, and give the enterprise the ability to support each of those native environments. And so we think hybrid cloud architecture is really where the vast majority of our customers today and in the future are going to want to be able to run and deploy their applications and workloads. And that's where our DataPlane Service offering gives them the ability to have that hybrid architecture and the architectural latitude to move workloads and datasets across each tier, transparent to whatever storage file format they use or where that application is, and we provide all the tooling to mask the complexity of doing that, and then we ensure that it has one common security framework, one common governance through its entire lifecycle, and one management platform to handle that entire data lifecycle.
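The capture-at-the-edge, act-as-it-happens flow Rob describes is what HDF and NiFi provide out of the box; purely to illustrate the pattern, and not as a description of how HDF itself works, here is a hedged sketch using the kafka-python client in which each event is checked against a condition, an action is emitted when the condition matches, and every event is also landed for the at-rest, holistic view. The topic names, threshold, and broker address are assumptions.

```python
# Illustrative event-condition-action loop; topics, threshold, and broker are assumptions.
# Uses the kafka-python client (pip install kafka-python).
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "sensor-readings",
    bootstrap_servers="broker.example.com:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="broker.example.com:9092",
    value_serializer=lambda d: json.dumps(d).encode("utf-8"),
)

VIBRATION_LIMIT = 7.5  # assumed threshold for a maintenance alert

for msg in consumer:
    event = msg.value  # e.g. {"component": "engine-2", "vibration": 8.1}
    # Condition: act while the event is in motion, before it lands at rest.
    if event.get("vibration", 0.0) > VIBRATION_LIMIT:
        producer.send("maintenance-alerts", {"component": event["component"],
                                             "reason": "vibration",
                                             "value": event["vibration"]})
    # Land every event downstream for the at-rest, holistic view (the HDP side).
    producer.send("landing-zone", event)
```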
And that's the modern data architecture: to be able to bring all data under management, all types of data under management, and manage that in real time through its lifecycle til it comes to rest, and deploy that across whatever architecture tier is most appropriate financially and from a performance perspective, on cloud or prem. >> Rob, this morning at the keynote here in day one at DataWorks San Jose, you presented this whole architecture that you described in the context of what you call hybrid clouds to enable connected communities, and with HDP, Hortonworks Data Platform 3.0, one of the prime announcements, you brought containerization into the story. Could you connect those dots, containerization, connected communities, and HDP 3.0? >> Well, HDP 3.0 is really the foundation for enabling that hybrid architecture natively, and what it's done is it separated the storage from the compute, and so now we have the ability to deploy those workloads via a container strategy across whichever tier makes the most sense, and to move those applications and datasets around, and to be able to leverage each tier in the deployment architectures that are most pragmatic. And then what that lets us do then is be able to bring all of the different data types, whether it be customer data, supply chain data, product data. So imagine an industrial piece of equipment, an airplane, is flying from Atlanta, Georgia to London, and you want to be able to make sure you really understand how well each component is performing, so that if that plane is going to need service when it gets there, it doesn't miss the turnaround and leave 300 passengers stranded or delayed, right? Now with our Connected platform, we have the ability to take every piece of data from every component that's generated and see that in real time, and let the airlines make that real time. >> Delineate essentially. >> And ensure that we know every person that touched it and looked at that data through its entire lifecycle, from the ground crew to the pilots to the operations team to the service folks on the ground to the reservation agents, and we can prove that if somehow that data has been breached, that we know exactly at what point it was breached and who did or didn't get to see it, and can prevent that because of the security models that we put in place.
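The "who touched it and who looked at it" guarantee ultimately rests on an append-only access log that can be queried per dataset and per person. A toy version of that idea is sketched below; in the Hortonworks stack this role is actually played by Ranger audit logs together with Atlas lineage, so treat the structure here purely as an illustration.

```python
# Toy append-only access log; in the real stack this is Ranger audit plus Atlas lineage.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AccessEvent:
    user: str
    dataset: str
    action: str          # "read" | "write" | "delete"
    at: datetime

LOG: list[AccessEvent] = []

def record(user, dataset, action):
    LOG.append(AccessEvent(user, dataset, action, datetime.now(timezone.utc)))

def who_touched(dataset):
    """Everyone who accessed a dataset: the first question a breach investigation asks."""
    return sorted({e.user for e in LOG if e.dataset == dataset})

record("ground_crew_07", "flight_telemetry", "read")
record("ops_analyst_3", "flight_telemetry", "read")
record("reservations_app", "passenger_manifest", "read")

print(who_touched("flight_telemetry"))   # ['ground_crew_07', 'ops_analyst_3']
```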
And so what's very important is that you've got to be able to have the systems that not just secure the data, but understand who has the accessibility at any point in time that you've ever maintained that individual's data. And so it's not just about when you've had a transaction with that individual, but it's the rest of the history that you've kept or the multiple datasets that you may try to correlate to try to expand the relationship with that customer, and you need to make sure that you can ensure not only that you've secured their data, but then you're protecting and governing who has access to it and when. And as importantly that you can prove in the event of a breach that you had control of that, and who did or did not access it, because if you can't prove, in the event of any breach, that it was secure, and that no one who wasn't supposed to have access to it got to it, you can be opened up for hundreds of thousands of dollars or even multiple millions of dollars of fines just because you can't prove that it was not accessed, and that's what the variety of our platforms, you mentioned Data Studio, is part of. DataPlane is one of the capabilities that gives us the ability. The core engine that does that is Atlas, and that's the open source governance platform that we developed through the community that really drives all the capabilities for governance that move through each of our products, HDP and HDF, and then of course DataPlane and Data Studio take advantage of that and how it moves and replicates data and manages that process for us. >> One of the things that we were talking about before the cameras were rolling was this idea of data driven business models, how they are disrupting current contenders, new rivals coming on the scene all the time. Can you talk a little bit about what you're seeing and what are some of the most exciting and maybe also some of the most threatening things that you're seeing? >> Sure, in the traditional legacy enterprise, it's very procedurally driven. You think about your classic core ERP. It's worked very hard to have a very rigid, very structured, procedural order-to-cash cycle that does not have a great deal of flexibility. And it takes you through a design process, it builds product, and then you sell product to a customer, and then you service that customer, and then you learn from that transaction different ways to automate or improve efficiencies in their supply chain. But it's very procedural, very linear. And in the new world of connected data models, you want to bring transparency and real time understanding and connectivity between the enterprise, the customer, the product, and the supply chain, so that you can take real time best in class action. So for example you understand how well your product is performing. Is your customer using it correctly? Are they frustrated with that? Are they using it in the patterns and the frequency that they should be if they are going to expand their use and buy more, and if they're not, how do we engage in that cycle? How do we understand if they're going through a re-review and another buying of something similar that may not be with you, for a different reason. And when we have real time visibility into our customer's interaction, understand our product's performance through its entire lifecycle, then we can bring real time efficiency by linking those together with our supply chain into the various relationships we have with our customers.
To do that, it requires the modern data architecture, bringing data under management from the point it originates, whether it's from the product or the customer interacting with the company, or the customer interacting potentially with our ecosystem partners, mutual partners, and then letting the best-in-practice supply chain techniques make sure that we're bringing the highest level of service and support to that entire lifecycle. And when we bring data under management, manage it through its lifecycle and have the historical view at rest, and leverage that across every tier, that's when we get these high velocity, deep transparency, and connectivity between each of the constituents in the value chain, and that's what our platforms give them the ability to do. >> Not only your platform, you guys have been in business now for I think seven years or so, and you shifted, in the minds of many and including your own strategy, from being the premier data at rest company in terms of the Hadoop platform to being one of the premier data in motion companies. Is that really where you're going? To be more of a completely streaming-focused solution provider in a multi-cloud environment? And I hear a lot of Kafka in your story now, that it's like, oh yeah, that's right, Hortonworks is big on Kafka. Can you give us just a quick sense of how you're making that shift towards low latency real time streaming, big data, or small data for that matter, with embedded analytics and machine learning? >> So, we have evolved from certainly being the leader in global data platforms, with all the work that we do collaboratively, and in through the community, to make Hadoop an enterprise-viable data platform that has the ability to run mission critical workloads and apps at scale, ensuring that it has all the enterprise facilities from security and governance and management. But you're right, we have expanded our footprint aggressively. And we saw the opportunity to actually create more value for our customers by giving them the ability to not wait til they bring data under management to gain an insight, because in that case, they happen to be reactive, post-event, post-transaction. We want to give them the ability to shift their business model to being interactive, pre-event, pre-conditioned. The way to do that, we learned, was to be able to bring the data under management from the point of origination, and that's what we used MiNiFi and NiFi for, and then HDF, to move it through its lifecycle, and to your point, we have the intellect, we have the insight, and then we have the ability then to process the best in class outcome based on what we know the variables are we're trying to solve for as that's happening. >> And there's the word, the phrase ACID, which of course is a transactional data paradigm, and I hear that all over your story now in streaming. So, what you're saying is it's a completely enterprise-grade streaming environment from end to end for the new era of edge computing. Would that be a fair way of-- >> It's very much so. And our model and strategy has always been to bring together best-in-class engines for what they do well for their particular dataset. A couple of examples of that, one, you brought up Kafka, another is Spark. And they do what they do really well.
But what we do is make sure that they fit inside an overall data architecture that then embodies their access to a much broader central dataset that goes from point of origination to point of rest on a whole central architecture, and then they benefit from our security, governance, and operations model, being able to manage those engines. So what we're trying to do is eliminate the silos for our customers, and the siloed datasets that just do particular functions. We give them the ability to have an enterprise modern data architecture; we manage the things that bring that forward for the enterprise to have the modern data driven business models by bringing the governance, the security, and the operations management, and we ensure that those workflows go from beginning to end seamlessly. >> Do you, go ahead. >> So I was just going to ask about the customer concerns. So here you are, you've now given them this ability to make these real time changes, what's sort of next? What's on their mind now and what do you see as the future of what you want to deliver next? >> First and foremost we've got to make sure we get this right, and we really bring this modern data architecture forward, and make sure that we truly have the governance correct, the security models correct. One pane of glass to manage this. And really enable that hybrid data architecture, and let them leverage the cloud tier where it's architecturally and financially pragmatic to do it, and give them the ability to leg into a cloud architecture without risk of either being locked in or misunderstanding where the lines of demarcation of workloads or datasets are, and not getting the economies or efficiencies they should. And we solved that with DataPlane. So we're working very hard with the community, with our ecosystem and strategic partners, to make sure that we're enabling the ability to bring each type of data from any source and deploy it across any tier with a common security, governance, and management framework. So then what's next is, now that we have this high velocity of data through its entire lifecycle on one common set of platforms, we can start enabling the modern applications to function. And we can go look back into some of the legacy technologies that are very procedurally based and are dependent on a transaction or an event happening before they can run their logic to get an outcome, because that keeps the customer in a reactive, post-event world of activity. We want to make sure that we're bringing that kind of, for example, supply chain functionality to the modern data architecture, so that we can do real time inventory allocation based on the patterns our customers are in, whether that's how they're using the product, or frustrations they've had, or success they've had. And we know through artificial intelligence and machine learning that there's a high probability not only that they will buy or use or expand their consumption of whatever they have of our product or service, but that it will probably lead to these other things as well if we do those things. >> Predictive logic as opposed to procedural, yes, AI. >> And very much so. And so what's next will be bringing those modern applications on top of this that become very predictive and enabling, versus very procedural and post-transaction. We're a little ways downstream. That's looking out. >> That's next year's conference. >> That's probably next year's conference. >> Well, Rob, thank you so much for coming on theCUBE, it's always a pleasure to have you.
>> Thank you both for having us, and thank you for being here, and enjoy the summit. >> We're excited. >> Thank you. >> We'll do. >> I'm Rebecca Knight for Jim Kobielus. We will have more from DataWorks Summit just after this. (upbeat music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
James Kobielus | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Rob Bearden | PERSON | 0.99+ |
Jim Kobielus | PERSON | 0.99+ |
London | LOCATION | 0.99+ |
300 passengers | QUANTITY | 0.99+ |
San Jose | LOCATION | 0.99+ |
Rob | PERSON | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
seven years | QUANTITY | 0.99+ |
hundreds of thousands of dollars | QUANTITY | 0.99+ |
San Jose, California | LOCATION | 0.99+ |
each component | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
DataWorks Summit | EVENT | 0.99+ |
one | QUANTITY | 0.99+ |
One | QUANTITY | 0.98+ |
millions of dollars | QUANTITY | 0.98+ |
Atlas | TITLE | 0.98+ |
first steps | QUANTITY | 0.98+ |
HDP 3.0 | TITLE | 0.97+ |
One pane | QUANTITY | 0.97+ |
both | QUANTITY | 0.97+ |
DataWorks Summit 2018 | EVENT | 0.97+ |
First | QUANTITY | 0.96+ |
next year | DATE | 0.96+ |
each | QUANTITY | 0.96+ |
DataPlane | TITLE | 0.96+ |
theCUBE | ORGANIZATION | 0.96+ |
Hadoop | TITLE | 0.96+ |
DataWorks | ORGANIZATION | 0.95+ |
Spark | TITLE | 0.95+ |
today | DATE | 0.94+ |
EU | LOCATION | 0.93+ |
this morning | DATE | 0.91+ |
Atlanta, | LOCATION | 0.91+ |
Berlin | LOCATION | 0.9+ |
each type | QUANTITY | 0.88+ |
Global Data Protection Regulation GDPR | TITLE | 0.87+ |
one common | QUANTITY | 0.86+ |
few months ago | DATE | 0.85+ |
NiFi | ORGANIZATION | 0.85+ |
Data Platform 3.0 | TITLE | 0.84+ |
each tier | QUANTITY | 0.84+ |
Data Studio | ORGANIZATION | 0.84+ |
Data Studio | TITLE | 0.83+ |
day one | QUANTITY | 0.83+ |
one management platform | QUANTITY | 0.82+ |
MiNiFi | ORGANIZATION | 0.82+ |
San | LOCATION | 0.71+ |
DataPlane | ORGANIZATION | 0.69+ |
Kafka | TITLE | 0.67+ |
Encore ERP | TITLE | 0.66+ |
one common set | QUANTITY | 0.65+ |
Data Steward Studio | ORGANIZATION | 0.65+ |
HDF | ORGANIZATION | 0.59+ |
Georgia | LOCATION | 0.55+ |
announcements | QUANTITY | 0.51+ |
Jose | ORGANIZATION | 0.47+ |
Dan Potter, Attunity & Ali Bajwa, Hortonworks | DataWorks Summit 2018
>> Live from San Jose in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018, brought to you by Hortonworks. >> Welcome back to theCUBE's live coverage of DataWorks here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Dan Potter, the VP of Product Management at Attunity, and also by Ali Bajwa, who is the principal partner solutions engineer at Hortonworks. Thanks so much for coming on theCUBE. >> Pleasure to be here. >> It's good to be here. >> So I want to start with you, Dan, and have you tell our viewers a little bit about the company, based in Boston, Massachusetts, and what Attunity does. >> Attunity, we're a data integration vendor. We are best known as a provider of real-time data movement from transactional systems into data lakes, into clouds, into streaming architectures, so it's a modern approach to data integration. So as these core transactional systems are being updated, we're able to take those changes and move those changes where they're needed, when they're needed, for analytics, for new operational applications, for a variety of different tasks. >> Change data capture. >> Change data capture is the heart of our-- >> They are well known in this business. They have change data capture. Go ahead. >> We are. >> So tell us about the announcement today that Attunity has made at the Hortonworks-- >> Yeah, thank you, it's a great announcement because it showcases the collaboration between Attunity and Hortonworks, and it's all about taking the metadata that we capture in that integration process. So we're a piece of a data lake architecture. As we are capturing changes from those source systems, we are also capturing the metadata, so we understand the source systems, we understand how the data gets modified along the way. We use that metadata internally and now we've built extensions to share that metadata into Atlas and to be able to extend that out through Atlas to higher data governance initiatives, so Data Steward Studio, into the DataPlane Services, so it's really important to be able to take the metadata that we have and to add to it the metadata that's from the other sources of information. >> Sure, so more of the transactional semantics that Hortonworks has been describing, they've baked into HDP and into your overall portfolios. Is that true? I mean, that supports those kinds of requirements. >> With HDP, what we're seeing is, you know, the EDW optimization play has become more and more important for a lot of customers as they try to optimize the data that their EDWs are working on, so it really gels well with what we've done here with Attunity, and then on the Atlas side with the integration on the governance side, with GDPR and other sorts of regulations coming into play now, you know, those sorts of things are becoming more and more important, you know, specifically around the governance initiative. We actually have a talk just on Thursday morning where we're actually showcasing the integration as well. >> So can you talk a little bit more about that, for those who aren't going to be there on Thursday? GDPR was really a big theme at the DataWorks Berlin event and now we're in this new era and it's not talked about too, too much, I mean we-- >> And global businesses who have operations in the EU, but also all over the world, are trying to be systematic and consistent about how they manage PII everywhere.
So GDPR, although it's an EU regulation, really in many ways is having ripple effects across the world in terms of practices. >> Absolutely, and at the heart of understanding how you protect yourself and comply, I need to understand my data, and that's where metadata comes in. So having a holistic understanding of all of the data that resides in your data lake or in your cloud, metadata becomes a key part of that. And also in terms of enforcing that, if I understand my customer data, where the customer data comes from, the lineage of that, then I'm able to apply the protections of the masking on top of that data. So it's really, the GDPR effect has, you know, created a broad-scale need for organizations to really get a handle on metadata, so the timing of our announcement just works real well. >> And one nice thing about this integration is that, you know, it's not just about being able to capture the data in Atlas, but now with the integration of Atlas and Ranger, you can do enforcement of policies based on classifications as well, so if you can tag data as PCI, PII, personal data, that can get enforced through Ranger to say, hey, only certain admins can access certain types of data, and now all that becomes possible once we've taken the initial steps of the Atlas integration. >> So with this collaboration, and it's really deepening an existing relationship, so how do you go to market? How do you collaborate with each other and then also service clients? >> You want to? >> Yeah, so from an engineering perspective, we've got deep roots in terms of being a first-class provider into the Hortonworks platform, both HDP and HDF. Last year about this time, we announced our support for ACID merge capabilities, so the leading-edge work that Hortonworks has done in bringing ACID compliance capabilities into Hive was a really important one, so our change data capture capabilities are able to feed directly into that and be able to support those extensions.
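To make the ACID merge point concrete, the sketch below shows roughly what applying captured changes to a Hive transactional table looks like as a MERGE statement, issued here through the PyHive client. The table names, columns, and the 'op' flag convention are assumptions for illustration; a CDC tool such as Attunity's would generate the equivalent statements automatically rather than relying on a hand-written one like this.

```python
# A minimal sketch of applying change-data-capture rows to a Hive ACID table with MERGE.
# Table names, columns, and the 'op' flag convention (I/U/D) are assumptions; this only
# illustrates the shape of the operation a CDC pipeline would drive.
from pyhive import hive

MERGE_SQL = """
MERGE INTO customers AS t
USING customer_changes AS s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET name = s.name, email = s.email, updated_at = s.change_ts
WHEN NOT MATCHED THEN INSERT VALUES (s.customer_id, s.name, s.email, s.change_ts)
"""

conn = hive.connect(host="hiveserver2.example.com", port=10000, username="etl")  # hypothetical
cursor = conn.cursor()
cursor.execute(MERGE_SQL)  # requires a transactional (ACID) target table in Hive
cursor.close()
conn.close()
```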
We're doing a lot of investments there and that's exactly what these guys are integrating with. So I think our customers and prospects really see that and that's where all the wins are coming from. >> Yeah, and I think together there were two main barriers that we saw in terms of customers getting the most out of their data lake investment. One of them was, as I'm moving data into my data lake, I need to be able to put some structure around this, I need to be able to handle continuously updating data from multiple sources, and that's what we introduced with Attunity Compose for Hive, building out the structure in an automated fashion so I've got analytics-ready data, and using the ACID merge capabilities just made those updates much easier. The second piece was metadata. Business users need to have confidence in the data that they're using. Where did this come from? How was it modified? And overcoming both of those is really helping organizations make the most of those investments. >> How would you describe customer attitudes right now in terms of their approach to data? Because I mean, as we've talked about, data is the new oil, so there's a real excitement and there's a buzz around it, and yet there are also so many high-profile cases of breaches and security concerns, so what would you say, is it that customers, are they more excited or are they more trepidatious? How would you describe the C-level mindset right now? >> So I think security and governance have become top of mind, right, so in more and more of the surveys that we've taken with our customers, right, you know, more and more customers are more concerned about security, they're more concerned about governance. The joke is that we talk to some of our customers and they keep talking to us about Atlas, which is sort of one of the newer offerings on governance that we have, but then we ask, "Hey, what about Ranger for enforcement?" And they're like, "Oh, yeah, that's a standard now." So we have Ranger, now it's a question of, you know, how do we get our, you know, hooks into Atlas and all that kind of stuff, so yeah, definitely, as you mentioned, because of GDPR, because of all these kinds of issues that have happened, it's definitely become top of mind. >> And I would say the other side of that is there's real excitement as well about the possibilities. Now bringing together all of this data, AI, machine learning, real-time analytics and real-time visualization. There are analytic capabilities now that organizations have never had, so there's great excitement, but there's also trepidation. You know, how do we solve for both of those? And together, we're doing just that. >> But as you mentioned, if you look at Europe, some of the European companies that are hit harder by GDPR, they're actually excited that now they can, you know, really get to understand their data more and do better things with it as a result of, you know, the GDPR initiative. >> Absolutely. >> Are you using machine learning inside of Attunity in a Hortonworks context to find patterns in that data in real time? >> So we enable data scientists to build those models.
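Since the exchange above keeps coming back to tagging data in Atlas and letting Ranger enforce tag-based policies, here is a minimal, hypothetical sketch of the Atlas side of that flow: attaching a "PII" classification to an existing entity over Atlas's V2 REST API. The GUID, host, and credentials are placeholders, and the Ranger policy that actually restricts access to "PII"-tagged data is configured separately in Ranger rather than shown here.

```python
# A minimal sketch of tagging a Hive table as "PII" in Apache Atlas so that a Ranger
# tag-based policy can restrict who may read it. The entity GUID, host, and credentials
# are placeholders; the corresponding Ranger policy is set up separately.
import requests

ATLAS = "http://atlas.example.com:21000/api/atlas/v2"       # hypothetical host
entity_guid = "0fe67482-1111-2222-3333-0123456789ab"         # GUID of the dataset in Atlas

resp = requests.post(
    f"{ATLAS}/entity/guid/{entity_guid}/classifications",
    json=[{"typeName": "PII", "attributes": {}}],             # attach the PII tag
    auth=("admin", "admin"),
)
resp.raise_for_status()
print("Tag applied; Ranger tag-based policies on 'PII' now govern access to this entity.")
```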
So we're not only bringing the data together but again, part of the announcement last year is the way we structure that data in Hive. We provide a complete historic data store, so every single transaction that has happened, and we send those transactions as they happen, it's a big append, so if you're a data scientist, I want to understand the complete history of the transactions of a customer to be able to build those models, so building those out in Hive and making those analytics ready in Hive, that's what we do, so we're a key enabler of machine learning. >> Making analytics ready rather than do the analytics in the spring, yeah. >> Absolutely. >> Yeah, the other side to that is that because they're integrated with Atlas, you know, now we have a new capability called DataPlane and Data Steward Studio, so the idea there is around multi-everything, so more and more customers have multiple clusters, whether it's on-prem, in the cloud, so now more and more customers are looking at how do I get a single pane of glass view across all my data, whether it's on-prem, in the cloud, whether it's IoT, whether it's data at rest, right, so that's where DataPlane comes in, and with Data Steward Studio, which is our second offering on top of DataPlane, they can kind of get that view across all their clusters, so as soon as, you know, the data lands from Attunity into Atlas, you can get a view into that as a part of Data Steward Studio, and one of the nice things we do in Data Steward Studio is that we also have machine learning models to do some profiling, to figure out that hey, this looks like a credit card, so maybe I should suggest this as a tag of sensitive data, and now the end user, the end administrator, has the option of, you know, saying okay, yeah, this is a credit card, I'll accept that tag, or they can reject that and pick one of their own. >> Will any of this going forward, the Attunity CDC, change data capture capability, be containerized for deployment to the edges in HDP 3.0? I mean, 'cause it seems, I mean for Internet of Things, edge analytics and so forth, change data capture, is it absolutely necessary to make the entire, some call it the fog computing, cloud or whatever, to make it a completely transactional environment for all applications from micro endpoint to micro endpoint? Are there any plans to do that going forward? >> Yeah, so I think with HDP 3.0, as you mentioned, right, one of the key factors that was coming into play was around time to value, so with containerization now being able to bring third-party apps on top of YARN through Docker, I think that's definitely an avenue that we're looking at. >> Yes, we're excited about that with 3.0 as well, so that's definitely in the cards for us. >> Great, well, Ali and Dan, thank you so much for coming on theCUBE. It's fun to have you here. >> Nice to be here, thank you guys. >> Great to have you. >> Thank you, it was a pleasure. >> I'm Rebecca Knight, for James Kobielus, we will have more from DataWorks in San Jose just after this. (techno music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
James Kobielus | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Dan Potter | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Ali Bajwah | PERSON | 0.99+ |
Dan | PERSON | 0.99+ |
Ali Bajwa | PERSON | 0.99+ |
Ali | PERSON | 0.99+ |
James Kobielus | PERSON | 0.99+ |
Thursday morning | DATE | 0.99+ |
San Jose | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
San Jose | LOCATION | 0.99+ |
Attunity | ORGANIZATION | 0.99+ |
Last year | DATE | 0.99+ |
One | QUANTITY | 0.99+ |
second piece | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
Atlas | ORGANIZATION | 0.99+ |
Thursday | DATE | 0.99+ |
both | QUANTITY | 0.99+ |
theCUBE | ORGANIZATION | 0.98+ |
Ranger | ORGANIZATION | 0.98+ |
second offering | QUANTITY | 0.98+ |
DataWorks | ORGANIZATION | 0.98+ |
Europe | LOCATION | 0.98+ |
Atlas | TITLE | 0.98+ |
Boston, Massachusetts | LOCATION | 0.98+ |
today | DATE | 0.97+ |
DataWorks Summit 2018 | EVENT | 0.96+ |
two main barriers | QUANTITY | 0.95+ |
DataPlane Services | ORGANIZATION | 0.95+ |
DataWorks Summit 2018 | EVENT | 0.94+ |
one | QUANTITY | 0.93+ |
San Jose, California | LOCATION | 0.93+ |
Docker | TITLE | 0.9+ |
single glass | QUANTITY | 0.87+ |
3.0 | OTHER | 0.85+ |
European | OTHER | 0.84+ |
Attunity | PERSON | 0.84+ |
Hive | LOCATION | 0.83+ |
HDP 3.0 | OTHER | 0.82+ |
one nice thing | QUANTITY | 0.82+ |
DataWorks Berlin | EVENT | 0.81+ |
EU | ORGANIZATION | 0.81+ |
first | QUANTITY | 0.8+ |
DataPlane | TITLE | 0.8+ |
EU | LOCATION | 0.78+ |
EDW | TITLE | 0.77+ |
Data Steward Studio | ORGANIZATION | 0.73+ |
Hive | ORGANIZATION | 0.73+ |
Data Steward Studio | TITLE | 0.69+ |
single transaction | QUANTITY | 0.68+ |
Ranger | TITLE | 0.66+ |
Studio | COMMERCIAL_ITEM | 0.63+ |
CDC | ORGANIZATION | 0.58+ |
DataPlane | ORGANIZATION | 0.55+ |
them | QUANTITY | 0.53+ |
HDP 3.0 | OTHER | 0.52+ |
Day Two Keynote Analysis | Dataworks Summit 2018
>> Announcer: From Berlin, Germany, it's the Cube covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (electronic music) >> Hello and welcome to the Cube on day two of Dataworks Summit 2018 from Berlin. It's been a great show so far. We have just completed the day two keynote and in just a moment I'll bring ya up to speed on the major points and the presentations from that. It's been a great conference. Fairly well attended here. The hallway chatter, discussion's been great. The breakouts have been stimulating. For me the takeaway is the fact that yesterday at the keynote, Scott Gnau, the CTO of Hortonworks, announced Data Steward Studio, DSS they call it, part of the Hortonworks DataPlane Services portfolio, and Data Steward Studio could not be more timely, because we are now five weeks away from GDPR, that's the General Data Protection Regulation, becoming the law of the land. When I say the land, I mean the EU, but really any company that operates in the EU, and that includes many U.S.-based and APAC-based and other companies, will need to comply with the GDPR as of May 25th and ongoing, in terms of protecting the personal data of EU citizens. And that means a lot of different things. Data Steward Studio, announced yesterday, was demo'd today by Hortonworks, and it was a really excellent demo and showed that it's a powerful solution for a number of things that are at the core of GDPR compliance. The demo covered the capability of the solution to discover and inventory personal data within a distributed data lake or enterprise data environment, number one. Number two, the ability of the solution to centralize consent, provide a consent portal, essentially, that data subjects can then use to review the data that's kept on them and to make fine-grained consents, or withdraw consent, for use and profiling of the data that they own. And then number three, they demonstrated the capability of the solution to execute the data subjects', the people's, requests in terms of the handling of their personal data. Those are the three main points in terms of enabling, adding the teeth to enforce, GDPR in an operational setting in any company that needs to comply with GDPR. So, what we're going to see, I believe, going forward, really in the whole global economy and in the big data space, is that Hortonworks and others in the data lake industry, and there are many others, are going to need to roll out similar capabilities in their portfolios 'cause their customers are absolutely going to demand it. In fact the deadline is fast approaching, it's only five weeks away. One of the interesting takeaways from the keynote this morning was the fact that John Kreisa, the VP for marketing at Hortonworks, took a quick survey of those in the audience today, a poll, asking how ready they are to comply with GDPR as of May 25th, and it was a bit eye opening. I wasn't surprised, but I think it was 19 or 20%, I don't have the numbers in front of me, who said that they won't be ready to comply. I believe it was somewhere between 20 and 30% who said they will be able to comply. About 40%, and don't quote me on that, but a fair plurality, said that they're preparing. So that indicates that they're not entirely sure that they will be able to comply 100% to the letter of the law as of May 25th. I think that's probably accurate in terms of ballpark figures. I think there's a lot of, I know there's a lot of, companies and users racing for compliance by that date.
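To give a feel for the first of those three capabilities, discovering and inventorying personal data, the toy sketch below scans a CSV extract for values that match a few common personal-data patterns. It is only an illustration of the general idea; the file path and regular expressions are invented, and a product like Data Steward Studio layers ML-based profiling, metadata, and lineage on top of anything this simple.

```python
# A toy sketch of the first step a data steward tool automates at scale: scanning a
# dataset for values that look like personal data. The file path and regexes are
# illustrative only; real discovery adds ML-based profiling, metadata, and lineage.
import csv
import re

PII_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\s\-()]{7,}$"),
    "iban":  re.compile(r"^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$"),
}

def scan(path):
    """Return {column: {pattern_name: hit_count}} for one CSV file."""
    hits = {}
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for col, value in row.items():
                for name, pattern in PII_PATTERNS.items():
                    if value and pattern.match(value.strip()):
                        hits.setdefault(col, {}).setdefault(name, 0)
                        hits[col][name] += 1
    return hits

print(scan("customers_sample.csv"))  # hypothetical extract pulled from the data lake
```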
And so really GDPR is definitely the headline, the banner, umbrella story around this event, and really around the big data community world-wide right now, in terms of enterprise investments in the compliance software and services and capabilities needed to comply with GDPR. That was important. That wasn't the only thing that was covered, not only in the keynotes, but in the sessions here so far. AI, clearly AI and machine learning, are hot themes in terms of the innovation side of big data. There's compliance, there's GDPR, but really innovation in terms of what enterprises are doing with their data, with their analytics: they're building more and more AI and embedding that in conversational UIs and chatbots, and they're embedding AI in all manner of e-commerce applications, internal applications in terms of search, as well as things like face recognition, voice recognition, and so forth and so on. So, what we've seen here at the show, and what I've been seeing for quite some time, is that more of the actual developers who are working with big data are the data scientists of the world. And more of the traditional coders are getting up to speed very rapidly on the new state of the art for building machine learning and deep learning, AI, natural language processing into their applications. That said, Hortonworks has become a fairly substantial player in the machine learning space. In fact, you know, really across their portfolio, many of the discussions here that I've seen show that everybody's buzzing about getting up to speed on frameworks for building and deploying and iterating and refining machine learning models in operational environments. So that's definitely a hot theme. And so there was an AI presentation this morning from the first gentleman that came on that laid out the broad parameters of what developers are doing and looking to do with the data that they maintain in their lakes, training data to both build the models and train them and deploy them. So, that was also something I expected, and it's good to see at Dataworks Summit that there is a substantial focus on that in addition, of course, to GDPR and compliance. It's been about seven years now since Hortonworks was essentially spun off of Yahoo. It's been I think about three years or so since they went IPO. And what I can see is that they are making great progress in terms of their growth, in terms of not just the financials, but their customer acquisition and their deal sizes and also customer satisfaction. I get a sense from talking to many of the attendees at this event that Hortonworks has become a fairly blue chip vendor, that they're really in many ways continuing to grow their footprint of Hortonworks products and services with most of their partners, such as IBM. And from what I can see everybody was rapt with attention around Data Steward Studio, and I sensed sort of a sigh of relief that it looks like a fairly good solution, and so I have no doubt that a fair number of those in this hall right now are probably, as we say in the U.S., kicking the tires of DSS and probably going to expedite their adoption of it.
So, with that said, we have day two here, so what we're going to have is Alan Gates, one of the founders of Hortonworks coming on in just a few minutes and I'll be interviewing him, asking about the vibrancy in the health of the community, the Hortonworks ecosystem, developers, partners, and so forth as well as of course the open source communities for Hadoop and Ranger and Atlas and so forth, the growing stack of open source code upon which Hortonworks has built their substantial portfolio of solutions. Following him we'll have John Kreisa, the VP for marketing. I'm going to ask John to give us an update on, really the, sort of the health of Hortonworks as a business in terms of the reach out to the community in terms of their messaging obviously and have him really position Hortonworks in the community in terms of who's he see them competing with. What segments is Hortonworks in now? The whole Hadoop segment increasingly... Hadoop is there. It's the foundation. The word is not invoked in the context of discussions of Hortonworks as much now as it was in the past. And the same thing for say Cloudera one of their closest to traditional rivals, closest in the sense that people associate them. I was at the Cloudera analyst event the other week in Santa Monica, California. It was the same thing. I think both of these vendors are on a similar path to become fairly substantial data warehousing and data governance suppliers to the enterprises of the world that have traditionally gone with the likes of IBM and Oracle and SAP and so forth. So I think they're, Hortonworks, has definitely evolved into a far more diversified solution provider than people realize. And that's really one of the take aways from Dataworks Summit. With that said, this is Jim Kobielus. I'm the lead analyst, I should've said that at the outset. I'm the lead analyst at SiliconANGLE's Media's Wikibon team focused on big data analytics. I'm your host this week on the Cube at Dataworks Summit Berlin. I'll close out this segment and we'll get ready to talk to the Hortonworks and IBM personnel. I understand there's a gentleman from Accenture on as well today on the Cube here at Dataworks Summit Berlin. (electronic music)
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Jim Kobielus | PERSON | 0.99+ |
John Kreisa | PERSON | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Scott Gnau | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
Cloudera | ORGANIZATION | 0.99+ |
May 25th | DATE | 0.99+ |
Berlin | LOCATION | 0.99+ |
Yahoo | ORGANIZATION | 0.99+ |
five weeks | QUANTITY | 0.99+ |
Alan Gates | PERSON | 0.99+ |
Oracle | ORGANIZATION | 0.99+ |
Hotronworks | ORGANIZATION | 0.99+ |
Data Steward Studio | ORGANIZATION | 0.99+ |
General Data Protection Regulation | TITLE | 0.99+ |
Santa Monica, California | LOCATION | 0.99+ |
GDPR | TITLE | 0.99+ |
19 | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
100% | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
20% | QUANTITY | 0.99+ |
one | QUANTITY | 0.99+ |
yesterday | DATE | 0.99+ |
U.S. | LOCATION | 0.99+ |
DSS | ORGANIZATION | 0.99+ |
30% | QUANTITY | 0.99+ |
Berlin, Germany | LOCATION | 0.98+ |
Dataworks Summit 2018 | EVENT | 0.98+ |
three main points | QUANTITY | 0.98+ |
Atlas | ORGANIZATION | 0.98+ |
20 | QUANTITY | 0.98+ |
about seven years | QUANTITY | 0.98+ |
Accenture | ORGANIZATION | 0.97+ |
SiliconANGLE | ORGANIZATION | 0.97+ |
One | QUANTITY | 0.97+ |
about three years | QUANTITY | 0.97+ |
Day Two | QUANTITY | 0.97+ |
first gentleman | QUANTITY | 0.96+ |
day two | QUANTITY | 0.96+ |
SAP | ORGANIZATION | 0.96+ |
EU | LOCATION | 0.95+ |
Datawork Summit Europe 2018 | EVENT | 0.95+ |
Dataworks Summit | EVENT | 0.94+ |
this morning | DATE | 0.91+ |
About 40% | QUANTITY | 0.91+ |
Wikibon | ORGANIZATION | 0.9+ |
EU | ORGANIZATION | 0.9+ |
Muggie van Staden, Obsidian | Dataworks Summit 2018
>> Voiceover: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Hi, hello, welcome to theCUBE, I'm James Kobielus. I'm the lead analyst for Big Data Analytics at the Wikibon, which is the team inside of SiliconANGLE Media that focuses on emerging trends and technologies. We are here, on theCUBE at DataWorks Summit 2018 in Berlin, Germany. And I have a guest here. This is, Muggie, and if I get it wrong, Muggie Van Staden >> That's good enough, yep. >> Who is with Obsidian, which is a South Africa-based partner of Hortonworks. And I'm not familiar with Obsidian, so I'm going to ask Muggie to tell us a little bit about your company, what you do, your focus on open source, and really the opportunities you see for big data, for Hadoop, in South Africa, really the African continent as a whole. So, Muggie? >> Yeah, James great to be here. Yes, Obsidian, we started it 23 years ago, focusing mostly on open source technologies, and as you can imagine that has changed a lot over the last 23 years when we started the concept of selling Linux was basically a box with a hat and maybe a T-shirt in it. Today that's changed. >> James: Hopefully there's a stuffed penguin in there, too. (laughing) I could use that right now. >> Maybe a manual. So our business has evolved a lot over the last 23 years. And one of the technologies that has come around is Hadoop. And we actually started with some of the other Hadoop vendors out there as our first partnerships, and probably three or four years ago we decided to take on Hortonworks as one of our vendors. We found them an amazing company to work with. And together with them we've now worked in four of the big banks in South Africa. One of them is actually here at DataWorks Summit. They won an award last night. So it's fantastic to be part of all of that. And yes, South Africa being so far removed from the rest of the world. They have different challenges. Everybody's nervous of Cloud. We have the joys that we don't really have any Cloud players locally yet. The two big players are in Microsoft and Amazon are planning some data centers soon. So the guys have different challenges to Europe and to the States. But big data, the big banks are looking at it, starting to deploy nice Hadoop clusters, starting to ingest data, starting to get real business value out of it, and we're there to help, and hopefully the four is the start for us and we can help lots of customers on this journey. >> Are South African-based companies, because you are so distant in terms of miles on the planet from Europe, from the EU, is any company in South Africa, or many companies, concerned at all about the global, or say the general data protection regulation, GDPR? US-based companies certainly are 'cause they operate in Europe. So is that a growing focus for them? And we have five weeks until GDPR kicks in. So tell me about it. >> Yeah, so from a South African point of view, some of the banks and some of the companies would have subsidiaries in Europe. So for them it's a very real thing. But we have our own Act called PoPI, which is the protection of private information, so very similar. So everybody's keeping an eye on it. Everybody's worried. I think everybody's worried for the first company to be fined. And then they will all make sure that they get their things right. But, I think not just because of a legislation, I think it's something that everybody should worry about. How do we protect data? 
How do we make sure the right people have access to the correct data when they should, and nobody violates that, because I mean, in this day and age, you know, Google and Amazon and those guys probably know more about me than my family does. So it's a challenge for everybody. And I think it's just the right thing for companies to do, to make sure that the data that they do have, they really do take good care of it. We trust them with our money and now we're trusting them with our data. So it's a real challenge for everybody. >> So how long has Obsidian been a partner of Hortonworks, and how has your role, or partnership I should say, evolved over that time, and how do you see it evolving going forward? >> We've been a partner about three or four years now. And started off as a value added reseller. We're also a training partner in South Africa for them. And as they as a company have evolved, we've had to evolve with them. You know, so they started with HDP as the Hadoop platform. Now they're doing NiFi and HDF, so we have to learn all of those technologies as well. But we're very, very excited about where they're going with the DataPlane service, just managing a customer's data across multiple clusters, multiple clouds, because that's realistically where we see all the customers going, is, you know, clusters, on-premise clusters, and typically multiple Clouds, and how do you manage that? And we are very excited to walk this road together with Hortonworks and all the South African customers that we have. >> So you say your customers are deploying multiple Clouds. Public Clouds or hybrid private-public Clouds? Give us a sense, for South Africa, whether public Cloud is a major deployment option or choice for financial services firms that you work with. >> Not necessarily financial services, so most of them are kicking tires at this stage, nobody's really put major workloads in there. As I mentioned, both Amazon and Microsoft are planning to put data centers down in South Africa very soon, and I think that will spur a big movement towards Cloud, but we do have some customers, unfortunately not Hortonworks customers, that are actually mostly in the Cloud. And they are now starting to look at a multi-Cloud strategy. So to ideally be in the three or four major Cloud providers and spinning up the right workloads in the right Cloud, and we're there to help. >> One of the most predominant workloads that your customers are running in the Cloud, is it backend in terms of data ingest and transformation? Is it a bit of maybe data warehousing with unstructured data? Is it a bit of things like queriable archiving? I want to get a sense for what is predominant right now in workloads. >> Yeah I think most of them start with (mumble) environments. (mumbles) one customer that's heavily into Cloud from a data point of view. Literally it's their data warehouse. They put everything in there. I think from the banking customers, most of them are considering DR of their existing Hadoop clusters, maybe a subset of their data and not necessarily everything. And I think some of them are also considering putting their unstructured data outside on the Cloud because that's where most of it's coming from. I mean, if you have Twitter, Facebook, LinkedIn data, it's a bit silly to pull all of that into your environment, why not just put it in the Cloud, that's where it's coming from, and analyze that and connect it back to your data where relevant.
So I think a lot of the customers would love to get there, and now Hortonworks makes it so much easier to do that. I think a lot of them will start moving in that direction. Now, excuse me, so are any or many of your customers doing development and training of machine learning algorithms and models in their Clouds? And to the extent that they are, are they using tools like the IBM Data Science Experience that Hortonworks resells for that? >> I think it's definitely on the radar for a lot of them. I'm not aware of anybody using it yet, but lots of people are looking at it and excited about the partnership between IBM and Hortonworks. And IBM has been a longstanding player in the South African market, and it's exciting for us as well to bring them into the whole Hortonworks ecosystem, and together solve real world problems. >> Give us a sense for how built out the big data infrastructure is in neighboring countries like Botswana or Angola or Mozambique and so forth. Is that an area that your company, are those regions that your company operates in? Sells into? >> We don't have offices, but we don't have a problem going in and helping customers there, so we've had projects in the past, not data related, that we've flown in and helped people. Most of the banks from a South African point of view, have branches into Africa. So it's on the roadmap, some are a little bit ahead of others, but definitely on the roadmap to actually put down Hadoop clusters in some of the major countries all throughout Africa. There's a big debate, do you put it down there, do you leave the data in South Africa? So they're all going through their own legislation, but it's definitely on the roadmap for all of them to actually take their data, knowledge in data science, up into Africa. >> Now you say that in South Africa Proper, there are privacy regulations, you know, maybe not the same as GDPR, but equivalent. Throughout Africa, at least throughout Southern Africa, how is privacy regulation lacking or is it emerging? >> I think it's emerging. A lot of the countries do have the basic rule that their data shouldn't leave the country. So everybody wants that data sovereignty and that's why a lot of them will not go to Cloud, and that's part of the challenges for the banks, that if they have banks up in Botswana, etc. And Botswana rules are our data has to stay in country. They have to figure out a way how do they connect that data to get the value for all of their customers. So real world challenges for everybody. >> When you're going into and selling into an emerging, or developing nation, of you need to provide upfront consulting to help the customer bootstrap their own understanding of the technology and making the business case and so forth. And how consultative is the selling process... >> Absolutely, and what we see with the banks, most of them even have a consultative approach within their own environment, so you would have the South African team maybe flying into the team at (mumbles) Botswana, and share some of the learnings that they've had. And then help those guys get up to speed. The reality is the skills are not necessarily in country. So there's a lot of training, a lot of help to go and say, we've done this, let us upscale you. And be a part of that process. So we sometimes send in teams to come and do two, three day training, basics, etc., so that ultimately the guys can operationalize in each country by themselves. >> So, that's very interesting, so what do you want to take away from this event? 
What do you find most interesting in terms of the sessions you've been in around the community showcase that you can take back to Obsidian, back in your country and apply? Like the announcement this morning of the Data Steward Studio. Do you see a possible, that your customers might be eager to use that for curation of their data in their clusters? >> Definitely, and one of the key messages for me was Scott, the CTO's message about your data strategy, your Cloud strategy, and your business strategy. It is effectively the same thing. And I think that's the biggest message that I would like to take back to the South African customers is to go and say, you need to start thinking about this. You know, as Cloud becomes a bigger reality for us, we have to align, we have to go and say, how do we get your data where it belongs? So you know, we like to say to our customers, we help the teams get the right code to the right computer and the right data, and I think it's absolutely critical for all of the customers to go and say, well, where is that data going to sit? Where is the right compute for that piece of data? And can we get it then, can we manage it, etc.? And align to business strategy. Everybody's trying to do digital transformation, and those three things go very much hand-in-hand. >> Well, Muggie, thank you very much. We're at the end of our slot. This has been great. It's been excellent to learn more about Obsidian and the work you're doing in South Africa, providing big data solutions or working with customers to build the big data infrastructure in the financial industry down there. So this has been theCUBE. We've been speaking with Muggie Van Staden of Obsidian Systems, and here at DataWorks Summit 2018 in Berlin. Thank you very much.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
IBM | ORGANIZATION | 0.99+ |
James Kobielus | PERSON | 0.99+ |
Amazon | ORGANIZATION | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
Microsoft | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
Muggie Van Staden | PERSON | 0.99+ |
Africa | LOCATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Muggie van Staden | PERSON | 0.99+ |
Botswana | LOCATION | 0.99+ |
Mozambique | LOCATION | 0.99+ |
Angola | LOCATION | 0.99+ |
Muggie | PERSON | 0.99+ |
Scott | PERSON | 0.99+ |
South Africa | LOCATION | 0.99+ |
James | PERSON | 0.99+ |
Southern Africa | LOCATION | 0.99+ |
two | QUANTITY | 0.99+ |
ORGANIZATION | 0.99+ | |
Berlin | LOCATION | 0.99+ |
three day | QUANTITY | 0.99+ |
three | QUANTITY | 0.99+ |
GDPR | TITLE | 0.99+ |
ORGANIZATION | 0.99+ | |
Berlin, Germany | LOCATION | 0.99+ |
ORGANIZATION | 0.99+ | |
Obsidian Systems | ORGANIZATION | 0.99+ |
first company | QUANTITY | 0.99+ |
five weeks | QUANTITY | 0.99+ |
four | QUANTITY | 0.99+ |
first partnerships | QUANTITY | 0.99+ |
three | DATE | 0.99+ |
Today | DATE | 0.98+ |
Linux | TITLE | 0.98+ |
23 years ago | DATE | 0.98+ |
DataWorks Summit 2018 | EVENT | 0.98+ |
both | QUANTITY | 0.97+ |
EU | LOCATION | 0.97+ |
Wikibon | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
PoPI | TITLE | 0.97+ |
Data Steward Studio | ORGANIZATION | 0.97+ |
each country | QUANTITY | 0.97+ |
Cloud | TITLE | 0.97+ |
US | LOCATION | 0.96+ |
last night | DATE | 0.96+ |
SiliconANGLE Media | ORGANIZATION | 0.96+ |
four years | QUANTITY | 0.96+ |
DataWorks Summit | EVENT | 0.96+ |
Hadoo | ORGANIZATION | 0.96+ |
One | QUANTITY | 0.96+ |
Dataworks Summit 2018 | EVENT | 0.95+ |
Hadoop | ORGANIZATION | 0.93+ |
about three | QUANTITY | 0.93+ |
two big players | QUANTITY | 0.93+ |
theCUBE | ORGANIZATION | 0.93+ |
Bernard Marr | Dataworks Summit 2018
>> Narrator: From Berlin, Germany, it's theCUBE, covering DataWorks Summit Europe 2018, brought to you by Hortonworks. >> Well, hello, and welcome to the Cube. I'm James Kobielus. I'm the lead analyst for Big Data Analytics with the Wikibon team within SiliconANGLE Media. We are here at the DataWorks Summit 2018 in Berlin, Germany. And I have a special guest, we have a special guest, Bernard Marr, one of the most influential, thought leaders in the big data analytics arena. And it's not just me saying that. You look at anybody's rankings, Bernard's usually in the top two or three of influentials. He publishes a lot. He's a great consultant. He keynoted this morning on the main stage at Dataworks Summit. It was a very fascinating discussion, Bernard. And I'm a little bit star struck 'cause I assumed you were this mythical beast who just kept putting out these great books and articles and so forth. And I'm glad to have you. So, Bernard, I'd like for you to stand back, we are here in Berlin, in Europe. This is April of 2018, in five weeks time, the general data protection, feels global 'cause it sort of is. >> It is. >> The general data protection regulation will take full force, which means that companies that do business in Europe, in the EU, must under the law protect the personal data they collect on EU citizens ensuring the right to privacy, the right to be forgotten, ensuring user's, people's ability to withhold consent to process and profile and so forth. So that mandate is coming down very fast and so forth. What is your thoughts on GDPR? Is it a good thing, Bernard, is it high time? Is it a burden? Give us your thoughts on GDPR currently. >> Okay, first, let me return all the compliments. It's really great to be here. I think GDPR can be both. And for me it will come down very much to the way it gets implemented. So, in principle for me, it is a good thing because what I've always made companies do and advise them to do is to be completely transparent in the way they're collecting data and using data. I believe that the big data world can't thrive if we don't develop this trust and have this transparency. So in principle, it's a great thing. For me will come down to the implementation of all of this. I had an interesting chat just minutes ago with the event photographer saying that once GDPR kicks in he can't actually publish any photographs without getting written consent for everyone in the photograph. That's a massive challenge and he was saying he can't afford to lose 4% of his global revenue. So I think it will be very interesting to see how this will-- >> How it'll be affecting face recognition, I'm sorry go ahead. >> Bernard: Yeah maybe. >> Well maybe that's a bad thing, maybe it's a good thing. >> Maybe it is, yeah, maybe. So for me, in principle a very good thing. In practice, I'm intrigued to see how this will get implemented. >> Of the clients you consult, what percentage in the EU, without giving away names, what percentage do you think are really ready right now or at least will be by May 25th to comply with the letter of the law? Is it more than 50%? Is it more than 80%? Or will there be a lot of catching up to do in a short period of time? >> My sense is that there's a lot of catching up to do. I think people are scrambling to get ready at the moment. But the thing is nobody really knows what being ready really means. I think there are lots of different interpretations. I've been talking to a few lawyers recently. 
And everyone has a slightly different interpretation of how far they can push the boundaries, so, again, I'm intrigued to see what will actually happen. And I very much hope that common sense prevails and it will be seen as a good force and something that is actually good for everyone in the field of big data. >> So slightly changing track, in your introduction this morning, I think it was John Kreisa of Hortonworks who said that you made a prediction about this year, that AI will be used to automate more things than people realize and it'll come along fairly fast. Can you give us a sense for how automation, how AI is enabling greater automation, and whether, you know, this is the hot button topic, AI will put lots of people out of work fairly quickly by automating everything that white collar workers and so forth are doing, what are your thoughts there? Is it cause for concern? >> Yes, and it's probably one of the questions I get asked the most, and I wish I had a very good answer for it. If we look back at the others, I believe that we are experiencing a new industrial revolution at the moment, and if you look at what the World Economic Forum's CEO and founder, Klaus Schwab, is preaching about, it is that we are experiencing this new industrial revolution that will truly transform the workplace and our lives. In history, all of the three previous industrial revolutions have somehow made our lives better. And we have always found something for us to do. And they have changed the jobs. Again, there was a recent report that looked at some of the key AI trends, and what they found is that actually AI produces more new jobs than it destroys. >> Will we all become data scientists, then, as AI becomes predominant? Or what's going on here? >> No, I don't, and this is, I wish I had the answer to this. For me, the advice I give my own children now is to focus on the really human element of it and probably the more strategic element. The problem is, five, six years ago this was a lot easier. I could talk about emotional intelligence, creativity; with advances in machine learning, this advice is no longer true. And lots of jobs, even some of the things I do, I write for Forbes on a regular basis. I also know that AIs write for Forbes. A lot of the analyst reports are now machine generated. >> Natural language generation, a huge use case for AI that people don't realize. >> Bernard: Absolutely. >> Yeah. >> So, for me I see it, as an optimist I see it positively. I also question whether we as human beings should be going to work eight hours a day doing lots of stuff we quite often don't enjoy. So for me, the challenge is adjusting our economic model to this new reality, and I see that there will be significant disruption over the next 20 years with all the technology coming in and really challenging our jobs. >> Will AI put you and me out of a job? In other words, will it put the analysts and the consultants out of work and allow people to get expert advice on how to manage technology without having to go through somebody like a you or a me? >> Absolutely, and for me, my favorite example is looking at medicine. If you look at doctors, traditionally you send a doctor to medical school for seven years. You then hope that they retain 10% of what they've learned, if you're lucky. Then they gain some experience. You then turn up in the practice with your conditions. Again, if you're super lucky, they might have skim read some of your previous conditions, and then diagnose you.
And unless you have something that's very common, the chance that they get this right is very low. So compare this with your old stomping ground, IBM's Watson, so they are able to feed all medical knowledge into that cognitive computing platform. They can update this continuously, and I could then talk to Watson eight hours a day if I wanted to about my symptoms. >> But can you trust that advice? Why should you trust the advice that's coming from a bot? Yeah, that's one of the key issues. >> Absolutely, and I think at the moment maybe not quite, because there's still a human element that a doctor can bring, because they can read your emotions, they can understand your tone of voice. This is going to change with affective computing and the ability for machines to do more of this, too. >> Well science fiction authors run amok of course, because they imagine the end state of perfection of all the capabilities like you're describing. So we perfect robotics. We perfect emotion analytics and so forth. We use machine learning to drive conversational UIs. Clearly a lot of people imagine that the technology, all those technologies, are perfected or close to it, so, you know. But clearly you and I know that there's a lot of work to do to get them-- >> And we both have been in the technology space long enough to know that there are promises and there's lots of hype, and then there's a lot of disappointment, and it usually takes longer than most people predict. So what I'm seeing is that every industry I work in, and this is what my prediction is, automation is happening across every industry I work in. More things, even things I thought five years ago couldn't be automated. But to get to a state where it really transforms our world, I think we are still a few years away from that. >> Bernard, in terms of the hype factor for AI, it's out of sight. What do you think is the most hyped technology or application under the big umbrella of AI right now, in terms of the hype far exceeding the utility? I don't want to put words in your mouth. I've got some ideas. Your thoughts? >> Lots of them. I think that the two areas I write a lot about and talk to companies a lot about are deep learning, machine learning, and blockchain technology. >> James: Blockchain. >> So they are, for me, they have huge potential, some amazing use cases, and at the same time the hype is far ahead of reality. >> And there's sort of an intersection between AI and blockchain right now, but it's kind of tentative. Hey, Bernard, we are at the end of this segment. It's been so great. We could just keep going on and on and on. >> I know we could just be... >> Yeah, there's a lot I've been wanting to ask you for a long time. I want to thank you for coming to theCUBE. >> Pleasure. >> This has been Bernard Marr. I'm James Kobielus on theCUBE from DataWorks Summit in Berlin, and we'll be back with another guest in just a little while. Thank you very much.
Keynote Analysis | Dataworks Summit 2018
>> Narrator: From Berlin, Germany, it's theCUBE! Covering DataWorks Summit Europe 2018. (upbeat music) Brought to you by Hortonworks. (upbeat music) >> Hello, and welcome to theCUBE. I'm James Kobielus. I'm the lead analyst for Big Data analytics in the Wikibon team of SiliconANGLE Media, and we're here at DataWorks Summit 2018 in Berlin, Germany. It's an excellent event, and we are here for two days of hard-hitting interviews with industry experts focused on the hot issues facing customers and enterprises, in Europe and the world over, related to the management of data and analytics. And what's super hot this year, and will remain hot as an issue, is data privacy and privacy protection. Five weeks from now, a new regulation of the European Union called the General Data Protection Regulation takes effect, and it's a mandate affecting any business that is not only based in the EU but that does business in the EU. It's coming fairly quickly, and enterprises on both sides of the Atlantic, and really throughout the world, are focused on GDPR compliance. So that's a hot issue that was discussed this morning in the keynote, and what we're going to be doing over the next two days is having experts from Hortonworks, the show's host, as well as IBM, one of its lead partners, as well as a customer, Munich Re, appear on theCUBE, and I'll be interviewing them about not just GDPR but really the trends facing the Big Data industry. Hortonworks, of course, got started about seven years ago as one of the solution providers focused on commercializing the open source Hadoop code base, and they've come quite a ways. Their recent financials were very good, and they continue to rock 'n' roll on the growth side, in customer acquisitions and deal sizes. So we'll be talking a little bit later to Scott Gnau, their chief technology officer, who did the core keynote this morning. He'll be talking not only about how the business is doing but about a new product announcement, Data Steward Studio, which Hortonworks announced overnight. This new solution is directly relevant and useful for GDPR compliance, and we'll ask Scott to bring us more insight there. But what we'll be doing over the next two days is extracting signal from noise. The Big Data space continues to grow and develop. Hadoop has been around for a number of years now, but in many ways it's been superseded on the agenda of enterprises that are building applications from data by some newer, primarily open source, technologies such as Apache Spark and TensorFlow for building deep learning and so forth. We'll be discussing the trend towards the deepening of the open source data analytics stack with our guests. We'll be talking with a European-based reinsurance company, Munich Re, about the data lake that they have built for their internal operations, and we'll be asking Andres Kohlmaier, their lead of data engineering, to discuss how they're using it, how they're managing their data lake, and possibly to give us some insight about how it will serve them in achieving GDPR compliance and sustaining it going forward. So we'll be looking at trends not just in compliance, not just in the underlying technologies, but in the applications that Hadoop and Spark and these other technologies are being used for, and those initiatives in Europe really mirror what enterprises are doing world-wide.
They're moving away from Big Data environments built primarily on data at rest, which has been Hadoop's sweet spot, towards more streaming architectures. And so Hortonworks, the show's host as I said, has been going more deeply towards streaming architectures with its investments in NiFi and so forth. We'll be asking them to give us some insight about where they're going with that. We'll also be looking at the growth of multi-cloud Big Data environments. What we're seeing is a trend in the marketplace away from predominantly premises-based Big Data platforms towards public cloud-based Big Data platforms. Hortonworks partners with a number of the public cloud providers, including IBM, which I mentioned; they've also got partnerships with Microsoft Azure, with Amazon Web Services, with Google, and so forth. We'll be asking our guests to give us some insight about where they're going in terms of their support for multi-cloud, for edge computing, analytics, and the internet of things. Big Data is increasingly evolving towards more of a focus on serving applications at the edge, like mobile devices with autonomous smarts, for example self-driving vehicles. Big Data is critically important for feeding, modeling, and building the AI needed to power the intelligence in endpoints. Not just self-driving cars but intelligent appliances and conversational user interfaces for mobile devices and consumer appliances; you know, Amazon's got their Alexa, Apple's got their Siri, and so forth. So we'll be looking at those trends as well, towards pushing more of that intelligence towards the edge, and at the role of Big Data and data-driven algorithms, like machine learning, in driving those kinds of applications. At Wikibon, the team that I'm embedded within, we have just recently published our updated forecast for the Big Data analytics market, and we've identified key trends that are revolutionizing, disrupting, and changing the market for Big Data analytics. Among the core trends, I mentioned the move towards multi-clouds and towards more public cloud-based Big Data environments in the enterprise. I'll be asking Hortonworks, who of course built their business and their revenue stream primarily on on-premises deployments, to give us a sense for how they plan to evolve as a business as their customers move towards more public cloud-facing deployments. And IBM, of course, will be here in force. Tomorrow, which is a Thursday, we have several representatives from IBM to talk about their initiatives and partnerships with Hortonworks and others in the areas of metadata management, machine learning and AI development tools, and collaboration platforms. We'll also be discussing the push by IBM and Hortonworks to enable greater depths of governance applied to enterprise deployments of Big Data: both data governance, an area where Hortonworks and IBM as partners have achieved a lot of traction and recognition among the pace setters for data governance in multi-cloud, unstructured Big Data environments, and also model governance, the governing and version control of machine learning and AI models. Model governance is a huge push by enterprises who increasingly are doing data science, which is what machine learning is all about.
Taking that competency, that practice, and turning it into more of an industrialized pipeline for building, training, and deploying into an operational environment a steady stream of machine-learning models for multiple applications, you know, edge applications, conversational UIs, search engines, eCommerce environments, all driven increasingly by machine learning that's able to process Big Data in real time and deliver next best actions and so forth, bringing more intelligence into all applications. So we'll be asking Hortonworks and IBM to net out where they're going with their partnership in terms of enabling a multi-layered governance environment that allows this pipeline, this machine-learning pipeline, this data science pipeline, to be deployed as an operational capability in more organizations. Also, one of the areas where I'll be probing our guests is automation in the machine learning pipeline. That's been a hot theme that Wikibon has seen in our research. A lot of vendors in the data science arena are adding automation capabilities to their machine-learning tools. Automation is critically important for productivity. Data scientists as a discipline are in limited supply; experienced, trained, seasoned data scientists fetch a high price. There aren't that many of them, so more of the work they do needs to be automated, and it can be automated by increasingly mature tools on the market from a growing range of vendors. I'll be asking IBM and Hortonworks to net out where they're going with automation inside their Big Data and machine learning tools and partnerships going forward. So really what we're going to be doing over the next few days is looking at these trends, but it's going to come back down to GDPR as a core envelope that many companies attending this event, DataWorks Summit Berlin, are facing. I'm James Kobielus with theCUBE. Thank you very much for joining us, and we look forward to starting our interviews in just a little while. First up will be Scott Gnau from Hortonworks. Thank you very much. (upbeat music)
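To give a concrete sense of the kind of pipeline automation described in this keynote preview, here is a minimal sketch in Python using scikit-learn, where an automated grid search stands in for the hyperparameter tuning a data scientist would otherwise do by hand. The dataset, model choice, and parameter grid are illustrative assumptions only, not anything specific to Hortonworks or IBM tooling.

```python
# Minimal sketch of an "automated" training pipeline: preprocessing and model
# are coupled in one Pipeline, and hyperparameter search is handled by the tool
# rather than by hand. Dataset and parameter grid are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline ensures scaling and modeling are tuned and deployed together.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Automated cross-validated search over a small hyperparameter grid.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("held-out accuracy:", search.score(X_test, y_test))
```

The same idea scales up in commercial tools: the search covers feature engineering and model selection as well, and the winning pipeline is versioned and promoted into production, which is where the model governance discussed above comes in.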