Piotr Mierzejewski, IBM | Dataworks Summit EU 2018

>> Announcer: From Berlin, Germany, it's theCUBE covering Dataworks Summit Europe 2018, brought to you by Hortonworks. (upbeat music) >> Well hello, I'm James Kobielus and welcome to theCUBE. We are here at Dataworks Summit 2018, in Berlin, Germany. It's a great event, Hortonworks is the host, they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments. So Piotr, welcome to theCUBE. I don't think we've had you before. >> Thank you. >> You're a program manager. I'd like you to discuss what you do for IBM, I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others in those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now. >> Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and I would like to begin, before I get actually to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially-- >> James: I never said it was! (Piotr laughs) >> Well there is this kind of perception, like you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine, let's say in two, three days' time. This is because of the explosion of open-source in that space. 
You have thousands of libraries, from Python, from R, from Scala, you have access to Spark. All these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, the data access, actual model deployments, which are not trivial when you have to expose this in a uniform fashion to actually various business units. Now all this has to actually work in private cloud and public cloud environments, on a variety of hardware, a variety of different operating systems. Now that is not trivial. (laughs) Now when you deploy a model, as the data scientist is going to deploy the model, he needs to be able to actually explain how the model was created. He has to be able to explain what data was used. He needs to ensure-- >> Explicable AI, or explicable machine learning, yeah, that's a hot focus, or concern, of enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, so hot. >> Yes, you've mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why the partnership between Hortonworks and IBM makes sense? Well, you're looking at the number one industry-leading big data platform from Hortonworks. 
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product. The merger between the two, the ability to integrate them tightly together, gives your data scientists secure access to data, ability to leverage the Spark that runs inside a Hortonworks cluster, ability to actually work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, ability to actually work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can actually manage your Hadoop clusters, Hadoop environments within DSX, you can actually work on your Python models and your analytics within DSX and then push it remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you know you can pull it in, but truly, what you want to do when you move to the terabytes and the petabytes of data, what happens is that you actually have to push the analytics to where your data resides, and leverage for example YARN, a resource manager, to distribute your workloads and actually train your models on your HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with ability to actually install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to actually work with HDP. Now, this partnership embarked, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but there is not much materializing after such announcements. 
This is not true in the case of DSX and HDP. We have had, just recently we have had a release of the DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to actually work with just one environment. Some of them might prefer to work on Spark, some of them like their RStudio, they're statisticians, they like R, others like Python, with Zeppelin or, say, Jupyter Notebook. Now, how about TensorFlow? What are you going to do when actually, you know, you have to do the deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to actually bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, you can scale the platform horizontally and vertically, and train your deep learning workloads, and actually remove the sidecar. So you can add it to the cluster and remove it at will. Now, DSX also actually not only satisfies the needs of your programmer data scientists, that actually code in Python and Scala or R, but actually allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, you can actually, we have embedded, integrated, an SPSS Modeler, redesigned, rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now with the new interface, embedded inside the DSX platform, it allows your business analysts to actually train and create the model in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely. 
(laughs) >> But I see it's the same phenomenon, you bring the same capability to greatly expand the range of data professionals who can do, in this case, do machine learning hopefully as well as professional, dedicated data scientists. >> Certainly. Now what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization. From the executive, who actually gives you the business use case, to your data engineers, who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, they actually manage your relational databases, because we have to realize that not all the data is in the data lakes yet. You have legacy systems, which DSX allows you to actually connect to and integrate to get data from. It also allows you to actually consume data from streaming sources, so if you actually have a Kafka message hub and are actually streaming data from your applications or IoT devices, you can actually integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can actually do prescriptive analytics as well? With the 1.2, again I'm going to be coming back to this 1.2, with the most recent release of DSX we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis. 
So now if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them in the same fashion as you will deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, so with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now what's going to get really exciting in the next two months, DSX will actually bring in natural language processing and text analysis and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative message platform, extendable! Extendable collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure. On Google Cloud, definitely. We can deploy in SoftLayer and we're very good at that. However, in the majority of cases we find that the customers have challenges bringing the data out to the cloud environments. Hence, with DSX, we designed it to actually deploy and run and scale everywhere. Now, how we have done it, we've embraced open source. This was a huge shift within IBM, to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, with Docker and equivalents, as they became industry standards? 
Bring in RStudio, the Jupyter, the Zeppelin Notebooks, bring in the ability for a data scientist to choose the environments they want to work with and actually extend them and make the deployments of web services, applications, the models, and those are actually full releases. I'm not only talking about the model, I'm talking about the scripts that can go with that, the ability to actually pull the data in and allow the models to be re-trained, evaluated and actually re-deployed without taking them down. Now that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay. >> 1.2.1, and we're expecting to release at the end of June. What's going to be unique in the 1.2.1 is infusing the text and sentiment analysis, so natural language processing, with predictive and prescriptive analysis for both developers and your business analysts. >> James: Yes. >> So essentially a platform not only for your data scientist but pretty much every single persona inside the organization. >> Including your marketing professionals who are baking sentiment analysis into what they do. Thank you very much. This has been Piotr Mierzejewski of IBM. He's a Program Manager for DSX and for ML, AI, and data science solutions, and of course a strong partnership is with Hortonworks. We're here at Dataworks Summit in Berlin. We've had two excellent days of conversations with industry experts including Piotr. We want to thank everyone, we want to thank the host of this event, Hortonworks, for having us here. 
We want to thank all of our guests, all these experts, for sharing their time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations, the breakouts have been great, the whole buzz here is exciting. GDPR's coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning in business, and using tools like DSX. I'm James Kobielus for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day and thank you for watching theCUBE. (upbeat music)

Published Date : Apr 19 2018

ENTITIES

Entity	Category	Confidence
Piotr Mierzejewski	PERSON	0.99+
James Kobielus	PERSON	0.99+
James	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Piotr	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
30 seconds	QUANTITY	0.99+
Berlin	LOCATION	0.99+
IWS	ORGANIZATION	0.99+
Python	TITLE	0.99+
Spark	TITLE	0.99+
two	QUANTITY	0.99+
First	QUANTITY	0.99+
Scala	TITLE	0.99+
Berlin, Germany	LOCATION	0.99+
350,000 employees	QUANTITY	0.99+
DSX	ORGANIZATION	0.99+
Mac	COMMERCIAL_ITEM	0.99+
two things	QUANTITY	0.99+
RStudio	TITLE	0.99+
DSX	TITLE	0.99+
DSX 1.2	TITLE	0.98+
both developers	QUANTITY	0.98+
second	QUANTITY	0.98+
GDPR	TITLE	0.98+
Watson Explorer	TITLE	0.98+
Dataworks Summit 2018	EVENT	0.98+
first line	QUANTITY	0.98+
Dataworks Summit Europe 2018	EVENT	0.98+
SiliconANGLE Media	ORGANIZATION	0.97+
end of June	DATE	0.97+
TensorFlow	TITLE	0.97+
thousands of libraries	QUANTITY	0.96+
R	TITLE	0.96+
Jupyter	ORGANIZATION	0.96+
1.2.1	OTHER	0.96+
two excellent days	QUANTITY	0.95+
Dataworks Summit	EVENT	0.94+
Dataworks Summit EU 2018	EVENT	0.94+
SPSS	TITLE	0.94+
one	QUANTITY	0.94+
Azure	TITLE	0.92+
one kind	QUANTITY	0.92+
theCUBE	ORGANIZATION	0.92+
HDP	ORGANIZATION	0.91+

GDPR on theCUBE, Highlight Reel #3 | GDPR Day

(bouncy, melodic music) - The world's kind of revolting against these mega-siloed platforms. - That's the risk of having such centralized control over technology. If you remember in the old days, when Microsoft dominance was rising, all you had to do was target Windows as a virus platform, and you were able to impact thousands of businesses even in the early Internet days, within hours. And it's the same thing happening right now, as a weaponization of these social media platforms, and Google's search engine technology and so forth, is the same side effect now. The centralization, that control, is the problem. One of the reasons I love the Blockstack technology, and Blockchain in general, is the ability to decentralize these things right now. And the most passionate thing I care about nowadays is being driven out of Europe, where they have a lot more maturity in terms of handling these nuances-- - You mean the tech being driven out of Europe? - The laws, - The laws, okay. - being driven out of Europe and-- - Be specific, we'd like an example. - The major deadline that's coming up on May 25th of 2018 is GDPR, General Data Protection Regulation, where European citizens now, and any company, American or otherwise, catering to European citizens, has to respond to things like the Right To Be Forgotten request. You've got 24 hours as a global corporation with European operations to respond to European citizens', EU citizens', Right To Be Forgotten requests, where all the personally identifiable information, the PII, has to be removed, with an audit trail proving it's been removed, and it has to be gone from two, three hundred internal systems within 24 hours. And this has teeth, by the way. It's not like the 2.7 billion dollar fine that Google just flipped away casually. This has up to 4% of your global profits per incident where you don't meet that requirement. 
- And so what we're seeing in the case of GDPR is that's an accelerant to adopt Cloud, because we actually isolate the data down into regions, and the way we've architected our platform from day one has always been a true multi-tenant SaaS technology platform. And so there's not that worry about data resiliency and where it resides, and how you get access to it, because we've built all that up. And so, when we go through all of our own attestations, whether it's SOC Type One, Type Two, GDPR as an initiative, what we're doin' for HIPAA, what we're doin' for a plethora of other things, usually the CSO says, "Oh, I get it, you're way more secure, now help me," because I don't want the folks in development or operations to go amuck, so to speak, I want to be an enabler, not Doctor No. - I'm a developer, I search for data, I'm just searching for data. - That's right. - What's the controls available for making sure that I don't go afoul of GDPR? - So absolutely. So we have phenomenal security capabilities that are built into our product, both from an identification point of view, giving rights and privileges, as well as protecting that data from any third party access. All of this information is going to be compliant with these regulations, beyond GDPR. There's enormous regulations around data that require us to keep our security levels as high as we go. In fact, we would argue that AWS itself is now typically more secure, more secure, - [Mike] They've done the work. - than your classic data center. - [Mike] Yeah, they've done the work. - Explicable AI, or explicable machine learning. - Yeah, that's a hot focus, - Indeed. - or concern of enterprises everywhere, especially in a world where governance and tracking and lineage, - Precisely. - GDPR and so forth, so hot. - Yes, you have mentioned all the right things. Now, so given those two things, there's no ML without data, and ML is not easy, why the partnership between Hortonworks and IBM makes sense? 
Well, you're looking at the number one, industry-leading big data platform, Hortonworks. Then you look at DSX Local, which I'm proud to say I've been there since the first line of code, and I'm feeling very passionate about the product. The merge between the two, the ability to integrate them tightly together, gives your data scientists secure access to data, ability to leverage the Spark that runs inside of a Hortonworks cluster, ability to actually work in a platform like DSX, that doesn't limit you to just one kind of technology but allows you to work within multiple technologies, ability to actually work on your, not only Spark-- - You say technologies here, are you referring to frameworks like TensorFlow, and-- - [Piotr] Precisely. - Okay, okay. - Very good, now, that part I'm gonna get into very shortly. So please don't steal my thunder. - So GDPR you see as a big opportunity for Cloud providers, like Azure. Or they bring something to the table, right? - Yeah, they bring different things to the table. You have elements of data where you need the on-premise solution, you need to have control, and you need to have that restriction about where that data sits. And some of the talks here that are going on at the moment, is understanding, again, how critical and how risky is that data? What is it you're keepin' and how high does that come up in business value? So if that's gonna be on your on-prem solution, there may be other data that can get pushed out into the Cloud, but, I would say, Azure, the AWS Suites and Google, they are really pushing down that security, what you can do, how you protect it, how you can protect that data, and you've got the capabilities of things like LRS or GRS, and having that global reach or those local repositories, for the object storage. So you can start to control by policies. 
You can write into this country, but you're not allowed to go to this country, and you're not allowed to go to that one, and Cloud does give you that to a certain element, but also then, you have to step back into, maybe the sorts of things that-- - So does that make Cloud Orchestrator more valuable, or has it still got more work to do? Because under what Adam was saying, is that the point and click, is a great way to provision, right?

Published Date : May 25 2018

ENTITIES

Entity	Category	Confidence
Hortonworks	ORGANIZATION	0.99+
IBM	ORGANIZATION	0.99+
Adam	PERSON	0.99+
two	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
Europe	LOCATION	0.99+
May 25th of 2018	DATE	0.99+
GDPR	TITLE	0.99+
Microsoft	ORGANIZATION	0.99+
Google	ORGANIZATION	0.99+
General Data Protection Regulation	TITLE	0.99+
24 hours	QUANTITY	0.99+
two things	QUANTITY	0.99+
Mike	PERSON	0.99+
first line	QUANTITY	0.99+
HIPAA	TITLE	0.98+
Cloud	TITLE	0.98+
One	QUANTITY	0.98+
both	QUANTITY	0.98+
thousands of businesses	QUANTITY	0.96+
Windows	TITLE	0.96+
Piotr	PERSON	0.95+
up to 4%	QUANTITY	0.95+
TensorFlow	TITLE	0.94+
one kind	QUANTITY	0.93+
European	OTHER	0.93+
2.7 billion dollar	QUANTITY	0.91+
Azure	ORGANIZATION	0.89+
AWS Suites	ORGANIZATION	0.89+
Spark	TITLE	0.88+
three hundred internal systems	QUANTITY	0.86+
EU	LOCATION	0.85+
Hortonworks Glassdoor	ORGANIZATION	0.84+
NML	ORGANIZATION	0.83+
GDPR Day	EVENT	0.78+
day one	QUANTITY	0.75+
American	OTHER	0.74+
CSO	ORGANIZATION	0.72+
LSR	TITLE	0.7+
Right To Be Forgotten	OTHER	0.68+
GSR	TITLE	0.62+
Type Two	OTHER	0.62+
To	OTHER	0.6+
DSX	ORGANIZATION	0.59+
One	OTHER	0.59+
Highlight Reel	ORGANIZATION	0.56+
#3	QUANTITY	0.54+
one	QUANTITY	0.5+
SOC Type	TITLE	0.49+
DSX Local	TITLE	0.44+
