Victoria Stasiewicz, Harley-Davidson Motor Company | IBM DataOps 2020
>> Announcer: From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

>> Dave: Hi everybody, this is Dave Vellante, and welcome to this special digital CUBE presentation sponsored by IBM. We're going to focus in on DataOps, DataOps in action. A lot of practitioners tell us that they really have challenges operationalizing and infusing AI into the data pipeline. We're going to talk to some practitioners and really understand how they're solving this problem, and I'm really pleased to bring in Victoria Stasiewicz, who's the Global Information Systems Manager for Information Management at Harley-Davidson. Vik, thanks for coming on theCUBE. Great to see you. I wish we were face to face, but I really appreciate your coming on in this manner.

>> Victoria: That's okay. That's why technology's great, right?

>> Dave: So you are steeped in a data role at Harley-Davidson. Can you describe a little bit about what you're doing and what that role is like?

>> Victoria: Definitely. So obviously I'm a manager of information management and governance at Harley-Davidson, and what my team is charged with is building out data governance at an enterprise level, as well as supporting the AI and machine learning technologies within my function. So I have a portfolio that really includes data and AI and governance, and also our master data, reference data, and data quality functions, if you're familiar with the DAMA wheel. What I can tell you is that my team did an excellent job within this last year, in 2019, standing up the infrastructure: those technologies specific to governance, as well as the newer, more modern warehouse-on-cloud technologies and cloud object storage, which also included Watson Studio and Watson Explorer. So many of the IBMers of the world might hear about IBM IIS or work on it directly; we stood that up in the cloud, as well as Db2 Warehouse on Cloud, like I said, and Cloud Object Storage. We spent about the first five months of last year standing that
infrastructure up, working on the workflow, ensuring that access and security management were all set up and contained within the platform. And what we did the last half of the year was really start to collect that metadata, as well as the data itself, and bring the metadata into our metadata repository, which is our metadata base within IGC, and then also bring that into our Db2 Warehouse on Cloud environment. So we were able to start with what we would consider our dealer domain for Harley-Davidson and bring those dimensions into Db2 Warehouse on Cloud, which was never done before. A lot of the information that we were collecting and bringing together for the analytics team lived in disparate data sources throughout the enterprise. So the goal was to stop the redundant data across the enterprise, eliminate some of those disparate source data resources, and bring it into a centralized repository for reporting.

>> Dave: Okay, wow, we've got a lot to unpack here, Victoria. So let me start with sort of the macro picture. I mean, years ago data was this thing that had to be managed, and it still does, but it was a cost, largely a liability. Governance was sort of front and center; sometimes it was the tail that wagged the value dog. And then the whole big data movement comes in and everybody wants to be data-driven, and so you saw some pretty big changes in just the way in which people looked at data. They wanted to mine that data and make it an asset versus just a straight liability. So what are the changes that you discerned in data and in your organization over the last, let's say, half a decade?

>> Victoria: To tell you the truth, we started looking at access management and the ability to allow some of our users to do some rapid prototyping that they could never do before. So more and more, what we're seeing as far as data citizens or data scientists, or even analysts throughout most enterprises, is that they want access to the
information, and they want it now. They want speed to insight at this moment, using pretty much a minimum viable product. They may not need the entire data set, and they don't want to have to go through leaps and bounds just to get access to that information, or to bring that information into necessarily a centralized location. So while I talk about our Db2 Warehouse on Cloud, and that's an excellent example of where we actually need to model data, where we know that this is data that we trust, data that's going to be called upon many, many times by many, many analysts, there's other information out there that people are collecting. Because there's so much big data, there are so many ways to enrich your data within your organization for your customer reporting, and people are really trying to tap into those third-party data sets. So what my team has done, and what we're seeing change throughout the industry, is that a lot of teams and a lot of enterprises are looking at, as technologists, how can we enable our scientists and our analysts the ability to access data virtually? So instead of creating redundant data sources, we're actually enabling data virtualization at Harley-Davidson, and we've been doing that first working with our Db2 Warehouse on Cloud and connecting to some of the other trusted data warehouses that we have throughout the enterprise, that being our dealer warehouse as well, to enable analysts to do some quick reporting without having to bring all that data together. That is a big change. The fact that we were able to tackle that has allowed technology to get back ahead, because most organizations have given IT a bad rap: it takes too long to get what we need; my technologists cannot give me my data at my fingertips in a timely manner to allow for speed to insight and to answer the business questions at the point of delivery. We've supplied data to our analysts
and they're able to calculate and aggregate the reporting metrics to get those answers back to the business, but they're a week, two weeks too late; the information is no longer relevant. So data virtualization through DataOps is one of the ways we've been able to speed that up and act as a catalyst for data delivery. But what we've also done, and I see this quite a bit, is say: that's excellent, but we still need to start classifying our information and labeling it at the system level. We've seen most enterprises, and I worked at Blue Cross as well, with IBM tools, and they had the same struggle, trying to eliminate their technology debt, reduce their spend, and reduce the time it takes for resources working on technologies to maintain them. They want to reduce the IT portfolio of assets and capabilities that they license today. So what do they do? It's time to start taking a look at what systems should be classified as essential systems versus those systems that are disparate and could be eliminated, and that starts with data governance, right?

>> Dave: Okay, so your main focus is on governance, and you talked about how people want answers now; they don't want to have to wait, they don't want a big waterfall process. So what would you say were some of the top challenges in terms of just operationalizing your data pipeline, getting to the point that you are at today?

>> Victoria: You know, I have to be quite honest: standing up the governance framework, the methodology behind it, getting data owners and data stewards and a catalog established, that was not necessarily the heavy lifting. The heavy lifting really came with setting up a brand-new infrastructure in the cloud, to be quite honest. We partnered with IBM and said, you know what, we're going to the cloud, and these tools had never been implemented in the cloud before; we were kind of the first to do it. So some of the struggles that we took on were actually
standing up the infrastructure: security and access management, network pipeline access, VPN issues, things of that nature, I would say, were some of the initial roadblocks we went through. But after we overcame those challenges, with the help of IBM and the patience of both the Harley and IBM teams, it became quite easy to roll out these technologies to other users. The nice thing is, we at Harley-Davidson have been taking the time to educate our users. Today, for example, we had what we call Data Bytes, a lunch-and-learn, and in that lunch-and-learn we took our entire GIS team, our Global Information Services team, which is all of IT, through these new technologies. It was a forum of over 250 people, with our CIO and CTO on, taking them through: how do we use these tools, what is the purpose of these tools, why do we need governance to maintain these tools, why is metadata management important to the organization? That piece of it seems to be much easier than just the initial standing it up, and it's good enough to start letting users in.

>> Dave: Well, it sounds like you had real sponsorship from leadership, and input from leadership, and they were kind of leaning into the whole process. First of all, is that true? And how important is that for success?

>> Victoria: Oh, it's essential. We often asked, when we were first standing up the tools, to be quite honest: does our CIO really understand what it is that we're standing up? Does our CIO really understand governance? Because we didn't have the time to really get that face-to-face interaction with our leadership. So I myself made it a mandate, having done this previously at Blue Cross, to get in front of my CIO and my CTO and educate them on what exactly it is we are standing up. And once we did that, it was very easy to get an executive steering committee as well as an executive membership council on board with our governance council, and now they're the champions of it. It's never easy, though, selling governance to leadership, and
the ROI is never easy, because it's not something that you can easily calculate. It's something that has to show its return on investment over time, and that means that you're bringing dashboards, you're showing your CIO and CTO how you're bringing people together, how groups are now talking about solutions and technologies in a domain-like environment at an international level. We have people from Asia, from Europe, from China that join calls every Thursday to talk about the data quality issues specific to dealer, for example: what systems we're using, what solutions are on the horizon to solve them. So now, instead of having people from other countries that work for Harley, as well as people within the US, creating one-off solutions that answer the same business questions using the same data, creating multiple solutions to solve the same problem, we're bringing them together, we're solving together, and we're prioritizing those as well. So for that return on investment down the line, you can show: you know what, instead of splitting this into five projects, we've now turned this into one; instead of implementing four systems, we've now implemented one; and guess what, we have the business rules and we have the classifications tied to this system, so that your CIO or CTO can now go in and reference this information: a glossary, a user interface, something that a C-level can read, interpret, understand quickly, and dissect for their own needs, without having to take the long, lengthy time to talk to a technologist about what this information means and how to use it.

>> Dave: You know, what's interesting, a takeaway based on what you just said, is that Harley-Davidson is an iconic brand, a cool company with, you know, motorcycles, right? But you came out of an insurance background, which is a regulated industry, where governance is sort of de rigueur. I mean, it's
table stakes. So how were you able, at Harley, to balance the sort of tension between governance and business flexibility?

>> Victoria: So there are different levers, I would call them. Obviously within healthcare and insurance, the importance becomes compliance and risk and regulatory; those are the big pushes: gosh, I don't want to pay millions of dollars in fines, so start classifying this information, enabling security, reducing risk, all that good stuff, right? For Harley-Davidson it was much different. It was more or less: we have a mission. We want to invest in our technologies, yet we want to save money. How do we cut down the technologies that we have today and reduce our technology spend, yet enable our users to have access to more information in a timely manner? That's not an easy path, right? So what I did is I married governance to our TIME model, and our TIME model is specifically: are we going to tolerate an application, are we going to invest in an application, are we going to migrate an application, or are we going to eliminate it? So in talking to my CIO, I said, you know, we can use governance to classify systems and help act as a catalyst when we start to implement what it is we're doing with our technologies: which technologies are we going to eliminate tomorrow? We as IT cannot do that unless we assess some sort of business impact, unless you look at a system and say: how many users are using it, what reports are essential to the business teams, do they need this system, is this something that's critical for users today, is this duplicative? We have many systems that are solving the same capability. That is how I sold it to my CIO, and it made it important to the rest of the organization. They knew we had a mandate in front of us, we had to reduce technology spend, and that really, for me, made it quite easy in talking to other technologists as well as business users about why governance is important and why it's going to help
Harley-Davidson and their mission to save money going forward. I will tell you, though, that the business's biggest value is the fact that they now own the data. They're more likely to use your master data management systems, and like I said, I'm the owner of our MDM services today as well as our customer knowledge center; they're more likely to access and reference those systems if they feel that they built the rules and they own the rules in those systems. So that's another big value-add, right? Many business users will say: okay, you think I need access to this system, but I don't know, I'm not sure, I don't know what the data looks like within it. Is it easily accessible? Is it going to give me the reporting metrics that I need? That's where governance will help them. For example, take our data scientist team using a catalog: you can browse your metadata, you can look at your server, your database, your tables, your fields, understand what those mean, understand the classifications, the formulas within them, because they're all documented in a glossary, versus having to go and ask for access to six different systems throughout the enterprise, hoping that Sally next to you, who told you you needed access to those systems, was right, just to find out that you don't need the access, and it took you three days to get the access anyway. That's why a glossary is really a catalyst for a lot of that.

>> Dave: Well, it's really interesting what you just said. You went through essentially an application rationalization exercise, which saved your organization money. That's not always easy, because even though IT may be spending money on these systems, businesses don't want to give them up. But you were able to use data to actually inform which applications you should invest in versus sunset. And it sounds like you were giving the business a real incentive to go through this exercise, because
they ended up, as you said, owning the data.

>> Victoria: Well, that's right. Who wants to keep paying for the old car, driving the old car, if they can truly own a new car for a cheaper price? Nobody wants to do that. I've even looked at Teslas, right? If I can buy a Tesla for the same price as I can buy a minivan these days, I think I might buy the Tesla. But what I will say is that we also built out a capabilities model with our enterprise architecture team, and in building that capabilities model we started to bucket our technologies within those capabilities: AI and machine learning, warehouse-on-cloud or even warehousing technologies, governance technologies, integration technologies, reporting technologies, those types of classifications. By grouping all those into a capabilities matrix, it was easy for us to then start identifying: all right, we're the system owners for these technologies; who are the business users for these? Based on that, let's go talk to this team, the dealer management team, about access to this new profiling capability within IBM, or this new catalog within IBM that they can use today, versus the SharePoint and Excel spreadsheets they were using for their metadata management, or the profiling tools that were old, you know, ten years old, some of the tools they were using before. Let's sell them on the new tools and start migrating them. That becomes pretty easy, because unless you're buying some really old technology, when you give people a purview into those new tools and those new capabilities, especially with some of IBM's new tools we have today, the buy-in is pretty quick. It's pretty easy to sell somebody on something shiny, and it's much easier to use than some of the older technologies.

>> Dave: Let's talk about the business impact. My understanding is you
were trying to improve the effectiveness of the dealers, not just go out and brute-force sign up more dealers. Were you able to achieve that outcome, and what has it meant for your business?

>> Victoria: Yes, actually, we were. So what we did is we built something called a CDR, our consumer and dealer development repository, and that's where a lot of our dealer information resides today. It's actually our dealer warehouse. We had some other systems that were collecting that information, like Speed, for example, and we were able to bring all that reporting into one location, sunset some of those other technologies, and also enable that centralized reporting layer. We've also used data virtualization to start to marry some of that information to Db2 Warehouse on Cloud for users, so we're allowing those that want to access CDR and our Db2 Warehouse on Cloud dealer information to do that within one reporting layer. In doing so, we were able to create something called a dealer harmonized ID. We have so many dealers today, and some of those dealers actually sell bikes, some of those dealers sell just apparel, and some of those dealers sell just parts. For those dealers we can have unique IDs, kind of a golden record, mastered information if you will, brought back in reporting, so that we can accurately assess dealer performance. Up to two years ago it was really hard to do that. We had information spread out all over, and it was really hard to get a good handle on which dealers were performing and which dealers weren't, because it was tough for our analysts to wrangle that information and bring it together. It took time, and many times you would get multiple answers to one business question, which is never good; one question should have one answer if it's accurate. That is what we worked on within this last year, and that's where really our CEO sees the value: now we
can start to act on which dealers are performing at an optimal level versus which dealers are struggling, and that's allowed even our account reps, our field staff, to go work with those struggling dealers and start to share with them: these are what some of our stronger-performing dealers are doing today that is making them more effective in selling bikes; these are some of the best practices you can implement. That's where we make our field staff smarter and our dealers smarter. We're not looking to shut down dealers; we just want to educate them on how to do better.

>> Dave: Well, and to your point about a single version of the truth, the lines of business kind of owning their own data, that's critical, because you're not spending all your time pointing fingers, trying to understand the data. If the users own it, then they own it. And so how does self-service fit in? Were you able to achieve some level of self-service, and how far can you go there?

>> Victoria: We were. We did use some other tools, I'll be quite honest, aside from just the IBM tools today, that enabled some of that self-service analytics. SAP SAC was one of them, and Alteryx is another big one that our analyst teams like to use today to wrangle and bring that data together. That really allowed our analysts and our reporting teams to start to build their own derivations and transformations for reporting themselves, because those tools are more user-interface based, versus going into the back-end systems and having to write straight SQL queries, things of that nature, which usually takes time and requires a deeper level of knowledge than what we'd like to require of our analysts today. I can say the same thing about the data science team. You know, they use a lot of R and Python coding today, and what we've tried to do is make sure that the tools are available so
that they can do everything they need to do without us really having to touch anything, and I will be quite honest, we have not had to touch much of anything. We have a very skilled data science team, so I will tell you that the tools we put in place today, Watson Explorer and some of the other tools as well, have enabled the data scientists to really move quickly and do what they need to do for reporting. And even in cases where maybe Watson Studio or Explorer may not be the optimal technology for them to use, we've also allowed them to use some of our other open-source resources to build some of the models that they were looking to build.

>> Dave: Well, I'm glad you brought that up, Victoria, because IBM makes a big deal out of being open, and so you're kind of confirming that you can use third-party tools, and if you like tool vendor ABC, you can use them as part of this framework.

>> Victoria: Yeah, it's really about TCO, right? So take a look at what you have today. If it's giving you at least 80 percent of what you need for the business, or for your data scientists or reporting analysts to do what they need to do, to me it's good enough; it's giving you what you need. It's pretty hard to find anything that's exactly 100 percent. It's about being open, though, to when your scientists or your analysts find another reporting tool that requires minimal maintenance, or, let's just say, a data science flow that requires minimal maintenance and is free because it's open source. IBM can integrate with that, and we can enable that to be a quicker way for them to do what they need to do, versus telling them no, you can't use the other technologies or the other open-source tools out there, you've got to use just these tools. That's pretty tough to do, and I think it would shut most IT shops down pretty quickly within larger enterprises, because it would really act as a roadblock for most of our teams to
do what they need to do for reporting.

>> Dave: Well, last question. A big part of DataOps, borrowing from DevOps, is this continuous integration, continuous improvement, kind of an ongoing raising of the bar, if you will. What do you see going on from here?

>> Victoria: Oh, I definitely see a world where we're allowing for that rapid prototyping, like I was talking about earlier. I see a very big change in the data industry. You said it yourself: we are on the brink of big data, and it's only going to get bigger. There are organizations right now that have literally understood how much of an asset their data really is, and they're starting to sell their data to others, similar vendors within the industry, similar spaces, so they can make money off of it, because data truly is an asset now. The key to it, though, is obviously making sure that it's curated, that it's cleansed, that it's trusted, so that when you are selling it back you can really make money off of it. What I really see on the horizon is the ability to vet that data. In the past, what have we been doing for the past decade? Just buying big data sets and trusting that it's good information. We're not doing a lot of profiling at most organizations: you're going to pay top dollar, you're going to receive this third-party data set, and you're not going to be able to use it the way you need to. What I see on the horizon is us being able to do that vetting. We're building data lakehouses, if you will, those Hadoop-like environments, those data lakes, where we can land information, quickly access it, and quickly profile it with tools, where it would otherwise take hours for an analyst to write a bunch of queries to understand what the profile of that data looks like. We did that recently at Harley-Davidson: we bought some third-party data and evaluated it quickly
through our agile scrum team. Within a week, we determined that the data was not as good as the vendor pretty much sold it to be, and so we told the vendor: we want our money back; the data is not what we thought it would be; please take the data sets back. Now, that's just one use case, but to me that was golden. It's a way to save money and start vetting the data that we're buying. Otherwise, and I've seen this in the past, many organizations just buy up big third-party data sets and say, okay, it's good enough; we think that just because it comes from, say, the motorcycle industry council, it's good enough. It may not be. It's up to us to start vetting that, and that's where technology is going to change; data is going to change; analytics is going to change.

>> Dave: It's a great example. You're really on the cutting edge of this whole DataOps trend. I really appreciate you coming on theCUBE and sharing your insights, and there's more on the CrowdChat. Thank you, Victoria, for coming on theCUBE.

>> Victoria: Well, thank you, Dave. Nice to meet you; it was a pleasure speaking with you.

>> Dave: The pleasure was all ours, and thank you for watching, everybody. As I say, check out the CrowdChat for more detail and more Q&A. This is Dave Vellante for theCUBE. Keep it right there; we'll be right back after this short break. [Music]
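The vetting workflow Victoria describes, landing a purchased data set, profiling it within a sprint, and deciding whether it earns its price, can be sketched in a few lines of Python. This is a minimal illustration only: the field names, thresholds, and sample records are invented, not Harley-Davidson's actual data or tooling.

```python
# Minimal data-profiling sketch: per-field null rate and distinct count,
# with simple pass/fail thresholds. A real vetting pass would also check
# formats, value ranges, and referential integrity.

def profile(records, null_rate_max=0.2, min_distinct=2):
    fields = sorted({key for rec in records for key in rec})
    report = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        nulls = sum(1 for v in values if v in (None, ""))
        distinct = len({v for v in values if v not in (None, "")})
        null_rate = round(nulls / len(records), 2)
        report[field] = {
            "null_rate": null_rate,
            "distinct": distinct,
            "passes": null_rate <= null_rate_max and distinct >= min_distinct,
        }
    return report

# Invented sample of a purchased third-party data set.
purchased = [
    {"dealer_id": "D1", "region": "EMEA", "email": ""},
    {"dealer_id": "D2", "region": "EMEA", "email": ""},
    {"dealer_id": "D3", "region": None,   "email": ""},
]
report = profile(purchased)
```

In this toy run the empty `email` field and the spotty `region` field fail the thresholds, which is exactly the kind of evidence a scrum team could take back to a vendor within a week.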
**Summary and sentiment analysis are not shown because of an improper transcript.**
Piotr Mierzejewski, IBM | Dataworks Summit EU 2018
>> Announcer: From Berlin, Germany, it's theCUBE, covering Dataworks Summit Europe 2018. Brought to you by Hortonworks. (upbeat music)

>> James: Well hello, I'm James Kobielus, and welcome to theCUBE. We are here at Dataworks Summit 2018 in Berlin, Germany. It's a great event. Hortonworks is the host; they made some great announcements. They've had partners doing the keynotes and the sessions, breakouts, and IBM is one of their big partners. Speaking of IBM, from IBM we have a program manager, Piotr, I'll get this right, Piotr Mierzejewski. Your focus is on data science, machine learning, and Data Science Experience, which is one of the IBM products for working data scientists to build and to train models in team data science enterprise operational environments. So Piotr, welcome to theCUBE. I don't think we've had you on before.

>> Piotr: Thank you.

>> James: You're a program manager. I'd like you to discuss what you do for IBM, and I'd like you to discuss Data Science Experience. I know that Hortonworks is a reseller of Data Science Experience, so I'd like you to discuss the partnership going forward and how you and Hortonworks are serving your customers, data scientists and others on those teams who are building and training and deploying machine learning and deep learning, AI, into operational applications. So Piotr, I give it to you now.

>> Piotr: Thank you. Thank you for inviting me here, very excited. This is a very loaded question, and before I get to why the partnership makes sense, I would like to begin with two things. First, there is no machine learning without data. And second, machine learning is not easy. Especially, especially--

>> James: I never said it was! (Piotr laughs)

>> Piotr: Well, there is this kind of perception: you can have a data scientist working on their Mac, working on some machine learning algorithms, and they can create a recommendation engine in, let's say, two or three days' time. This is because of the explosion of open source in that space.
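The quick prototype Piotr alludes to, a recommendation engine stood up in days, can be as simple as item co-occurrence counting. The sketch below is purely illustrative; the basket data and product names are invented.

```python
# Toy item-to-item recommender built from co-occurrence counts: the kind of
# prototype that open-source tooling makes possible in days.
from collections import Counter
from itertools import combinations

baskets = [
    {"helmet", "gloves", "jacket"},
    {"helmet", "gloves"},
    {"gloves", "boots"},
]

cooc = Counter()
for basket in baskets:
    for a, b in combinations(sorted(basket), 2):
        cooc[(a, b)] += 1  # count each pair in both directions
        cooc[(b, a)] += 1

def recommend(item, k=2):
    # Rank other items by how often they appear alongside `item`.
    scores = Counter({b: n for (a, b), n in cooc.items() if a == item})
    return [b for b, _ in scores.most_common(k)]

print(recommend("helmet"))  # -> ['gloves', 'jacket']
```

This is also why "ML is not easy" at the enterprise level is a separate claim: the hard part is not this prototype, but governing, deploying, and explaining it, as the discussion below covers.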
You have thousands of libraries, from Python, from R, from Scala; you have access to Spark: all these various open-source offerings that are enabling data scientists to actually do this wonderful work. However, when you start talking about bringing machine learning to the enterprise, this is not an easy thing to do. You have to think about governance, resiliency, data access, and the actual model deployments, which are not trivial when you have to expose this in a uniform fashion to various business units. Now, all this has to actually work in private cloud and public cloud environments, on a variety of hardware and a variety of different operating systems. Now, that is not trivial. (laughs) When a data scientist is going to deploy a model, he needs to be able to explain how the model was created. He has to be able to explain what data was used. He needs to ensure--

>> James: Explainable AI, or explainable machine learning, yeah, that's a hot focus of concern for enterprises everywhere, especially in a world where governance and tracking and lineage, GDPR and so forth, are so hot.

>> Piotr: Yes, you've mentioned all the right things. Now, given those two things, there's no ML without data and ML is not easy, why does the partnership between Hortonworks and IBM make sense? Well, you're looking at the number one, industry-leading big data platform from Hortonworks.
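One concrete slice of the governance burden Piotr lists, explaining later how a model was created and what data was used, is capturing lineage metadata at deployment time. A platform like DSX records this kind of metadata for you; the sketch below shows the idea by hand, and every name in it is invented for illustration.

```python
# Hand-rolled sketch of a deployment lineage record: which model, which
# parameters, and a fingerprint of the training data, so the question
# "how was this model created?" can be answered after the fact.
import datetime
import hashlib
import json

def deployment_record(model_name, params, training_rows):
    payload = json.dumps(training_rows, sort_keys=True).encode("utf-8")
    return {
        "model": model_name,
        "params": params,
        "training_data_sha256": hashlib.sha256(payload).hexdigest(),
        "deployed_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }

rec = deployment_record(
    "dealer-churn-v1",
    {"algorithm": "logistic_regression", "C": 1.0},
    [{"dealer_id": "D1", "churned": False}],
)
```

Hashing a canonical serialization of the training set gives a cheap, reproducible fingerprint: if the same rows are used again, the same digest comes out, so an auditor can verify which data a deployed model saw.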
Then, you look at DSX Local, which, I'm proud to say, I've been there since the first line of code, and I'm feeling very passionate about the product. It's the merger between the two: the ability to integrate them tightly together gives your data scientists secure access to data, the ability to leverage the Spark that runs inside a Hortonworks cluster, the ability to work in a platform like DSX that doesn't limit you to just one kind of technology but allows you to work with multiple technologies, the ability to work on not only-- >> When you say technologies here, you're referring to frameworks like TensorFlow, and-- >> Precisely. Very good, now that part I'm going to get into very shortly, (laughs) so please don't steal my thunder. >> James: Okay. >> Now, what I was saying is that not only are DSX and Hortonworks integrated to the point that you can manage your Hadoop clusters, your Hadoop environments, within DSX; you can also work on your Python models and your analytics within DSX and then push them remotely to be executed where your data is. Now, why is this important? If you work with data that's megabytes, gigabytes, maybe you can pull it in, but truly, when you move to the terabytes and the petabytes of data, what happens is that you have to push the analytics to where your data resides, and leverage, for example, YARN, a resource manager, to distribute your workloads and train your models on your actual HDP cluster. That's one of the huge value propositions. Now, mind you, this is all done in a secure fashion, with the ability to install DSX on the edge nodes of the HDP clusters. >> James: Hmm... >> As of HDP 2.6.4, DSX has been certified to work with HDP. Now, we embarked on this partnership about 10 months ago. Now, it often happens that there are announcements, but not much materializes after such an announcement.
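The "push the analytics to where your data resides" pattern Piotr describes can be illustrated with a toy sketch. This is pure Python with made-up partition data, not the DSX or Spark API; in a real deployment the partitions would be HDFS blocks and the shipping of functions would be done by Spark on YARN. The point is only that the small function travels to the data and only tiny partial results travel back.

```python
# Toy illustration of moving compute to the data rather than the data to
# the compute. Each "partition" stands in for a block of a huge dataset
# living on a cluster node; we ship a small function to each partition
# and only move the small partial results, never the raw rows.
from functools import reduce

# Hypothetical data partitions, standing in for HDFS blocks on an HDP cluster.
partitions = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0],
    [6.0, 7.0, 8.0, 9.0],
]

def map_partition(rows):
    """Runs where the data lives; returns only a (sum, count) pair."""
    return (sum(rows), len(rows))

def merge(a, b):
    """Combines the tiny partial results back on the driver."""
    return (a[0] + b[0], a[1] + b[1])

partials = [map_partition(p) for p in partitions]  # executed "remotely"
total, count = reduce(merge, partials)             # executed locally
mean = total / count
print(mean)  # 5.0
```

With terabytes of data, the difference between shipping nine rows and shipping two numbers per partition is exactly the value proposition being described.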
This is not true in the case of DSX and HDP. Just recently we have had the release of DSX 1.2, which I'm super excited about. Now, let's talk about those open-source toolings in the various platforms. Now, you don't want to force your data scientists to work with just one environment. Some of them might prefer to work on Spark; some of them like their RStudio, they're statisticians, they like R; others like Python, with Zeppelin or, say, Jupyter notebooks. Now, how about TensorFlow? What are you going to do when, you know, you have to run deep learning workloads, when you want to use neural nets? Well, DSX does support the ability to bring in GPU nodes and do the TensorFlow training. As a sidecar approach, you can append the node, scale the platform horizontally and vertically, train your deep learning workloads, and then remove the sidecar. So you can attach it to the cluster and remove it at will. Now, DSX not only satisfies the needs of your programmer data scientists, who code in Python and Scala or R, but also allows your business analysts to work and create models in a visual fashion. As of DSX 1.2, we have embedded, integrated, the SPSS Modeler, redesigned and rebranded. This is an amazing technology from IBM that's been around for a while, very well established, but now, with the new interface, embedded inside the DSX platform, it allows your business analysts to train and create models in a visual fashion and, what is beautiful-- >> Business analysts, not traditional data scientists. >> Not traditional data scientists. >> That sounds equivalent to how IBM, a few years back, was able to bring more of a visual experience to SPSS proper to enable the business analysts of the world to build and do data-mining and so forth with structured data. Go ahead, I don't want to steal your thunder here. >> No, no, precisely.
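The training workloads that would run on those GPU sidecar nodes are, at their core, gradient-descent loops. As a framework-free stand-in for a TensorFlow job, here is a tiny logistic regression trained by stochastic gradient descent on made-up data; the data, labels, and hyperparameters are all hypothetical, chosen only to make the loop visible without any framework installed.

```python
import math

# Stand-in for a TensorFlow training loop: a one-feature logistic
# regression fitted by stochastic gradient descent. On DSX the same
# pattern would run at scale on a GPU sidecar node; here it is pure
# Python so the shape of the workload is visible.
X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0, 0, 1, 1]                       # toy labels: positive when x >= 2

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):                  # the "epochs" of the training job
    for (x,), target in zip(X, y):
        pred = 1.0 / (1.0 + math.exp(-(w * x + b)))
        grad = pred - target           # dLoss/dLogit for log loss
        w -= lr * grad * x             # update weight
        b -= lr * grad                 # update bias

def predict(x):
    return 1 if 1.0 / (1.0 + math.exp(-(w * x + b))) > 0.5 else 0

print([predict(x) for (x,) in X])  # [0, 0, 1, 1]
```

A deep-learning framework replaces the hand-written gradient with automatic differentiation and pushes the arithmetic onto the GPU, but the loop is the same, which is why the sidecar node can be attached only for the duration of training and then removed.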
(laughs) >> But I see it's the same phenomenon: you bring the same capability to greatly expand the range of data professionals who can, in this case, do machine learning, hopefully as well as professional, dedicated data scientists. >> Certainly. Now, what we have to also understand is that data science is actually a team sport. It involves various stakeholders from the organization, from the executive, who actually gives you the business use case, to your data engineers, who actually understand where your data is and can grant the access-- >> James: They manage the Hadoop clusters, many of them, yeah. >> Precisely. So they manage the Hadoop clusters, and they also manage your relational databases, because we have to realize that not all the data is in the data lakes yet. You have legacy systems, which DSX allows you to connect to and integrate to get data from. It also allows you to consume data from streaming sources, so if you have a Kafka message bus and are streaming data from your applications or IoT devices, you can integrate all those various data sources and federate them within DSX to use for training machine learning models. Now, this is all around predictive analytics. But what if I tell you that right now with DSX you can do prescriptive analytics as well? With the 1.2, again I'm going to be coming back to this DSX 1.2, with the most recent release we have actually added decision optimization, an industry-leading solution from IBM-- >> Prescriptive analytics, gotcha-- >> Yes, for prescriptive analysis.
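The federation Piotr describes, joining a stream of events against a legacy relational table before scoring, can be sketched in miniature. This is a pure-Python simulation with invented device data, not the DSX connector API; in production the stream would be a Kafka topic and the table a database reached through a connector.

```python
from collections import deque

# Toy stand-in for federating a streaming source (Kafka in the talk)
# with a legacy relational table. The stream yields events; the "table"
# is a dict keyed by device id; the join enriches each event before it
# is handed to a model for scoring.
stream = deque([
    {"device": "d1", "temp": 71.0},
    {"device": "d2", "temp": 98.5},
    {"device": "d1", "temp": 72.5},
])
legacy_table = {"d1": {"site": "Berlin"}, "d2": {"site": "Munich"}}

def enriched_events():
    """Drains the stream, joining each event with its legacy record."""
    while stream:
        event = stream.popleft()
        yield {**event, **legacy_table[event["device"]]}

# "Scoring": a trivial rule standing in for a trained model.
scored = [dict(e, alert=e["temp"] > 90.0) for e in enriched_events()]
print(scored[1])  # the d2 reading, enriched and flagged
```

The pattern, enrich-then-score, is the same whether the join runs in a notebook or in a streaming engine; only the transports change.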
So now, if you have warehouses, or you have a fleet of trucks, or you want to optimize the flow in, let's say, a utility company, whether it be for power or, let's say, for water, you can actually create and train prescriptive models within DSX and deploy them the same way you would deploy and manage your SPSS streams as well as the machine learning models from Spark, from Python, with XGBoost, TensorFlow, Keras, all those various aspects. >> James: Mmmhmm. >> Now, what's going to get really exciting in the next two months: DSX will actually bring in natural language processing and text analysis and sentiment analysis via WEX. So Watson Explorer, it's another offering from IBM... >> James: It's called, what is the name of it? >> Watson Explorer. >> Oh, Watson Explorer, yes. >> Watson Explorer, yes. >> So now you're going to have this collaborative platform, extensible! An extensible collaborative platform that can actually install and run in your data centers without the need to access the internet. That's actually critical. Yes, we can deploy on AWS. Yes, we can deploy on Azure. On Google Cloud, definitely; we can deploy on SoftLayer and we're very good at that. However, in the majority of cases we find that the customers have challenges with bringing their data out to the cloud environments. Hence, with DSX, we designed it to deploy and run and scale everywhere. Now, how have we done it? We've embraced open source. This was a huge shift within IBM, to realize that yes, we do have 350,000 employees, yes, we could develop container technologies, but why? Why not embrace what are actually industry standards, with Docker and its equivalents, as they became industry standards?
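The fleet-of-trucks case Piotr mentioned is a classic prescriptive-analytics shape: decisions, constraints, and an objective. IBM's decision optimization solves such problems with a real solver at scale; the brute-force toy below, with invented costs and route names, only shows the shape of the problem in plain Python.

```python
from itertools import product

# Toy prescriptive-analytics problem, solved by brute force: assign two
# trucks to three delivery routes so that every route is covered, both
# trucks are used, and the total cost is minimal. A real deployment
# would hand this to an optimization solver; the decision variables,
# constraint, and objective are the same either way.
cost = {  # cost[truck][route], hypothetical numbers
    "truckA": {"r1": 4, "r2": 6, "r3": 9},
    "truckB": {"r1": 5, "r2": 3, "r3": 7},
}
routes = ["r1", "r2", "r3"]

best_plan, best_cost = None, float("inf")
for assignment in product(cost, repeat=len(routes)):   # one truck per route
    if len(set(assignment)) < 2:
        continue                                       # constraint: use both trucks
    total = sum(cost[t][r] for t, r in zip(assignment, routes))
    if total < best_cost:                              # objective: minimize cost
        best_plan, best_cost = dict(zip(routes, assignment)), total

print(best_plan, best_cost)  # {'r1': 'truckA', 'r2': 'truckB', 'r3': 'truckB'} 14
```

Brute force is fine for 8 candidate plans; the value of a solver is that the same model still answers when there are thousands of trucks and constraints.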
Bring in RStudio, Jupyter, the Zeppelin notebooks; bring in the ability for a data scientist to choose the environments they want to work with, to extend them, and to make deployments of web services, applications, and models. And those are actually full releases: I'm not only talking about the model, I'm talking about the scripts that go with it, the ability to pull the data in and allow the models to be retrained, evaluated, and redeployed without taking them down. Now, that's what actually becomes, that's what is the true differentiator when it comes to DSX, and all done in either your public or private cloud environments. >> So that's coming in the next version of DSX? >> Outside of DSX-- >> James: We're almost out of time, so-- >> Oh, I'm so sorry! >> No, no, no. It's my job as the host to let you know that. >> Of course. (laughs) >> So if you could summarize where DSX is going in 30 seconds or less as a product, the next version is, what is it? >> It's going to be the 1.2.1. >> James: Okay.
We want to thank all of our guests, all these experts, for sharing time out of their busy schedules. We want to thank everybody at this event for all the fascinating conversations; the breakouts have been great, and the whole buzz here is exciting. GDPR is coming down and everybody's gearing up and getting ready for that, but everybody's also focused on innovative and disruptive uses of AI and machine learning in business, using tools like DSX. I'm James Kobielus, for the entire CUBE team, SiliconANGLE Media, wishing you all, wherever you are, whenever you watch this, have a good day, and thank you for watching theCUBE. (upbeat music)