Victoria Stasiewicz, Harley-Davidson Motor Company | IBM DataOps 2020
From theCUBE studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

Hi everybody, this is Dave Vellante, and welcome to this special digital CUBE presentation sponsored by IBM. We're going to focus in on DataOps in action. A lot of practitioners tell us that they really have challenges operationalizing and infusing AI into the data pipeline. We're going to talk to some practitioners and really understand how they're solving this problem, and I'm really pleased to bring in Victoria Stasiewicz, who's the Global Information Systems Manager for Information Management at Harley-Davidson. Vic, thanks for coming to theCUBE. Great to see you. Wish we were face to face, but really appreciate your coming on in this manner.

That's okay. That's why technology's great, right?

So you are steeped in a data role at Harley-Davidson. Can you describe a little bit about what you're doing and what that role is like?

Definitely. So obviously I'm a manager of information management and governance at Harley-Davidson, and what my team is charged with is building out data governance at an enterprise level, as well as supporting the AI and machine learning technologies within my function. So I have a portfolio. That portfolio really includes data and AI and governance, and also our master data, reference data, and data quality function, if you're familiar with the DAMA wheel. What I can tell you is that my team did an excellent job within this last year, in 2019, standing up the infrastructure, so those technologies specific to governance as well as the newer, more modern warehouse on cloud technologies and cloud object store, which also included Watson Studio and Watson Explorer. Many of the IBMers of the world might hear about, obviously, IBM IIS, or work on it directly. We stood that up in the cloud, as well as Db2 Warehouse on Cloud and, like I said, cloud object store. We spent about the first five months of last year standing that
infrastructure up, working on the workflow, and ensuring that access and security management was all set up within the platform. What we did the last half of the year was really start to collect that metadata, as well as the data itself, and bring the metadata into our metadata repository, which is our XMETA database within IIS, and then also bring that into our Db2 Warehouse on Cloud environment. So we were able to start with what we would consider our dealer domain for Harley-Davidson and bring those dimensions into Db2 Warehouse on Cloud, which was never done before. A lot of the information that we were collecting and bringing together for the analytics team lived in disparate data sources throughout the enterprise. So the goal was to stop with redundant data across the enterprise, eliminate some of those disparate source data resources, and bring it into a centralized repository for reporting.

Okay, wow, we've got a lot to unpack here, Victoria. But let me start with sort of the macro picture. I mean, years ago, data was this thing that had to be managed, and it still does, but it was a cost; it was largely a liability. Governance was sort of front and center; sometimes it was the tail that wagged the value dog. And then the whole big data movement comes in and everybody wants to be data-driven. And so you saw some pretty big changes in just the way in which people looked at data. They wanted to mine that data and make it an asset versus just a straight liability. So what are the changes that you discerned in data and in your organization over the last, let's say, half a decade?

To tell you the truth, we started looking at access management and the ability to allow some of our users to do some rapid prototyping that they could never do before. So what more and more we're seeing, as far as data citizens or data scientists or even analysts throughout most enterprises, is that, well, they want access to the
information, and they want it now. They want speed to insight at this moment, using pretty much a minimum viable product. They may not need the entire data set, and they don't want to have to go through leaps and bounds just to get access to that information or to bring that information into, necessarily, a centralized location. So while I talk about our Db2 Warehouse on Cloud, and that's an excellent example of when we actually need to model data, where we know that this is data that we trust and that's going to be called upon many, many times by many, many analysts, there's other information out there that people are collecting. Because there's so much big data, there are so many ways to enrich your data within your organization for your customer reporting that people are really trying to tap into those third-party data sets.

So what my team has done, and what we're seeing change throughout the industry, is that a lot of teams and a lot of enterprises are asking, as technologists, how can we give our scientists and our analysts the ability to access data virtually? So instead of replicating redundant data sources, we're actually enabling data virtualization at Harley-Davidson. We've been doing that first by working with our Db2 Warehouse on Cloud and connecting to some of our other trusted versions of data warehouses that we have throughout the enterprise, that being our dealer warehouse as well, to enable analysts to do some quick reporting without having to bring all that data together. That is a big change I see. The fact that we were able to tackle that has allowed technology to get back ahead, because most, I would say most, organizations have given IT the bad rap of "it takes too long to get what we need; my technologists cannot give me my data at my fingertips in a timely manner," which doesn't allow for speed to insight and answering the business questions at the point in time of delivery. In most cases, we've supplied data to our analysts,
and they're able to calculate, aggregate, and build the reporting metrics to get those answers back to the business, but they're a week or two weeks too late. The information is no longer relevant. So data virtualization through DataOps is one of the ways that we've been able to speed that up and act as a catalyst for data delivery. What we've also done, though, and I see this quite a bit, is say: well, that's excellent, but we still need to start classifying our information and labeling it at the system level. We've seen that most enterprises have the same struggle; I worked at Blue Cross as well, with IBM tools, and they had the same struggle. They were trying to eliminate their technology debt, reduce their spend, and reduce the time it takes for resources working on technologies to maintain those technologies. They want to reduce their IT portfolio of assets and capabilities that they license today. So what do they do to do that? It's time to start taking a look at what systems should be classified as essential systems versus those systems that are disparate and could be eliminated. And that starts with data governance.

Okay, so your main focus is on governance, and you talked about how people want answers now. They don't want to have to wait; they don't want to go through a big waterfall process. So what would you say were some of the top challenges in terms of just operationalizing your data pipeline and getting to the point that you are today?

You know, I have to be quite honest: standing up the governance framework and the methodology behind it, to get data owners, data stewards, and a catalog established, was not necessarily the heavy lifting. The heavy lifting really came with setting up a brand-new infrastructure in the cloud. For us, to be quite honest, we partnered with IBM and said, you know what, we're going to the cloud, and these tools had never been implemented in the cloud before. We were kind of the first to do it. So some of the struggles that we took on were actually
standing up the infrastructure: security and access management, network pipeline access, VPN issues, things of that nature. I would say those were some of the initial roadblocks we went through. But after we overcame those challenges, with the help of IBM and the patience of both the Harley and IBM teams, it became quite easy to roll out these technologies to other users. The nice thing is that we at Harley-Davidson have been taking the time to educate our users. For example, we had what we call Data Bytes, a lunch and learn. In that lunch and learn, we took our entire GIS team, our Global Information Services team, which is all of IT, through these new technologies. It was a forum of over 250 people, with our CIO and CTO on, and we took them through: How do we use these tools? What is the purpose of these tools? Why do we need governance to maintain these tools? Why is metadata management important to the organization? That piece of it seems to be much easier than just our initial standing it up, so it's been good to start letting users in.

Well, it sounds like you had real sponsorship from leadership, and input from leadership, and they were kind of leaning into the whole process. First of all, is that true, and how important is that for success?

Oh, it's essential. We often asked, when we were first standing up the tools, to be quite honest: does our CIO really understand what it is that we're standing up? Does our CIO really understand governance? Because we didn't have the time to really get that face-to-face interaction with our leadership. So I myself made it a mandate, having done this previously at Blue Cross, to get in front of my CIO and my CTO and educate them on what exactly it is we are standing up. Once we did that, it was very easy to get an executive steering committee as well as executive membership on board with our governance council, and now they're the champions of that. It's never easy, though, selling governance to leadership, and
the ROI is never easy, because it's not something that you can easily calculate. It's something that has to show its return on investment over time. That means that you're bringing dashboards; you're educating your CIO and CTO on how you're bringing people together, how groups are now talking about solutions and technologies in a domain-like environment. At an international level, we have people from Asia, from Europe, from China who join calls every Thursday to talk about the data quality issues specific to dealer, for example: what systems we're using, what solutions are on the horizon to solve them. So now, instead of having people from other countries who work for Harley, as well as even just within the US, creating one-off solutions that answer the same business questions using the same data, creating multiple solutions to solve the same problem, we're bringing them together, we're solving together, and we're prioritizing those as well. So that return on investment, down the line, you can show as: you know what, instead of this splintering into five projects, we've now turned this into one, and instead of implementing four systems, we've now implemented one. And guess what? We have the business rules and we have the classifications tied to this system, so that you, the CIO or CTO, can now go in and reference this information in a glossary, a user interface, something that a C-level can read, interpret, understand quickly, and dissect for their own needs, without having to take the long, lengthy time to talk to a technologist about what this information means and how to use it.

You know, what's interesting as a takeaway, based on what you just said, is that Harley-Davidson is an iconic brand, a cool company with motorcycles, right? But you came out of an insurance background, which is a regulated industry where governance is sort of de rigueur, right? I mean, it's
table stakes. So how are you able, at Harley, to balance the sort of tension between governance and business flexibility?

So there are different levers, I would call them. Obviously, within healthcare and insurance, the importance becomes compliance and risk and regulatory; those are the big pushes. Gosh, I don't want to pay millions of dollars in fines, so start classifying this information, enabling security, reducing risk, all that good stuff. For Harley-Davidson, it was much different. It was more or less: we have a mission. We want to invest in our technologies, yet we want to save money. How do we cut down the technologies that we have today and reduce our technology spend, yet enable our users to have access to more information in a timely manner? That's not an easy task. So what I did is I married governance to our TIME model. Our TIME model is specific: we're either going to tolerate an application, we're going to invest in an application, we're going to migrate an application, or we're going to eliminate it. So in talking to my CIO, I said, you know, we can use governance to classify our systems and help act as a catalyst when we start to implement what it is we're doing with our technologies. Which technologies are we going to eliminate tomorrow? We as IT cannot do that unless we discuss some sort of business impact, unless we look at a system and ask: How many users are using this? What reports are essential to the business teams? Do they need this system? Is this something that's critical for users today? Is this duplicative? We have many systems that are solving the same capability. That is how I sold it to my CIO, and it made it important to the rest of the organization. They knew we had a mandate in front of us: we had to reduce technology spend. That really, for me, made it quite easy in talking to other technologists as well as business users about why governance is important and why it's going to help
Harley-Davidson in its mission to save money going forward. I will tell you, though, that the business's biggest value is the fact that they now own the data. They're more likely to use your master data management systems. Like I said, I'm the owner of our MDM services today, as well as our customer knowledge center. They're more likely to access and reference those systems if they feel that they built the rules and they own the rules in those systems. So that's another big value-add, too. Many business users will say, okay, you think I need access to this system? I don't know. I'm not sure. I don't know what the data looks like within it. Is it easily accessible? Is it going to give me the reporting metrics that I need? That's where governance will help them. For example, take our data scientist team: using a catalog, you can browse your metadata; you can look at your server, your database, your tables, your fields; you can understand what those mean and understand the classifications and the formulas within them. They're all documented in a glossary, versus having to go and ask for access to six different systems throughout the enterprise, hoping that Sally next to you, who told you that you needed access to these systems, was right, just to find out that you don't need the access, and it took you three days to get the access anyway. That's why a glossary is really a catalyst for a lot of that.

Well, it's really interesting what you just said: you went through essentially an application rationalization exercise, which saved your organization money. That's not always easy, because even though IT may be spending money on these systems, businesses don't want to give them up. But it sounds like you were able to use data to actually inform which applications you should invest in versus sunset. And it sounds like you were giving the business a real incentive to go through this exercise, because
they ended up, as you said, owning the data.

Well, that's what's great, right? Who wants to keep owning the old car and driving the old car if they can truly own a new car for a cheaper price? Nobody wants to do that. I've even looked at Teslas. I can buy a Tesla for the same price as I can buy a minivan these days; I think I might buy the Tesla. But what I will say is that we also built out a capabilities model with our enterprise architecture team, and in building that capabilities model we started to bucket our technologies within those capabilities: AI and machine learning, warehouse on cloud technologies or even warehousing technologies, governance technologies, integration technologies, reporting technologies, those types of classifications. By grouping all those into a capabilities matrix, it was easy for us to then start identifying: all right, who are the system owners for these technologies? Who are the business users for these? Based on that, let's go talk to this team, the dealer management team, about access to this new profiling capability within IBM, or this new catalog within IBM, that they can use today, versus the SharePoint Excel spreadsheets they were using for their metadata management, or the profiling tools that were ten years old, some of our SAP tools that they were using before. Let's sell them on the new tools and start migrating them. That becomes pretty easy, because unless you're buying some really old technology, when you give people a purview into those new tools and those new capabilities, especially with some of the new IBM tools we have today, the buy-in is pretty quick. It's pretty easy to sell somebody on something shiny that's much easier to use than some of the older technologies.

Let's talk about the business impact. My understanding is you
were trying to improve the effectiveness of the dealers, not just go out and brute-force sign up more dealers. Were you able to achieve that outcome, and what has it meant for your business?

Yes, actually, we were. We have something called a CDR, and that's our consumer dealer and development repository. That's where a lot of our dealer information resides today; it's actually our dealer warehouse. We had some other systems that were collecting that information, [unclear] for example. We were able to bring all that reporting into one location and sunset some of those other technologies, but then also enable that centralized reporting layer, where we've also used data virtualization to start to marry some of that information to Db2 Warehouse on Cloud for users. So we're basically allowing those who want to access CDR and our Db2 Warehouse on Cloud dealer information to do that within one reporting layer. In doing so, we were able to create something called a dealer harmonized ID. We have so many dealers today, and some of those dealers actually sell bikes, some of those dealers sell just apparel, and some of those dealers just sell parts. So for those dealers, can we have certain unique IDs, kind of a golden record, mastered information if you will, brought back into reporting so that we can accurately assess dealer performance? Up to two years ago, it was really hard to do that. We had information spread out all over. It was really hard to get a good handle on which dealers were performing and which dealers weren't, because it was tough for our analysts to wrangle that information and bring it together. It took time. Many times you would get multiple answers to one business question, which is never good. One question should have one answer if it's accurate. That is what we worked on within this last year, and that's where really our CEO saw the value-add: now we
can start to act on which dealers are performing at an optimal level versus which dealers are struggling. That's allowed even our account reps, our field staff, to go work with those struggling dealers and start to share with them the information: these are the things some of our stronger-performing dealers are doing today that are making them more effective at selling bikes; these are some of the best practices you can implement. That's where we make our field staff smarter and our dealers smarter. We're not looking to shut down dealers. We just want to educate them on how to do better.

Well, and to your point about a single version of the truth, if you will, the lines of business kind of owning their own data is critical, because you're not spending all your time pointing fingers and trying to understand the data. If the users own it, then they own it. And so how does self-service fit in? Were you able to achieve some level of self-service? How far could you go there?

We were. We did use some other tools, I'll be quite honest, aside from just the IBM tools, that enabled some of that self-service analytics. SAP SAC was one of them. Alteryx is another big one that our analyst team likes to use today to wrangle and bring that data together. That really allowed our analysts and our reporting teams to start to build their own derivations and transformations for reporting themselves, because those tools are more user-interface-based, versus going into the back-end systems and having to write straight SQL queries, things of that nature. That usually takes time and requires a deeper level of knowledge than what we'd like our analysts to need today. I can say the same thing for the data scientist team. They do a lot of R and Python coding today. What we've tried to do is make sure that the tools are available so
that they can do everything they need to do without us really having to touch anything. And I will be quite honest, we have not had to touch much of anything. We have a very skilled data scientist team. So I will tell you that the tools that we put in place today, Watson Explorer and some of the other tools as well, have enabled the data scientists to really quickly move and do what they need to do for reporting. And even in cases where maybe Watson Explorer may not be the optimal technology for them to use, we've also allowed them to use some of our other resources, our open source resources, to build some of the models that they were looking to build.

Well, I'm glad you brought that up, Victoria, because IBM makes a big deal out of being open, and so you're kind of confirming that you can use third-party tools, and if you like tool vendor ABC, you can use them as part of this framework.

Yeah, it's really about TCO, right? So take a look at what you have today. If it's giving you at least 80% of what you need for the business, or for your data scientists or reporting analysts to do what they need to do, to me it's good enough. It's giving you what you need. It's pretty hard to find anything that's exactly 100 percent. It's about being open, though, to when your scientists or your analysts find another reporting tool that requires minimal maintenance, or, let's just say, a data science flow that requires minimal maintenance and is free because it's open source. IBM can integrate with that, and we can enable that to be a quicker way for them to do what they need to do, versus telling them, no, you can't use the other technologies or the other open source tools out there for you today; you've got to use just these tools. That's pretty tough to do, and I think that would shut most IT shops down pretty quick within larger enterprises, because it would really act as a roadblock to allowing most of our teams to
do what they need to do for reporting.

Well, last question. A big part of this DataOps, borrowing from DevOps, is this continuous integration, continuous improvement, kind of an ongoing raising of the bar, if you will. Where do you see it going from here?

Oh, I definitely see a world where we're allowing for that rapid prototyping, like I was talking about earlier. I see a very big change in the data industry. You said it yourself: we are on the brink of big data, and it's only going to get bigger. There are organizations right now that have understood how much of an asset their data really is, and they're starting to sell their data to others, similar peers or smaller industries, similar vendors within the industry, similar spaces, so they can make money off of it, because data truly is an asset now. The key to it, though, is obviously making sure that it's curated, that it's cleansed, that it's trusted, so that when you are selling it you can really make money off of it. What we've seen, though, and what I really see on the horizon, is the ability to vet that data. In the past, what have we been doing? For the past decade we've just been buying big data sets and trusting that it's good information. We're not doing a lot of profiling at most organizations. You're going to pay top dollar, you're going to receive this third-party data set, and you're not going to be able to use it the way you need to. What I see on the horizon is us being able to do that vetting. We're building data lakehouses, if you will; we're building those Hadoop-like environments, those data lakes, where we can land information, we can quickly access it, and we can quickly profile it with tools, where it would otherwise take hours for an analyst to write a bunch of queries to understand what the profile of that data looks like. We did that recently at Harley-Davidson. We bought some third-party data and evaluated it quickly
through our agile scrum team. Within a week, we determined that the data was not as good as the vendor selling it had pretty much sold it to be. And so we told the vendor: we want our money back. The data is not what we thought it would be; please take the data sets back. Now, that's just one use case, but to me that was golden. It's a way to save money and start vetting the data that we're buying. Otherwise, what I've seen in the past is many organizations just buying up big third-party data sets and saying, okay, it's good enough. We think that just because it comes from a motorcycle council, say, it's good enough. It may not be. It's up to us to start vetting that, and that's where technology is going to change, data is going to change, analytics is going to change.

That's a great example. You're really on the cutting edge of this whole DataOps trend. Really appreciate you coming on theCUBE and sharing your insights, and there's more in the CrowdChat. Thank you, Victoria, for coming on theCUBE.

Well, thank you, Dave. Nice to meet you. It was a pleasure speaking with you.

Yeah, really, the pleasure was all ours. And thank you for watching, everybody. As I say, we're CrowdChatting at slash DataOps for more detail and more Q&A. This is Dave Vellante for theCUBE. Keep it right there; we'll be right back right after this short break.

[Music]
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Volante | PERSON | 0.99+ |
Asia | LOCATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
five projects | QUANTITY | 0.99+ |
Victoria Stasiewicz | PERSON | 0.99+ |
China | LOCATION | 0.99+ |
Tesla | ORGANIZATION | 0.99+ |
Victoria | PERSON | 0.99+ |
Harley | ORGANIZATION | 0.99+ |
Harley Davidson | ORGANIZATION | 0.99+ |
Palo Alto | LOCATION | 0.99+ |
Blue Cross | ORGANIZATION | 0.99+ |
Europe | LOCATION | 0.99+ |
Dave | PERSON | 0.99+ |
US | LOCATION | 0.99+ |
Harley-Davidson Motor Company | ORGANIZATION | 0.99+ |
harley-davidson | PERSON | 0.99+ |
six different systems | QUANTITY | 0.99+ |
last year | DATE | 0.99+ |
over 250 people | QUANTITY | 0.99+ |
today | DATE | 0.99+ |
three days | QUANTITY | 0.99+ |
100 percent | QUANTITY | 0.99+ |
IG | ORGANIZATION | 0.99+ |
Watson | TITLE | 0.99+ |
Boston | LOCATION | 0.99+ |
tomorrow | DATE | 0.98+ |
one business question | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
ABC | ORGANIZATION | 0.98+ |
one answer | QUANTITY | 0.97+ |
four systems | QUANTITY | 0.97+ |
one | QUANTITY | 0.97+ |
Victoria stayshia | PERSON | 0.96+ |
Watson Explorer | TITLE | 0.96+ |
Explorer | TITLE | 0.96+ |
2019 | DATE | 0.96+ |
agile | ORGANIZATION | 0.95+ |
Vik | PERSON | 0.95+ |
two years ago | DATE | 0.95+ |
one question | QUANTITY | 0.95+ |
two weeks | QUANTITY | 0.94+ |
both | QUANTITY | 0.93+ |
excel | TITLE | 0.93+ |
Sally | PERSON | 0.92+ |
a week | QUANTITY | 0.92+ |
harley | ORGANIZATION | 0.91+ |
Watson Studio | TITLE | 0.91+ |
last half of the year | DATE | 0.89+ |
Alteryx | ORGANIZATION | 0.88+ |
millions of dollars | QUANTITY | 0.87+ |
single version | QUANTITY | 0.86+ |
every Thursday | QUANTITY | 0.86+ |
R | TITLE | 0.85+ |