IBM DataOps in Action Panel | IBM DataOps 2020

From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

Dave Vellante: Hi everybody, welcome to this special CUBE digital event, where we're focusing in on DataOps — DataOps in action — with generous support from our friends at IBM. Let me set up the situation here. There's a real problem going on in the industry, and that's that people are not getting the most out of their data. Data is plentiful, but insights perhaps aren't. What's the reason for that? Well, it's really a pretty complicated situation for a lot of organizations. There are data silos, there are challenges with skill sets and a lack of skills, and there's a glut of tools out there. The data pipeline is not automated. The business lines oftentimes don't feel as though they own the data, and that creates some real concerns around data quality, and a lot of finger-pointing. The opportunity here is to really operationalize the data pipeline, infuse AI into that equation, and attack the cost-cutting and revenue-generating opportunities that are right in front of you. Think about this: virtually every application this decade is going to be infused with AI — if it's not, it's not going to be competitive.

So we have organized a panel of great practitioners to really dig into these issues. First I want to introduce Victoria Stassi, an industry expert and data leader at Northwestern Mutual. Victoria, great to see you again, thanks for coming on.

Victoria Stassi: Excellent, nice to see you as well.

Dave: And Caitlin Halferty is the director of the AI Accelerator and part of the chief data officer organization at IBM — an organization that has actually applied these practices itself; it's drunk its own champagne, let me say it that way. Caitlin, great to see you again. And Steve Lewis — good to see you again, Steve — vice president and director of data management at Associated Bank. Thanks for coming on.

Steve Lewis: Thanks, Dave. Glad to be here.

Dave: All right, guys. So you heard my narrative up front in terms of operationalizing data — data is wonderful, insights aren't, and getting insight in real time is critical in this decade. Give me each of your senses as to where you are on that journey. Victoria, let's start with you, because you're brand new to Northwestern Mutual, but you have deep expertise in health care, manufacturing, and financial services. Where do you see the general industry climate? Then we'll talk about the journeys you're on, both personally and professionally.

Victoria: Sure. I think right now what we're seeing is that you need to have speed to insight. As I've experienced going through many organizations, they're all facing the same challenges today, and a lot of those challenges are around: where does my data live, and is my data trusted — meaning, has it been curated, has it been cleansed, has it been qualified? Is the data ready? What we often see happen is the business knows its KPIs, it knows its business metrics, but it can't find where that data lives. There's abundant, disparate data all over the place, and it's replicated because it's not well managed. A lot of that comes down to governance and the platform and tools that support that governance. What it offers organizations is just that: I can tell you where the data is, I can tell you what's trusted. When you can quickly access information and bring back answers to business questions, that is one answer, not many answers — rather than leaving the business to question which is the correct answer, which way do I go. At the executive level, that's the biggest challenge.
Where we want the industry to go moving forward is, one, breaking that down and allowing that information to be published quickly, and two, enabling data virtualization. A lot of what you see today in most businesses is that it takes time to build out large warehouses at an enterprise level. We need to pivot quicker, so a lot of what we're doing is leaning businesses toward taking advantage of data virtualization — allowing them to connect to these data sources to bring information back quickly, so they don't have to replicate that information across different systems or different applications, and then to be able to provide those answers back quickly, also allowing for seamless access for the analysts who are running at full speed, trying to find answers as quickly as they can.
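To make the virtualization idea concrete, here is a minimal sketch of a federated query through a virtualization engine: one SQL statement joins data that physically lives in two different systems, with nothing replicated into a warehouse. The panel doesn't name an engine, so Trino and its Python client are used purely as an illustrative stand-in, and the host, catalog, and table names are hypothetical.

```python
# Minimal data-virtualization sketch: join data living in two source systems
# in a single federated query, with no replication. Trino is an illustrative
# stand-in; the endpoint, catalogs, and tables are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="virtualization-engine.example.com",  # hypothetical endpoint
    port=8080,
    user="analyst",
)
cur = conn.cursor()

# `postgresql` and `hive` are connector catalogs pointing at the live sources.
cur.execute("""
    SELECT c.customer_id, c.segment, SUM(o.amount) AS total_spend
    FROM postgresql.crm.customers AS c
    JOIN hive.sales.orders AS o
      ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.segment
""")
for row in cur.fetchall():
    print(row)
```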
Dave: Great, okay. And I want to get into that. Steve, let me go to you. One of the things we talked about earlier was infusing this mindset of a data culture, and thinking about data as a service. Talk a little bit about how you got started — what was the starting point as you thought through that?

Steve: Sure. The biggest thing for us was to change the mindset from data being just for reporting — insights on things that have happened in the past, on data that already existed — and to shift the mentality toward using data in our actual applications, so that we're providing those insights in real time through the applications as they're consumed: helping with customer experience, helping with personalization and optimization of our applications. The way we started down that path — or the journey that we're still on — was to get the foundation laid first. Part of that has been making sure we have access to all that data, whether it's through virtualization, like Vic talked about, or through having more of the data collected in a data lake, where we have all of that foundational data available as opposed to waiting for people to ask for it. That's been the biggest culture shift for us: having that availability of data, being ready to provide those insights, as opposed to making the business or the application ask for that data.

Dave: Caitlin, when I first met Inderpal Bhandari, IBM's global chief data officer, I was asking him, okay, what's the role of the CDO? He mentioned a number of things, but two of them stood out. One: you've got to understand how data affects the monetization of your company — that doesn't mean selling the data, but what role it plays in helping cut costs, increase revenue, improve productivity, customer service, and so on. The other thing he said was you've got to align with the lines of business. It all sounded good — and this was several years ago — and IBM took it upon itself to drink its own champagne; I was going to say dogfooding. But it's not easy to just flip a switch, infuse AI, and automate the data pipeline. You guys had to go through some real pain to get there. You were early on, you took some arrows, and now you're helping your customers benefit from that. So talk about some of the use cases where you've applied this — obviously one of the biggest organizations in the world — and what the real challenges were.

Caitlin: Sure, I'm happy to. We've been on this journey for about four years now. We stood up our first chief data office in 2016, and you're right, it was all about getting our data strategy crafted and executed internally, and we wanted to be very transparent, because, as you mentioned, there were a lot of challenges and we had to think differently about the value. As we wrote that data strategy for the enterprise at that time, we quickly pivoted to see the real opportunity and value of infusing AI across all of our business processes. To your question on specific use cases: we invested the time getting that platform built and implemented, and then we were able to take advantage of it. One example I've been really excited about — I have a practitioner on my team who's a supply chain expert, and a couple of years ago he started building out a supply chain solution so that we could better mitigate our risk in the event of a natural disaster, like an earthquake or hurricane, anywhere around the world. Because we had invested the time in getting the data pipelines right — getting all of that data curated and cleansed, and the quality of it — we were able, in recent weeks, to add the really critical COVID-19 data and deliver it to our employees internally for their preparation purposes, make it available to our nonprofit partners, and now we're starting to see our first customers take advantage of it too, with the health and well-being of their employees in mind. That's one example, and I'm seeing it with a lot of the clients I work with: they invest in data and AI readiness, and then they're able to take advantage of all of that work very quickly, in an agile fashion, and spin up those solutions.

Dave: I think one of the keys there, Caitlin, is that we can talk about that in a COVID-19 context, but it's going to carry through — that notion of business resiliency is going to live on in this post-pandemic world, isn't it?

Caitlin: Absolutely. I think for all of us, the importance of investing in business continuity and resiliency-type work — so that we know what to do in the event of a natural disaster or something beyond — will only become more important, so that we're able to act quickly. The investment in those platforms and the approach we're taking — and that I see many of us taking — will really be grounded in that resiliency.

Dave: So Vic and Steve, I want to dig into this a little bit, because we use this concept of DataOps — we're stealing from DevOps — and there are similarities, but there are also differences. Let's talk about the data pipeline. Think about the data pipeline as a sort of quasi-linear process: you're ingesting data — you might be using tools like Kafka or whatever your favorite is — then you're transforming that data; then you've got discovery, you've got to do some exploration, you've got to figure out your metadata catalog; then you're trying to analyze that data to get some insights; and then ultimately you want to operationalize it. You could come up with your own data pipeline, but generally that concept is, I think, well accepted. There are different roles, and unlike DevOps, where it might be the same developer who's implementing security policies and taking it into operations, in DataOps there might be different roles — and in fact very often there are: data science, maybe an IT role, data engineering, analysts, etc. So Vic, I wonder if you could talk about the challenges in managing and automating that data pipeline, applying DataOps, and how practitioners can overcome them.
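As a rough sketch of that quasi-linear pipeline — every function name and toy record below is hypothetical, and in practice each stage is often owned by a different role (engineer, steward, scientist, analyst):

```python
# Illustrative skeleton of the pipeline Dave describes:
# ingest -> transform -> catalog -> analyze -> operationalize.
# All names and records are hypothetical stand-ins, not any vendor's API.

def ingest(source: str) -> list[dict]:
    # Stand-in for pulling raw records from Kafka, a file drop, or an API.
    return [{"id": 1, "amount": "100.5", "source": source},
            {"id": 2, "amount": "bad", "source": source}]

def transform(records: list[dict]) -> list[dict]:
    # Cleanse: keep only records whose amount parses as a number.
    clean = []
    for r in records:
        try:
            clean.append({**r, "amount": float(r["amount"])})
        except ValueError:
            pass  # a real pipeline would route this to a quarantine area
    return clean

def catalog(records: list[dict]) -> dict:
    # Register minimal metadata so others can discover the data set.
    return {"fields": sorted(records[0]), "row_count": len(records)}

def analyze(records: list[dict]) -> dict:
    # Toy insight: total amount across the cleansed records.
    return {"total_amount": sum(r["amount"] for r in records)}

if __name__ == "__main__":
    records = transform(ingest("crm"))
    print(catalog(records))
    print(analyze(records))  # operationalizing would push this into an app
```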
Victoria: Yeah, a perfect example would be a client I was recently working for, where we took a team and built it up using agile methodologies as the framework — rapidly ingesting data and then proving out that the data is fit for purpose. We talk a lot about big data, and that is really where a lot of industries are going: they're trying to add enrichment to their own data sources, so they're purchasing third-party data sets. In doing so, you make that initial purchase, but many companies today have no real way to vet that data. They'll purchase the information, they won't vet it up front, they'll bring it into an environment, and it will take them time to understand whether the data is of quality or not — and by the time they do, typically the sale is done, and they're not going to get anything back. But we were able to do it. The most recent case was an unstructured data source: we brought that in and ingested it with modelers using this agile team, and within two weeks we were able to bring the data in from the third-party vendor — what we considered rapid prototyping — profile the data, understand whether it was of quality or not, and quickly figure out that it wasn't. In doing that, we were able to go back to the vendor and tell them: sorry, the data set isn't up to snuff, we'd like our money back, we're not going forward with it. That's enabling businesses to be smarter with their third-party purchases today, because as much as businesses want to rely on their own data, they also want to enrich it with data from third-party sources. That's really what DataOps is allowing us to do. It's allowing us to think at a broader, higher level: how do we bring the information in, and what structures can we store it in so it doesn't necessarily have to be modeled first? Because a model is great, but if we have to take time to model all the information before we even know we want to use it, that slows the process down — and that slows the business down. The business is looking for us to speed up all of our processes. A lot of what we heard in the past is that IT tends to slow us down, and that's the perception we're trying to change in the industry: no, we're actually here to speed you up — we have all the tools and technologies to do so, and they're only getting better.
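A rough sketch of the kind of rapid profiling pass Victoria describes — vetting a third-party file for fitness for purpose before the sale closes. The panel doesn't name the tooling, so pandas, the file name, the column names, and the acceptance thresholds are all illustrative assumptions:

```python
# Quick fit-for-purpose profiling of a third-party data drop.
# pandas, the file, columns, and thresholds are illustrative assumptions.
import pandas as pd

df = pd.read_csv("vendor_sample.csv")  # hypothetical sample from the vendor

profile = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "null_pct": (df.isna().mean() * 100).round(1),  # completeness per column
    "distinct": df.nunique(),                        # cardinality per column
})
print(profile)

dupe_pct = df.duplicated().mean() * 100              # duplicate-row rate
print(f"duplicate rows: {dupe_pct:.1f}%")

# Simple acceptance gate: reject the purchase if key columns are too sparse.
KEY_COLUMNS, MAX_NULL_PCT = ["customer_id", "zip"], 5.0  # assumed thresholds
too_sparse = [c for c in KEY_COLUMNS
              if df[c].isna().mean() * 100 > MAX_NULL_PCT]
if too_sparse or dupe_pct > 1.0:
    print("Not fit for purpose - go back to the vendor:", too_sparse)
```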
I would also say the same about data scientists — that's another piece of the pie for us. If we can bring the information in, quickly catalog it in a metadata environment, bring in the information on the back-end data assets, and then supply that information to the scientists, then gone are the days when scientists are asking for connections to all these different data sources, waiting days for access requests to be approved, just to find out — once they figure out what the relationship diagram, the design, looks like in that back-end database, how to get to it, and write the code to get to it — that this is not the information they need, that Sally next to them told them the wrong thing. That's where the catalog comes in. That's where DataOps and data governance — having that catalog, that metadata management platform, available to you — mean they can go into a catalog without having to request access to anything, and within five minutes see the structures, what the tables look like, what the fields look like: are these the metrics I need to bring back answers to the business? That's DataOps. It's allowing us to speed all of that up — taking things that took months down to weeks, down to days, down to hours.

Dave: So Steve, I wonder if you could pick up on that and help us understand what DataOps means to you. In our previous conversation, and as I mentioned up front, there's this notion that the demand for data access was through the roof, and you've gone from that to more of a self-service environment, where it's not IT owning the data — it's really the business owning the data. What does all this DataOps stuff mean in your world?

Steve: Sure, I think it's very similar. It's how we enable and get access to that data quicker, with the right controls and the right processes, and build that scalability and agility into all of it so that we're doing this at scale. It's much more rapidly available; we can discover new data and separately determine if it's right — or, more importantly, if it's wrong — similar to what Vic described. It's how we enable the business to make the right decisions on whether or not they're going down the right path. The catalog is a big part of that. We've also introduced a lot of frameworks around scale, so just the ability to rapidly ingest data and make it available has been key for us. We've also focused on a prototyping environment — that sandbox mentality of how we rapidly stand those up for users and still provide some controls, while giving people the ability to do that exploration. What we're finding is that by providing the platform and the foundational layers, the use cases start to evolve and come out of that, as opposed to having the use cases first and then going to build things from them. We're shifting the mentality within the organization to say: we don't know what we need yet, so let's start to explore. That's the data scientist mentality and culture — more a way of thinking than an actual project or implementation.

Dave: I think that cultural aspect is important. Caitlin, you guys are an AI company — or at least that's part of what you do — but for decades, maybe a century, companies have been organized around other things: the manufacturing plant, the sales channel, whatever it is. How has the chief data officer organization within IBM been able to transform itself and really infuse a data culture across the entire company?

Caitlin: One of the approaches we've taken — we talk about the blueprint to drive AI transformation so that we can achieve and deliver these really high-value use cases. We've covered the data and the technology, which we just pressed on, but the organizational piece of it, the culture, is so important: the change management, enabling and equipping our data stewards. I'll give one specific example that I've been really excited about. When we were building our platform and starting to pull in structured and unstructured data, our data stewards were spending a lot of time manually tagging and creating business metadata about that data, and we identified that as a real pain point, costing us a lot of money and valuable resources. So we started to automate the metadata generation, in partnership with our deep learning practitioners and some of the models they were able to build, and we pushed that capability out into our product last year. One of the really exciting things for me to see is that our data stewards — who are such valuable experts, with the skills they bring — have reported that it's really changed the way they're able to work: it's sped up their process and enabled them to move on to higher-value activities and business benefits. So they're very happy from an organizational point of view. I think there are ways to identify those use cases: in this case we drove significant productivity savings, and we also really empowered our data stewards, whom we really value, by making their jobs easier and more efficient and helping them move on to the things they're more excited about doing. That's another example of the approach we've taken.
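As a toy stand-in for the metadata automation Caitlin describes — suggesting a business term for a column from its name and sample values, so stewards review suggestions instead of tagging everything by hand. The tiny training set and the scikit-learn model are illustrative assumptions, not IBM's actual deep-learning approach:

```python
# Toy sketch of automated business-metadata tagging: classify a column's
# business term from its name plus sample values. The training data and
# scikit-learn model are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Features: "column_name sample values" strings; labels: business terms.
train_x = [
    "cust_email jdoe@example.com asmith@example.com",
    "acct_bal 1042.17 88.20 15000.00",
    "postal_cd 53202 02110 94304",
    "contact_email help@example.org sales@example.org",
]
train_y = ["Email Address", "Account Balance", "Postal Code", "Email Address"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_x, train_y)

# A steward reviews the suggestion instead of tagging from scratch.
new_column = "billing_email bob@example.com carol@example.net"
print(model.predict([new_column])[0])  # expected: "Email Address"
```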
Dave: So the cultural piece, the people piece, is key, and we've talked a little bit about the process. I want to get a little bit into the tech. Steve, I wonder if you could tell us: what's the tech? We have this bevy of tools — I mentioned a number of them up front. You've got different data stores, you've got open-source tooling, you've got IBM tooling. What are the critical components of the technology that people should be thinking about?

Steve: From an architecture and ingestion perspective, we're trying to do a lot with Python frameworks and scalable ingestion pipeline frameworks. On the catalog side, we've gone with IBM Cloud Pak for Data, which provides a platform where a lot of these tools stay integrated together — everything from the discovery of data sources, to the cataloging and documentation of those data sources, all the way through the actual advanced analytics: Python models and R models, the open-source IDEs, combined with the ability to do some data prep and refinery work. Having all of that in an integrated platform was key for us for rolling out more of these tools in bulk, as opposed to point solutions, so that's been a big focus area. Then on the analytics side, versus the IDE, there are a lot of different components you can go into — whether it's MuleSoft, whether it's AWS and some of the native functionality out there. You mentioned Kafka before, and Kinesis streams and other streaming technologies — those are all in the toolbox we're starting to look at. One of the keys here is that we're trying to make decisions in as close to real time as possible, as opposed to the business having to wait weeks or months, by which point the insights are late and really rear-view mirror.
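Here is a minimal sketch of the near-real-time ingestion pattern Steve describes: consuming events from a Kafka topic and processing them as they arrive, instead of waiting on batch loads. The kafka-python client, the broker address, and the topic name are illustrative assumptions, not Associated Bank's actual stack:

```python
# Minimal near-real-time ingestion loop: consume JSON events from a Kafka
# topic as they arrive. kafka-python and all names here are assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "customer-events",                        # hypothetical topic
    bootstrap_servers="broker.example.com:9092",
    group_id="ingestion-pipeline",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # A real pipeline would validate, enrich, and land this in the lake;
    # the loop here just shows the shape of streaming ingestion.
    print(f"partition={message.partition} offset={message.offset} {event}")
```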
Dave: So Vic, your focus in your career has been a lot on data quality, governance, and master data management. From a data quality standpoint, what are some of the key tools that you're familiar with, that you've used, that have really enabled you to operationalize that data pipeline?

Victoria: I would say I definitely have the most experience with the IBM tools, but also Informatica — those are, to me, the two top players. IBM has come to the table with a suite; like Steve said, Cloud Pak for Data is really a one-stop shop. It allows quick, seamless access for the business user, versus having to go into some of the previous versions IBM had rolled out, where you were going into different user interfaces to find your information. That can become clunky — it adds to the process, and it can leave a bad taste in most people's mouths, because they don't want to navigate from system to system to system just to get their information. So Cloud Pak, to me, brings everything to the table in a one-stop-shop type of environment. Informatica is working on the same thing, but I would tell you they haven't come up with a solution that really comes close to what IBM has done with Cloud Pak for Data. I'd be interested to see if they can bring that over the horizon, but really IBM's suite of tools allows for profiling, data analytics, metadata management, and access to Db2 Warehouse on Cloud — those are the tools I've implemented in the past, as well as Cloud Object Storage — bringing all of that together to provide that one stop. At Northwestern, we're working right now with Collibra. I think Collibra is a great tool, but it's really a governance catalog — that's what it's truly made for. You have to bring other pieces to the table for it to serve up everything Cloud Pak does today: the advanced profiling, the data virtualization Cloud Pak enables, the machine learning at the level where you can actually work with R and Python code and put your notebooks inside of the Pak. Those are some of the pieces that are missing in some of the other vendors' tools today.

Dave: One of the things you're hearing here is the theme of openness. We've talked about a lot of tools, and not all IBM tools — there are many, and people want to use what they want to use. So Caitlin, from an IBM perspective, what's your commitment to openness, number one, but also — we've talked a lot about Cloud Paks — to simplifying the experience for your clients?

Caitlin: Well, I thank Steve and Victoria for speaking to their experience — I really appreciate the feedback. Part of our approach has been to really take on the challenges that we've had ourselves. I mentioned some of the capabilities we brought forward in our Cloud Pak for Data product — one being automating metadata generation — and that was something we had to solve for our own data challenges and needs. We will continue to source our use cases from, and ground them in, a practitioner perspective on what we're trying to do and solve and build. And the approach we've been taking is co-creation: we roll these capabilities out in the product and work with our customers, like Steve and Victoria, to really solicit feedback, route it to our dev teams, and push it out — just being very open and transparent. We want to deliver a seamless experience, we want to do it in partnership, and we'll continue to solicit feedback and improve and roll out. That has been our approach, it will continue to be, and I really appreciate the partnerships we've been able to foster.

Dave: We don't have a ton of time, but I want to go to the practitioners on the panel and ask about key performance indicators. When I think about DevOps, some of the things we measure are the elapsed time to deploy applications, start to finish; the amount of rework that has to be done; the quality of the deliverable. What are the KPIs, Victoria, that are indicators of success in operationalizing the data pipeline?
Victoria: I would definitely say your ability to deliver quickly. How fast can you deliver — is that quicker than what you've been able to do in the past? What is the user experience like? Have you measured the amount of time users were spending to bring information to the table in the past, and have you been able to reduce that time to delivery of information, of answers to business questions? Those are the key performance indicators that tell me the suite we've put in place today is providing information quickly — I can get my business answers quicker than I could before — and the information is accurate. So also being able to measure: is what I've been giving back quality, or is it the wrong information, so that I have to go back to the table and gather it from somewhere else? That, to me, tells us: with the tools we've put in place today, my teams are working quicker and answering the questions they need to, accurately — that is when we know we're on the right path.

Dave: Steve, anything you'd add to that?

Steve: I think she covered a lot of it, especially the people components. The ones I'd add are around data quality scoring: for all the different data attributes, coming up with a metric for how to measure quality, and then showing that trend over time to show that it's getting better. The other one we're tracking is overall data availability — how much data are we providing to our users, and showing that trend. When I first started, we had somewhere in the neighborhood of 500 files that had been brought into the warehouse, published, and made available — in the neighborhood of a couple thousand fields. We've grown that to thousands of tables now available, so it's been hundreds of percent in scale, just in the availability of that data: how much is out there, how much is ready and available for people to just dig in, put into their analytics and their models, and get those back into the applications. That's another key metric we're starting to track.
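A small sketch of the KPI idea Steve describes: score each dataset's quality on a few dimensions and trend the score over time. The dimensions, weights, and sample history below are illustrative assumptions, not Associated Bank's actual scoring model:

```python
# Sketch of a trended data-quality KPI: a weighted 0-100 score per period.
# Dimensions, weights, and the sample history are illustrative assumptions.
import pandas as pd

WEIGHTS = {"completeness": 0.4, "validity": 0.4, "freshness": 0.2}

def quality_score(metrics: dict) -> float:
    """Weighted 0-100 score from per-dimension scores (each already 0-100)."""
    return sum(WEIGHTS[dim] * metrics[dim] for dim in WEIGHTS)

# Hypothetical monthly measurements for one dataset.
history = pd.DataFrame([
    {"month": "2020-01", "completeness": 82, "validity": 75, "freshness": 60},
    {"month": "2020-02", "completeness": 88, "validity": 80, "freshness": 70},
    {"month": "2020-03", "completeness": 93, "validity": 88, "freshness": 85},
])
history["score"] = history.apply(
    lambda r: quality_score(r[list(WEIGHTS)].to_dict()), axis=1)
print(history[["month", "score"]])  # the upward trend leadership watches
```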
Dave: So, last question. I said at the top that every application is going to need to be infused with AI this decade, otherwise it's not going to be as competitive as it could be. For those that are maybe stuck in their journey and don't really know where to get started — I'll start with Caitlin, then go to Victoria, and then Steve, bring us home — what advice would you give to people who need to get going on this?

Caitlin: My advice is to poll the folks that are either producing or accessing your data and figure out where the friction is. I mentioned some of the data management challenges we were seeing — processes that were taking weeks, prone to error, highly manual — so that part was ripe for an AI project. Identify the use cases that are causing the most rework and manual effort; you can move really quickly, and as you build the platform out, you're able to spin those up in an accelerated fashion. If you identify that and figure out the business impact you're able to drive very early on, you can get going and start really seeing the value.

Victoria: Caitlin hit it on the head, but I would add one thing: first and foremost, in my opinion, the most important piece is data governance. You need to implement data governance at an enterprise level. Many organizations will do it, but they'll have silos of governance. You really need an enterprise data governance platform that consists of a true framework: an operational model, charters, data domain owners, data domain stewards, data custodians — all of that needs to be defined. And while that may take some work in the beginning, the payoff down the line is that much greater. It allows your business to truly own the data, and once they own the data and take part in classifying the data assets for technologists and for analysts, you can start to eliminate some of the technical debt that most organizations have acquired today. You can start to look at which systems you can turn off and which systems still serve value. Truly build out a capability matrix — start mapping systems to capabilities and asking: where do we have redundancy, and what can we get rid of? That's the first piece. The second piece is really leveraging the tools that are out there today — the IBM tools, and some of the other tools as well — that enable the newer, next-generation capabilities, like AI allowing for automation, which for all of us means the analysts in place today can access information quicker and deliver it accurately, like we've been talking about, because it's been classified — that pre-work's been done. It's never too late to start, and once you start, it acts as a domino effect: you start to see everything else fall into place.
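As a tiny sketch of the capability-matrix exercise Victoria recommends — map systems to the capabilities they provide, then flag capabilities served by more than one system as consolidation candidates. All system and capability names here are hypothetical:

```python
# Toy capability matrix: invert a system -> capabilities map to spot
# redundancy. All names are hypothetical examples.
system_capabilities = {
    "LegacyWarehouse": {"reporting", "customer-360"},
    "CloudLakehouse":  {"reporting", "ml-feature-store", "customer-360"},
    "DeptDataMart":    {"reporting"},
}

capability_to_systems = {}
for system, caps in system_capabilities.items():
    for cap in caps:
        capability_to_systems.setdefault(cap, []).append(system)

for cap, systems in sorted(capability_to_systems.items()):
    if len(systems) > 1:
        print(f"{cap}: served by {sorted(systems)} -> consolidation candidate")
```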
Steve: Sure. I think the key for me — those guys covered a lot of it, and everything they said is valid and accurate — the thing I would add is, from a starting perspective: if you haven't started, start. Don't try to overthink it, don't overplan it; just do something and start to show the progress and value. The use cases will come, even if you think you're not there yet. It's amazing, once you have the foundational components there, how some of these things start to come out of the woodwork. So get it started, take that iterative approach, and keep an open mindset. Encourage exploration and enablement. Look your organization in the eye and ask: why are there silos, why are things like this, what are our problems, what are the things getting in our way? Focus on and tackle those areas, as opposed to putting up more rails and more boundaries and encouraging that silo mentality — really look at how you focus on enablement. And the last comment would be on scale: everything should be focused on scale. What you think is a one-time process today, you're going to do again — we've all been there — you're going to do it a thousand times. So prepare for that; prepare as though you're going to do everything a thousand times, and start to instill that culture within your organization.

Dave: Great advice, guys. DataOps — bringing machine intelligence and AI to really drive insights, and scaling with a cloud operating model no matter where the data lives. It's really great to have three such knowledgeable practitioners. Caitlin, Victoria, and Steve, thanks so much for coming on theCUBE and helping support this panel.

All right, and thank you for watching, everybody. Remember, this panel was part of the raw material that went into a crowdchat we hosted on May 27th — crowdchat.net/dataops — so go check that out. This is Dave Vellante for theCUBE. Thanks for watching. [Music]

Published: May 28, 2020

