Chris Bannocks, ING & Steven Eliuk, IBM | IBM CDO Fall Summit 2018


 

(light music)
>> Live from Boston, it's theCUBE, covering the IBM Chief Data Officer Summit. Brought to you by IBM.
>> Welcome back, everyone, to theCUBE's live coverage of the IBM CDO Summit here in Boston, Massachusetts. I'm your host, Rebecca Knight, and I'm joined by my co-host, Paul Gillin. We have two guests for this segment: Steven Eliuk, Vice President of Deep Learning in the Global Chief Data Office at IBM, and Christopher Bannocks, Group Chief Data Officer at ING. Thanks so much for coming on theCUBE.
>> My pleasure.
>> Before we get started, Steve, I know you have some very important CUBE fans that you need--
>> I do.
>> To give a shout-out to. Please.
>> For sure. I missed them on the last three runs of theCUBE, so I'd like to give a shout-out to Santiago, my son, five years old, and the shortest one, which is Elana. Miss you guys tons, and now you're on the air. (all laughing)
>> Excellent. Good to get that important piece of business out of the way.
>> Absolutely.
>> So, let's talk about metadata. What's the problem with metadata?
>> The one problem, or the many? (chuckles)
>> (laughing) There are a multitude of problems.
>> How long have you got? The problem is, it's everywhere, and there's lots of it. Bringing context to it and understanding it from an enterprise-wide perspective is a huge challenge: just connecting to it, finding it, or collecting it centrally, and then understanding the context and what it means. And then there's the standardization of it, or the lack of standardization of it, across the board.
>> Yeah, it's incredibly challenging, just the immense scale of metadata. At the same time, as Chris mentioned, just coming up with your own company's glossary of terms to describe your own data is step one in the journey of making your data discoverable and governed. So it's challenging, it's not well understood, and I think we're very early on in these stages of describing our data.
>> Yeah.
>> But we're getting there.
Slowly but surely.
>> And perhaps, in that context, it's not only the fact that it's everywhere, but that we've not created structural solutions in a consistent way across industries to structure it and manage it appropriately.
>> So, help people do it better. What are some of the best practices for creating and managing metadata?
>> Well, it's such a broad space, you can look at different parts of it. Let's just take the work we do around describing our data, which we do for the purposes of regulation, for the purposes of GDPR, et cetera. It's really about discovering and providing context to the data that we have in the organization today. In that respect, it's creating a catalog and making sure that we have the descriptions and the structures of the data that we manage and use in the organization. To give you a practical example: when you have a data quality problem, you need to know how to fix it. So you create and structure metadata around, well, where does it come from, first of all? What's the journey it has taken to get to the point where you've identified that there's a problem? But also, who do we go to to fix it? Where did it go wrong in the chain? And who's responsible for it? Those are very simple examples of the metadata: the transformations the data might have come through to get to its current point, the quality metrics associated with it, and the owner, or data steward, that it has to be routed back to to get fixed.
>> Now, all of those are metadata elements.
>> All of those, yeah.
>> Right?
>> 'Cause we're not really talking about the data itself. The data might be a debit or a credit, something very simple like that in banking terms. But it's got lots of other attributes associated with it which essentially describe that data. So, what is it? Who owns it? What are the data quality metrics? How do I know what its quality is?
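The metadata elements Chris lists here, lineage, quality metrics, and the steward a problem gets routed back to, can be sketched as a simple record. This is a minimal illustration only; every field, system, and person name below is invented, not an actual IBM or ING data model:

```python
# Illustrative sketch only: field names and structure are assumptions,
# not any real organization's metadata model.
from dataclasses import dataclass, field

@dataclass
class MetadataRecord:
    element: str                  # the data element described, e.g. "credit_amount"
    source_system: str            # where the value originated
    lineage: list = field(default_factory=list)  # transformations applied en route
    steward: str = "unassigned"   # who fixes it when quality checks fail
    quality_score: float = 1.0    # e.g. share of records passing validation

def route_quality_issue(record: MetadataRecord) -> str:
    """When a quality problem is found, the metadata tells us who to go to."""
    if record.quality_score < 0.95:
        return f"Route to steward '{record.steward}' (source: {record.source_system})"
    return "No action needed"

rec = MetadataRecord(
    element="credit_amount",
    source_system="core-banking-es",
    lineage=["ingest", "fx-normalize", "aggregate"],
    steward="maria.lopez",
    quality_score=0.91,
)
print(route_quality_issue(rec))
# Route to steward 'maria.lopez' (source: core-banking-es)
```

The point of the sketch is that the routing decision comes entirely from the metadata; nothing ever needs to inspect the underlying debits and credits.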
>> So, where do organizations make mistakes? Do they create too much metadata? Is it poorly labeled? Is it not federated?
>> Yes. (all laughing)
>> I think it's a mix of all of them. One of the things that Chris alluded to is that it's an incredibly labor-intensive task. There are a lot of people involved, and when you get a lot of people involved in a sadly quite time-consuming, slightly boring job, there are errors and there are problems. And that hits data quality, GDPR, government-owned entities, regulatory issues. Likewise, if you can't discover the data because it's labeled wrong, that's potential insight you've now lost, because that data isn't discoverable to a project that's looking for similar types of data. So, step one is describing your metadata to the organization: creating a taxonomy of metadata and getting everybody on board to label the data, whether it be short and long descriptions, having good tools, et cetera.
>> I mean, look, the simple thing is, as a capability in any organization, we struggle with these terms, right? Metadata: if you're talking to the business, they have no idea what you're talking about. You've already confused them the minute you mention "meta."
>> Hashtag.
>> Yeah. (laughs)
>> It's a hashtag.
>> That's basically what it is.
>> Essentially, it's just data about data. It's the descriptive components that tell you what it is you're dealing with. Take a simple example from finance: an interest rate on its own tells you nothing. It could be the interest rate on a savings account; it could be the interest rate on a bond. On its own, you have no clue what you're talking about. A maturity date, or a date in general, is the same. You have to provide the context: its relationships to other data, the context it sits in, and the description of what it is you're looking at.
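The "data about data" point, that a bare value like an interest rate or a date means nothing until it is mapped to a business term, can be sketched as a tiny glossary lookup. All system names, field names, and glossary terms below are invented for illustration:

```python
# Illustrative only: each system maps its local field names onto a shared
# business glossary; a bare value like "2021-06-30" means nothing without it.
local_to_glossary = {
    ("system_es", "fecha"): "maturity_date",
    ("system_fr", "date"): "birth_date",
    ("system_es", "tasa"): "savings_interest_rate",
}

def describe(system: str, field_name: str, value: str) -> str:
    term = local_to_glossary.get((system, field_name))
    if term is None:
        # Unmapped fields get flagged for a data steward rather than guessed at.
        return f"UNMAPPED: {system}.{field_name}={value}"
    return f"{term}={value}"

print(describe("system_es", "fecha", "2021-06-30"))  # maturity_date=2021-06-30
print(describe("system_fr", "date", "2021-06-30"))   # birth_date=2021-06-30
```

Note how the same raw value resolves to two very different business meanings depending on the source system, which is exactly the two-systems problem discussed next.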
>> And if that comes from two different systems in an organization, let's say one in Spain and one in France, and you just receive a date, you don't know what you're looking at. You have no context for what you're looking at, and you simply have to have that context. So you have to be able to label it there, and then map it to a generic standard that you implement across the organization, in order to create the control you need to govern your data.
>> Are there standards? I'm sorry, Rebecca.
>> Yes.
>> Are there standards efforts underway, industry-wide?
>> There are open metadata standards that are underway and gaining a great deal of traction. But there are internal standards you have to set anyway, irrespective of what's happening across the industry. You don't have the time to wait for external standards to exist before you standardize internally.
>> Another difficult point is that it can be region- or country-specific.
>> Yeah.
>> Right, so it makes it incredibly challenging, because for every region you work in you might need your own sub-glossary of terms, and you might have to control the export of certain data with certain terms between regions and between countries. It gets very, very challenging.
>> Yeah. And then somehow you have to connect it all up to be able to see what it all is. Because the usefulness of this is: if one system maps its local definition, say "maturity date," to the generic "date," whereas someone else maps "date" to "birth date," you know you've got a problem. And exposing the problem is part of the process: understanding, hey, that mapping's wrong, guys.
>> So, where do you begin? If your mission is to transform your organization into one that is data-centric, and the business side is sort of eyes-glazing-over at the mention of metadata, what kind of communication needs to happen?
What kind of teamwork, collaboration?
>> Teamwork and collaboration are absolutely key. The communication takes time; don't expect one blast of communication to solve the problem. It's going to take education and working with people to actually get them to realize the importance of things. And to do that, you need to start something. Just communicating the theory doesn't work; no one can connect to it. You have to have people working on the data for a reason that is business-critical, and you need to have them experience the problem to recognize that metadata is important. Until they experience the problem, you don't get the right amount of traction. So you have to start small and grow.
>> And you can potentially use the whip as well: governance, the regulatory requirements. That's a nice one to push things along. That's often helpful.
>> It's helpful, but not necessarily popular.
>> No, no.
>> So you have to give--
>> Balance.
>> We're always struggling with that balance. There's a lot of regulation that drives the need for this. But equally, that same regulation drives all of the same needs you have for analytics: for good measurement of the data, for growth of customers, for delivering better services to customers. All of these things are important. Just the web-click information you have, that's all essentially metadata. The way we interact with our clients online and through mobile, that's all metadata. So it's not all whip and stick. There's some real value in there as well.
>> This would seem to be a domain that is ideal for automation. Through machine learning and contextualization, machines should be able to figure a lot of this stuff out. Am I wrong?
>> No, absolutely right. And I think we're working on proofs of concept to prove that case. And we have IBM AMG as well,
the automatic metadata generation capability, using machine learning and AI to start auto-generating some of this insight from existing catalogs, et cetera. We're starting to see real value through that. It's still very early days, but I think we're really starting to see that one of the solutions can be machine learning and AI, for sure.
>> I think there are various degrees of automation that will come in waves. Right now, we have a very small term set with very high-confidence predictions. But then you want to get to the specificity of a company, which can have 30,000 terms sometimes. Internally, we have 6,000 terms at IBM, and at that level of specificity we're not at complete automation yet. But it's coming. It's a trial.
>> It takes time, because the machine is learning. You have to give the machine enough inputs, and that gradually takes time. Humans are involved as well; it's not about just throwing the machine at something and letting it churn. You have to have that human involvement. It takes time to have the machine continue to learn and grow, to give it more terms and more context. But over time, I think we're going to see good results.
>> I want to ask about that human-in-the-loop, as IBM so often calls it. One of the things Inderpal Bhandari was talking about is how the CDO needs to be a change engine in chief. So how are the rank and file interpreting this move to automation and increased machine learning in their organizations? Is it accepted? Or is it (chuckles) a source of paranoia and worry?
>> I think it's a mix. I think we're kind of blessed, at least in the global CDO office at IBM: everyone's on board for that mission. That's what we're doing.
>> Right, right.
>> There are team members 25, 30 years on IBM's roster, and they're just as excited as I am, and I've only been there for 16 months.
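The high-confidence-first, human-in-the-loop automation described above could be sketched, very crudely, as follows. This toy scorer just measures token overlap against glossary-term descriptions; it is not how IBM's automatic metadata generation actually works, and the glossary entries are invented:

```python
# Crude illustrative sketch of human-in-the-loop term suggestion.
# Real systems use trained models; this toy just scores token overlap
# between a column description and each glossary term's description.
glossary = {
    "maturity_date": "date on which a bond or loan principal falls due",
    "savings_interest_rate": "interest rate paid on a savings account",
}

def suggest_term(column_description: str, threshold: float = 0.5):
    tokens = set(column_description.lower().split())
    best_term, best_score = None, 0.0
    for term, desc in glossary.items():
        score = len(tokens & set(desc.split())) / len(tokens)
        if score > best_score:
            best_term, best_score = term, score
    if best_score >= threshold:
        return best_term, best_score   # high confidence: auto-label
    return None, best_score           # low confidence: route to a human

term, score = suggest_term("interest rate paid on a savings account")
print(term)  # savings_interest_rate
```

Anything falling below the confidence threshold returns `None` and would be routed to a human labeler, mirroring the point that the machine learns over time while humans stay involved.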
But it kind of depends on the project, too. On ones that have high impact, everyone's really gung-ho, because we've seen process times go from 90 days down to a couple of days. That's a huge reduction. That's the governance and regulatory aspect, but for us it's also about the linkage and availability of data, so that we can get more insights from that data and better outcomes for different types of enterprise use cases.
>> And a more satisfying workday.
>> Yeah, it's fun.
>> That's a key point. It's much better to be involved in this than doing the job itself. The job of tagging and creating metadata for a vast number of data elements is very hard work.
>> Yeah.
>> It's very difficult. And it's much better to be working with machine learning to do it, dealing with the outliers or the exceptions, than it is chugging through manually. Realistically, it just doesn't scale. You can't do this across 30,000 elements in any meaningful way, or a way that makes sense from a financial perspective. So you really do need to be able to scale this quickly, and machine learning is the way to do it.
>> Have you found a way to make data governance fun? Can you gamify it?
>> Are you suggesting that data governance isn't fun? (all laughing) Yes.
>> But can you gamify it? Can you compete?
>> We're using gamification in many ways; we haven't been using it for data governance yet. "Governance" is just a horrible word, right? People have really negative connotations associated with it. But if you step one degree away, we're talking about quality, and quality means better decisions. That's actually all governance is: knowing where your data is, knowing who's responsible for fixing it if it goes wrong, and being able to measure whether it's right or wrong in the first place. And it being better means we make better decisions. Our customers have better engagement with us.
We please our customers more, and therefore they hopefully engage with us more and buy more services. I think "governance" is a term we invented out of the need for regulation and the need for control, and it carries that background. But realistically, we should just be proud of the data we use in the organization, and we should want the best results from it. It's not about governance; it's about us being proud of what we do.
>> Yeah, a great note to end on. Thank you so much, Christopher and Steven.
>> Thank you.
>> Cheers.
>> I'm Rebecca Knight, for Paul Gillin. We will have more from the IBM CDO Summit here in Boston coming up just after this. (electronic music)

Published Date : Nov 15 2018

