
Seth Dobrin, IBM | IBM Think 2021


 

>>Narrator: From around the globe, it's The Cube, with digital coverage of IBM Think 2021, brought to you by IBM.

>>Okay, we're back with our coverage of IBM Think 2021. This is Dave Vellante, and this is The Cube. We're now going to dig into AI and explore the issue of trust in machine intelligence. And we're very excited to welcome longtime Cube alum Seth Dobrin, who is the global chief AI officer at IBM. Seth, always good to see you. Thanks so much for making time for us.

>>Yeah, always good to see you, Dave. Thanks for having me on again.

>>It's our pleasure. Look, language matters, and IBM has been talking about language, automation, and trust. And I know you're doing a session on trustworthy AI. Can you talk about trust in the world of machine intelligence and AI? What do we need to know?

>>Yeah. So as you mentioned, language matters, automation matters, and to do either of those well, you need to really trust the AI that's coming out of your models. And trusting the AI really means five things; I think of it as five pillars. The AI needs to be transparent, it needs to be fair, it needs to be explainable, it needs to be robust, and it needs to ensure privacy. Without all five of those things combined, you don't really have the ability to trust your AI, either as a consumer or even as an end user. So imagine I'm a business user that's supposed to use this cool new AI that's going to help me make a decision. If I don't understand it, and I can't figure out how it got to a decision, I'm going to be less likely to consume it and use it in my day-to-day work. Whereas if I can really understand how and why it got to a decision, and know that it's protecting the ultimate end user's data, it's a lot more likely that I'll end up using that AI.

>>But there is the black box problem with AI. How is that a technical issue? How do you get around that? It seems non-trivial.

>>Yeah. Solving the black box problem of AI, specifically of either complex traditional machine learning models or what we think of as deep learning models, is not a trivial problem. But IBM and others over the years have been really trying to tackle this problem of explainable AI, and we've come to a point where we really believe we have a good handle on how to explain these black box models: how you interpret their results and explain them from the endpoint, and understand what went into each decision at each layer in the model, if it's a deep learning model, to be able to extract why and how it's making a certain decision. Three years ago we thought of it as an intractable problem, but we knew we'd be able to solve it in the future. I think we're at that future today, where unless you get into something incredibly complex, we can explain how and why it got to a decision. And we do this through various sets of tooling that we have. Some are open source; we're so committed to explainability and fairness and trust when it comes to AI that probably half of what we do is at the open source community, in the form of what we call our AI Fairness 360 toolkit.

>>Great, thank you for that. So let me ask you another sort of probing question. Is there a risk of putting too much attention on trust? These are early days in the AI journey, and people are worried it's going to stifle projects, maybe slow down innovation, or maybe even be a headwind to AI adoption and scale. What are your thoughts on that?

>>You know, I think it's a slippery slope to say it's too soon to be concerned about fairness, trust, privacy, bias, and explainability, what we think of as trustworthy AI. You can do very interesting, very exciting, very innovative, game-changing things in the context of doing what's right. And it is right, especially when you're building AIs or anything that actually impacts people's lives, to make sure that it's trustworthy: that it's transparent, it's fair, it's explainable, it's robust, and it ensures the privacy of the underlying data in that model. Otherwise you get to a point where you may be able to do cool things, but those cool things get undermined by previous missteps that have caused the industry or the tools or the technology to get a bad rap. A great example of that is the conversational AIs that were released in the wild on Twitter and Facebook without any kind of thought about how you keep them trustworthy. That went bad really quick. And we want to make sure, you know, IBM's not a consumer-facing company; we're kind of the IBM inside, if you will. We want to make sure that when the world's largest companies are deploying IBM's AI, or using IBM's tools to deploy their own AI, it's done in a way that gives them the ability to make sure things don't go off the rails quickly. Because we're not talking about a conversational Twitter bot; we're talking about potentially an AI that's going to help make a life-changing decision. Do we give Dave a mortgage? Do we let Dave out of jail? Is he likely to recidivate? Things like that are actually life-changing, not just an embarrassment for the company. So it's important to keep trust up front.

>>Great points you make. You're right, and it did turn bad very quickly, and it's not resolved. A lot of the social companies are saying, "Government, you figure it out. We can't." So let's bring it back to the enterprise; that's what we're interested in. Where is IBM's main focus right now, and where do you see it going? You mentioned things like recidivism and mortgages. These really are events that you can predict with very high probability. Maybe you don't get it a hundred percent right, but it really is world-changing in many ways. Where's the focus now, and where do you see it headed?

>>Yeah, so I think the focus is now, has been for a little while, and probably will be in the future, on augmenting intelligence. Especially when it comes to life-changing decisions, we don't really want an AI making that decision independent of the human. We want that AI guiding the decisions that humans make, reducing the universe of those decisions down to something that's digestible by a human, while at the same time using the AI to help eliminate cognitive biases that may exist within us as humans. And when we think about things like bias, we have to remember that the math itself is not where the bias is coming from. The bias is coming from the data, and the bias in the data comes from prior decisions of humans that were themselves made for biased reasons. So by leveraging AI to remove the bias from the decisions that are surfaced to humans, it helps eliminate some of these things. For instance, back to the mortgage example: if we look at the impact of redlining, where people in certain zip codes or certain areas didn't get mortgages, that redlining still exists in the data. I don't know of a single mortgage provider today that wants that as part of their day-to-day practice. This helps remove it and surface a decision that's based on the context of the individual, based on their ability to repay in the case of a mortgage, as opposed to what they look like or where they live.

>>I like the concept of the combinatorial power of machines and humans, but I wonder if you could educate me. There seem to be a lot of potential use cases, for many companies and for IBM as well, for inference at the edge. Everybody's talking about the edge; OpenShift obviously is a big play there, and hybrid cloud. So how do you see that kind of real-time inference playing in? Does certain data come back to the model, and that's where the humans come in? How do you see that?

>>Yeah, so I often get asked, what do you see as the future of AI? And my response is: the future of AI is edge. The reason is that if I can solve for an edge use case, I can solve for every use case between the edge and the data center. And some of the challenges you brought up as far as being on the edge, it actually helps address other problems too, such as how do I handle data sovereignty regulations when it comes to training models, and even the models themselves. But think about it: if I have 50 models deployed around the world, there are 50 versions of the same model deployed at different scoring endpoints, different places where I'm inferencing. How do I, without having to bring the model back or all the data back, keep all those models in sync? How do I make sure, back to the social media example, that one of them doesn't go completely off the rails? We do this through federated learning, or distributed learning, whatever you want to call it. It's this concept of: you have models running at discrete edge or distributed locations, and those models over time learn from the data that they're scoring on. Instead of sending the data back to retrain, or even the model back to retrain, you send back the changes in the parameters that have been updated in that model. You can then pull all those together. So you have 50 different distributions that you're managing; you pull all those together, and you can even assign weights to the different ones, based on the different data distributions that may exist at one node or another, or based on the amount of data that has gone into changing those weights. And so you combine these models back into a single model, and then you push them back out to the edge.

>>So I'm just thinking, because most of the work in AI today, correct me if I'm wrong, is in modeling versus inferencing, but you're laying out a future where that's going to change. And if I think about some of the things we're familiar with today, things like fraud detection and maybe weather, supply chains, that's just going to get better and better. But I'm curious about some of the use cases and examples you're seeing, both today and in the future. I think of things like smart power grids, smart factories, automated retail; these seem like wheelhouses for IBM. So maybe you could share your thoughts on that, and some other examples.

>>Yeah, so you brought up fraud. Fraud is a really good example of an edge use case that might not seem like an edge use case. As you're swiping a credit card, and let's just focus on credit card transactions, most of those transactions occur on a mainframe, and they need a response time that's less than a millisecond. So if I'm responsible for making sure that my bank doesn't have any credit card fraud, and I have a model that's going to do it, I can't have, in this case, the mainframe call out to someplace else to score the model and then come back. And this gets back to the power of hybrid cloud. If I can deploy that model on my mainframe, where those transactions are happening half a millisecond at a time, I can score every single transaction directly on the mainframe without having to go out, which enables me to keep my SLA, keep that less-than-half-a-millisecond response time, while also preventing any fraudulent transactions from happening. That's a great example of what falls into my "everything is edge" bucket: you're training the model somewhere else, where you don't have the cost of training it on the mainframe, but you want to score it back there. And we've actually done this with a couple of banks, where we've trained models in the cloud on GPUs and done the inferencing and scoring on the mainframe, for exactly that: for fraud.

>>Edge: if you can make it there, you can make it anywhere. Seth, we've got to leave it there. Really appreciate your time.

>>All right. Thanks, Dave. I appreciate it, as always.

>>Great to see you. Hope we can see you later this year, face to face, or at least in '22. And thank you.

>>I hope so.

>>Yeah, let's make that happen, Seth. We'll virtual-shake on it. Thanks, everybody, for watching our ongoing coverage of IBM Think 2021, the virtual edition. This is Dave Vellante for The Cube. Keep it right there for more great content from the show.

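The federated learning scheme described in the interview above, sending parameter updates back instead of raw data and combining them with per-node weights, can be sketched in a few lines. The weighting-by-sample-count choice and the function names here are illustrative assumptions in the spirit of federated averaging, not IBM's implementation.

```python
# Minimal sketch of weighted federated model combination ("FedAvg"-style).
# Assumption: each edge node reports its updated parameter vector and how
# many samples it trained on; weights are proportional to sample counts.

def federated_average(node_params, node_sample_counts):
    """Combine per-node parameter vectors into a single global model."""
    total = sum(node_sample_counts)
    n_params = len(node_params[0])
    global_params = [0.0] * n_params
    for params, count in zip(node_params, node_sample_counts):
        weight = count / total
        for i, p in enumerate(params):
            global_params[i] += weight * p
    return global_params

# Three edge nodes, each with a 2-parameter model and different data volumes.
nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
counts = [100, 100, 200]
print(federated_average(nodes, counts))  # [3.5, 4.5]
```

The combined model would then be pushed back out to all the edge locations, as described in the interview; real systems also weight by data distribution, not just volume.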
Published Date : May 4 2021



Seth Dobrin, IBM | IBM Data and AI Forum


 

>> Live from Miami, Florida, it's theCUBE, covering the IBM Data and AI Forum, brought to you by IBM. >> Welcome back to the port of Miami, everybody. We're here at the Intercontinental Hotel. You're watching theCUBE, the leader in live tech coverage. Seth Dobrin is here. He's the vice president of data and AI, and the chief data officer of Cloud and Cognitive Software at IBM. Good to see you again. >> Good to see you, Dave, thanks for having me here. >> The Data and AI Forum, hashtag Data AI Forum, it's amazing here. 1,700 people, and everybody's got a hands-on appetite for learning. What do you see out in the marketplace? What's new since we last talked? >> Well, if you look at some of the things that are really needed in the marketplace, it's really been around filling the skill shortage, and how do you operationalize and industrialize your AI. So there's been a real need for ways to get more productivity out of your data scientists, not necessarily replace them, but how do you get more productivity? And we just released, a few months ago, something called AutoAI, which is probably the only tool out there that automates the end-to-end pipeline, automates 80% of the work on the end-to-end pipeline, but isn't a black box. It actually kicks out code, so your data scientists can then take it, optimize it further, understand it, and really feel more comfortable about it. >> It's AI for AI. >> That's exactly what it is, AI for AI. >> So how's that work? You're applying machine intelligence to data to make AI more productive, pick algorithms, best fit? >> Yeah. Basically, you feed it your data and it identifies the features that are important. It does feature engineering for you, it does model selection for you, it does hyperparameter tuning and optimization, it does deployment, and it also monitors for bias. >> So what does the data scientist do?
>> The data scientist takes the code out the back end, and really, there are some tweaks, you know, maybe AutoAI didn't get the model perfect, and they really customize it for the business and the needs of the business. >> So the data scientist then can apply it in a way that is unique to their business, and that essentially becomes their IP. It's not generic AI for everybody, it's customized. And that's where data scientists complain that they don't have the time to do all this wrangling of data. >> Exactly. And it was built as a combination from IBM Research, there are some great assets at IBM Research, plus some Kaggle masters that work here at IBM, who really designed and optimized the algorithm selection and things like that. And at the keynote today, Wunderman Thompson was up there talking, and this is probably one of the most impactful use cases of AutoAI to date. It was also, you know, my former team, the Data Science Elite Team, was engaged. Wunderman Thompson had this problem where they had like 17,000 features in their data sets, and what they wanted was to be able to have a custom solution for their customers. So every time they got a customer, they had to have a data scientist sit down and figure out the right features and how to engineer them for this customer. It was an intractable problem for them. The person from Wunderman Thompson who presented today said he'd been trying to solve this problem for eight years. AutoAI, plus the Data Science Elite Team, solved it in two months, and after that two months, it went right into production. So in this case, AutoAI isn't doing the whole pipeline. It's helping them identify the features, engineering the features that are important, and giving them a head start on the model.
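AutoAI itself is proprietary, but the loop it automates — try candidate algorithms and hyperparameters on held-out data, keep the best pipeline — can be sketched in plain Python. The toy data and the two candidate models below are invented for illustration; this is a minimal search loop, not IBM's implementation.

```python
import random

# Minimal sketch of an automated model-selection + hyperparameter search:
# evaluate every (model, hyperparameter) pair on a validation split and keep
# the lowest-error pipeline. Real systems also search feature engineering.
random.seed(0)
xs = [i / 10 for i in range(50)]
ys = [2.0 * x + 1.0 + random.gauss(0, 0.1) for x in xs]
train_x, train_y = xs[::2], ys[::2]
val_x, val_y = xs[1::2], ys[1::2]

def fit_constant(x, y, _hp):
    # Baseline model: always predict the training mean.
    mean = sum(y) / len(y)
    return lambda q: mean

def fit_ridge(x, y, lam):
    # One-feature ridge regression in closed form; lam is the penalty.
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / (sxx + lam)
    return lambda q: my + slope * (q - mx)

def mse(model, x, y):
    return sum((model(a) - b) ** 2 for a, b in zip(x, y)) / len(x)

candidates = [("constant", fit_constant, [None]),
              ("ridge", fit_ridge, [0.0, 0.1, 1.0, 10.0])]

best = None
for name, fit, grid in candidates:
    for hp in grid:
        model = fit(train_x, train_y, hp)
        err = mse(model, val_x, val_y)
        if best is None or err < best[0]:
            best = (err, name, hp, model)

print(best[1], best[2], round(best[0], 4))  # ridge should beat the baseline
```

The "kicks out code" point in the interview maps to the fact that the winning pipeline here is ordinary code a data scientist can inspect and tweak, not an opaque artifact.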
>> What's the acquisition model for AutoAI? Is it a licensed software product, is it a SaaS? >> It's part of Cloud Pak for Data, and it's available on IBM Cloud. So on IBM Cloud you can use it pay-per-use, you get a license as part of Watson Studio on IBM Cloud. If you invest in Cloud Pak for Data, it can be a perpetual license or a committed-term license, which is essentially SaaS. >> So it's essentially a feature add-on of Cloud Pak for Data. >> It's part of Cloud Pak for Data. >> And you're saying it can be usage-based. That's key. >> Consumption-based. Cloud Pak for Data is all consumption-based. >> So people want to use AI for competitive advantage. I said in my open that we're not marching to the cadence of Moore's Law in this industry anymore; it's a combination of data and then cloud for scale. So people want competitive advantage, and you've talked about some things that folks are doing to gain that advantage. But at the same time, we heard from Rob Thomas that there's only about 4 to 10% penetration for AI. What are the key blockers that you see, and how are you knocking them down? >> Well, I think there are a number of key blockers. One is access to data, right? Companies have tons of data, but being able to even know what data is there, being able to pull it all together, and being able to do it in a way that is compliant with regulation, because you can't do AI in a vacuum. You have to do it in the context of ever-increasing regulation like GDPR and CCPA and all these other privacy regulations that are popping up. So those are really two: access to data and regulation can be blockers. The third is really access to appropriate skills, which we talked a little bit about. How do you retrain, or how do you upskill, the talent you have? And then how do you actually bring in new talent that can execute what you want?
And sometimes, in some companies, it's a lack of strategy with appropriate measurement, right? So what is your AI strategy, and how are you going to measure success? And you and I have talked about this on theCUBE before: you've got to measure your success in dollars and cents, cost savings, net new revenue. That's really all your CFOs care about. That's how you have to measure and monitor your success. >> So that last one is probably where most organizations start. Let's prioritize the use cases that give us the best bang for the buck, and then the business guys probably get really excited and say, okay, let's go. But to truly operationalize that, you've got to worry about these other things, the compliance issues, and you've got to have the skill sets. It's a scale thing. >> And sometimes that first thing you said is actually a mistake. Focusing on the one with the most bang for the buck is not necessarily the best place to start, for a couple of reasons. One, you may not have the right data. It may not be available, it may not be governed properly. Number two, the business you're building it for may not be ready to consume it. They may not be bought in, or the processes need to change so much, or something like that, that it's not going to get used. And you can build the best AI in the world; if it doesn't get used, it creates zero value, right? So for the first couple of projects, you really want to focus on the ones where you can deliver the best value, not necessarily the most value, but the best value in the shortest amount of time, and ensure they get into production. Because especially when you're starting off, if you don't show adoption, people are going to lose interest. >> What are you seeing in terms of experimentation now in the customer base? You know, when you talk to buyers, and you look at the IT spending surveys.
People are concerned about tariffs, the trade war, the 2020 election, so they're being a little bit cautious. But in the last two or three years there's been a lot of experimentation going on, and a big part of that is AI and machine learning. What are you seeing in terms of that experimentation turning into actual production projects that we can learn from, and maybe do some new experiments? >> Yeah, and I think it depends on how you're doing the experiments. There's kind of academic experimentation, where you have data science teams that work on cool stuff that may or may not have business value and may or may not be implemented. The business isn't really involved; the teams latch on, they do projects, and I think that's actually bad experimentation if you let it run your program. The good experimentation is when you start by having a strategy. You identify the use cases you want to go after, and you experiment by leveraging agile delivery methodologies. You deliver value in two-week sprints, and you can start delivering value quickly. In the case of Wunderman Thompson again: eight weeks, four sprints, and they got value. That was an experiment, right? But because it was done with agile methodologies, using good coding practices and good design-up-front practices, they were able to take it and put it right into production. If you're doing experimentation where you have to rewrite your code at the end, it's a waste of time. >> To your earlier point, the moon shots can oftentimes be too risky, and if you blow it on a moon shot, it can set you back years. So you've got to be careful: pick your spots, pick ones that are maybe representative, but lower risk.
Apply agile methodologies, get a quick return, learn, develop those skills, and then build up to the moon shot. >> Or you break that moon shot down into consumable pieces, right? The moon shot may take you two years to get to, but maybe there are subcomponents of it that you can deliver in three or four months, and you start delivering those, and you work up to the moon shot. >> I always like to ask the dogfooding question, or as I like to call it, sipping your own champagne. What have you guys done internally? When we first met, it was, I think, a snowy day in Boston at the Spark Summit years ago, and you had done a big career switch, and it's obviously working out for you. But what are some of the things? You were in part brought in to help IBM internally, as well as to help Inderpal make IBM really become data-driven internally. How has that gone? What have you learned? And how are you taking that to customers? >> Yeah, so I was hired three years ago now, believe it was that long, to lead our internal transformation. Over the last couple of years I got, I don't want to say distracted, but there were really important business things I needed to focus on, like GDPR, and helping our customers get up and running with data science, and I built the Data Science Elite Team. So as of a couple months ago, I'm back to being almost entirely focused on our internal transformation. And it's really about making sure that we use data and AI to make appropriate decisions. So now we have an app on our phones that leverages Cognos Analytics, where at any point Ginni Rometty, or Rob Thomas, or Arvind Krishna can pull up and look at what we call EPM, enterprise performance management, and understand where the business is. What did we do in the third quarter, which just wrapped up? What's the pipeline for the fourth quarter? And it's at your fingertips. We're working on revamping our planning cycle.
So today, planning has been done in Excel. We're leveraging Planning Analytics, which is a great planning and scenario-planning tool that, with a click of a button, really lets you understand how your business can perform in the future and what it needs to do to get there. We're also looking across all of Cloud and Cognitive Software, which Data and AI sits in. Within each business unit in Cloud and Cognitive Software, the sales teams do a great job of cross-sell and upsell, but there's a huge opportunity in how we cross-sell and upsell across the five different businesses that live inside of Cloud and Cognitive Software: Data and AI, Hybrid Cloud Integration, IBM Cloud, Cognitive Applications, and IBM Security. There's a lot of potential interplay that our customers have across there, and we can provide AI that helps the salespeople understand where they can create more value for our customers. >> It's interesting. This is the 10th year of doing theCUBE, and when we first started, it was sort of the beginning of the big data craze, and a lot of people said, oh, okay, here's the disruption: crossing the chasm, innovator's dilemma, all the old stuff going away, all the new stuff coming in. But you mentioned Cognos on mobile, and this is the thing we learned: the key ingredients of data strategies comprise the existing systems. You don't throw those out. Those are the systems of record that were the single version of the truth, if you will, that people trusted, and you go back to trust, while all this other stuff built up around them, which kind of created dissonance. And so it sounds like one of the initiatives you've been working on at IBM is really bringing in the new pieces while modernizing the existing, so that you've got consistent data sets that people can work with.
And one of the-- >> One of the capabilities that has really enabled this transformation in the last six months, for us internally and for our clients, is inside Cloud Pak for Data: a capability called IBM Data Virtualization. You have all these independent sources of truth, these systems of record, and then all these other data sources that may or may not be as trusted, and you can bring them together literally with the click of a button. You drop your data sources in, and the AI within Data Virtualization actually identifies keys across the different sources so you can link your data. You look at it, you check it, and it really enables you to do this at scale. All you need to do is point it at the data, here's the IP address of where the data lives, and it will bring that in and help you connect it. >> So you mentioned variances in data quality, and the consumer of the data has to have trust in that data. Can you use machine intelligence and AI to sort of give you a data confidence meter, if you will? >> Yeah. So there are two things
>>So it's a wisdom of the crowd type of. >>It's a crowd sourcing combined with the I >>as that, in your experience at all, changed the dynamics of politics within organizations. In other words, I'm sure we've all been a lot of meetings where somebody puts foursome data. And if the most senior person in the room doesn't like the data, it doesn't like the implication he or she will attack the data source, and then the meeting's over and it might not necessarily be the best decision for the organization. So So I think it's maybe >>not the up, voting down voting that does that, but it's things like the E PM tool that I said we have here. You know there is a single source of truth for our finance data. It's on everyone's phone. Who needs access to it? Right? When you have a conversation about how the company or the division or the business unit is performing financially, it comes from E. P M. Whether it's in the Cognos app or whether it's in a dashboard, a separate dashboard and Cognos or is being fed into an aye aye, that we're building. This is the source of truth. Similarly, for product data, our individual products before me it comes from here's so the conversation at the senior senior meetings are no longer your data is different from my data. I don't believe it. You've eliminated that conversation. This is the data. This is the only data. Now you can have a conversation about what's really important >>in adult conversation. Okay, Now what are we going to do? It? It's >>not a bickering about my data versus your data. >>So what's next for you on? You know, you're you've been pulled in a lot of different places again. You started at IBM as an internal transformation change agent. You got pulled into a lot of customer situations because yeah, you know, you're doing so. Sales guys want to drag you along and help facilitate activity with clients. What's new? What's what's next for you. 
>>So really, you know, I've only been refocused on the internal transformation for a couple months now. So really extending IBM struck our cloud and cognitive software a data and a I strategy and starting to quickly implement some of these products, just like project. So, like, just like I just said, you know, we're starting project without even knowing what the prioritized list is. Intuitively, this one's important. The team's going to start working on it, and one of them is an aye aye project, which is around cross sell upsell that I mentioned across the portfolio and the other one we just got done talking about how in the senior leadership meeting for Claude Incognito software, how do we all work from a Cognos dashboard instead of Excel data data that's been exported put into Excel? The challenge with that is not that people don't trust the data. It's that if there's a question you can't drill down. So if there's a question about an Excel document or a power point that's up there, you will get back next meeting in a month or in two weeks, we'll have an e mail conversation about it. If it's presented in a really live dashboard, you can drill down and you can actually answer questions in real time. The value of that is immense, because now you as a leadership team, you can make a decision at that point and decide what direction you're going to do. Based on data, >>I said last time I have one more questions. You're CDO but you're a polymath on. So my question is, what should people look for in a chief data officer? What sort of the characteristics in the attributes, given your >>experience, that's kind of a loaded question, because there is. There is no good job, single job description for a chief date officer. I think there's a good solid set of skill sets, the fine for a cheap date officer and actually, as part of the chief data officer summits that you you know, you guys attend. 
We had were having sessions with the chief date officers, kind of defining a curriculum for cheap date officers with our clients so that we can help build the chief. That officer in the future. But if you look a quality so cheap, date officer is also a chief disruption officer. So it needs to be someone who is really good at and really good at driving change and really good at disrupting processes and getting people excited about it changes hard. People don't like change. How do you do? You need someone who can get people excited about change. So that's one thing. On depending on what industry you're in, it's got to be. It could be if you're in financial or heavy regulated industry, you want someone that understands governance. And that's kind of what Gardner and other analysts call a defensive CDO very governance Focus. And then you also have some CDOs, which I I fit into this bucket, which is, um, or offensive CDO, which is how do you create value from data? How do you caught save money? How do you create net new revenue? How do you create new business models, leveraging data and a I? And now there's kind of 1/3 type of CDO emerging, which is CDO not as a cost center but a studio as a p N l. How do you generate revenue for the business directly from your CDO office. >>I like that framework, right? >>I can't take credit for it. That's Gartner. >>Its governance, they call it. We say he called defensive and offensive. And then first time I met Interpol. He said, Look, you start with how does data affect the monetization of my organization? And that means making money or saving money. Seth, thanks so much for coming on. The Cube is great to see you >>again. Thanks for having me >>again. All right, Keep it right to everybody. We'll be back at the IBM data in a I form from Miami. You're watching the Cube?
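The vote-up/vote-down ranking described in this segment can be sketched as a toy scoring function. Watson Knowledge Catalog's actual ranking logic is proprietary; the catalog entries, the quality scores, and the `vote_weight` blend below are invented purely to illustrate how crowd votes can down-rank a data set that an automated quality score alone would have surfaced first.

```python
# Toy sketch: order data-set search results by a blend of an automated
# quality score (0-1) and net crowd up/down votes. All values are invented.

def rank_score(quality, up_votes, down_votes, vote_weight=0.1):
    """Blend a 0-1 quality score with net crowd votes."""
    return quality + vote_weight * (up_votes - down_votes)

catalog = [
    {"name": "sales_2019_clean", "quality": 0.90, "up": 14, "down": 1},
    # High automated score, but users who tried it voted it down.
    {"name": "sales_2019_raw", "quality": 0.95, "up": 0, "down": 9},
    {"name": "sales_2019_legacy", "quality": 0.60, "up": 2, "down": 2},
]

results = sorted(catalog,
                 key=lambda d: rank_score(d["quality"], d["up"], d["down"]),
                 reverse=True)
for d in results:
    print(d["name"], round(rank_score(d["quality"], d["up"], d["down"]), 2))
```

The design point matches the interview: the crowd signal overrides a misleadingly high automated score, which is exactly the "crowdsourcing combined with AI" behavior described.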

Published Date : Oct 22 2019



Seth Dobrin, IBM | IBM CDO Summit 2019


 

>> Live from San Francisco, California, it's theCUBE, covering the IBM Chief Data Officer Summit, brought to you by IBM. >> Welcome back to San Francisco everybody. You're watching theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise, and we're here at the IBM Chief Data Officer Summit, 10th anniversary. Seth Dobrin is here, he's the Vice President and Chief Data Officer of the IBM Analytics Group. Seth, always a pleasure to have you on. Good to see you again. >> Yeah, thanks for having me back, Dave. >> You're very welcome. So I love these events, you get a chance to interact with chief data officers, guys like yourself. We've been talking a lot today about IBM's internal transformation, how IBM itself is operationalizing AI, and maybe we can talk about that, but I'm most interested in how you're pointing that at customers. What have you learned from your internal experiences and what are you bringing to customers? >> Yeah, so, you know, I was hired at IBM to lead part of our internal transformation, so I spent a lot of time doing that. >> Right. >> I've also, you know, when I came over to IBM I had just left Monsanto, where I led part of their transformation. So I spent the better part of the first year or so at IBM not only focusing on our internal efforts, but helping our clients transform. And out of that I found that many of our clients needed help and guidance on how to do this. And so I started a team we call the Data Science and AI Elite Team, and really what we do is we sit down with clients and share not only our experience, but the methodology that we use internally at IBM, leveraging things like design thinking, DevOps, and Agile, and how you implement that in the context of data science and AI. >> I've got a question, so Monsanto, obviously completely different business than IBM-- >> Yeah.
>> But when we talk about digital transformation, and then talk about the difference between a business and a digital business, it comes down to the data. And you've seen a lot of examples where companies traverse industries, which never used to happen before. You know, Apple getting into music, there are many, many examples, and the theory is, well, it's because of the data. So when you think about your experience, coming from a completely different industry and bringing that expertise to IBM, were there similarities you were able to draw upon, or was it a completely different experience? >> No, I think there are tons of similarities, which is part of why I was excited about this and I think IBM was excited to have me. >> Because the chances for success were quite high in your mind? >> Yeah, because the chances for success were quite high. And also, if you think about it, in how you implement, how you execute, the differences are really cultural more than anything to do with the business, right? The whole role of a Chief Data Officer, or Chief Digital Officer, or a Chief Analytics Officer, is to drive fundamental change in the business. So it's how do you manage that cultural change, how do you build bridges, how do you make people a little uncomfortable, but at the same time get them excited about how to leverage things like data, and analytics, and AI, to change how they do business. And really this concept of a digital transformation is about moving away from traditional products and services, more towards outcome-based services, and not selling things, but selling as a Service, right? And it's the same whether it's IBM moving away from fully transactional to cloud and subscription-based offerings.
Or it's a bank reimagining how they interact with their customers, or it's an oil and gas company, or it's a company like Monsanto really thinking about how do we provide outcomes. >> But how do you make sure that every as-a-Service isn't a snowflake, and that it can scale, so that you can actually make it a business? >> So underneath the as-a-Service are a few things. One is data, one is machine learning and AI, and the other is really understanding your customer, because truly digital companies do everything through the eyes of their customer. Every company has many, many versions of their customer until they go through an exercise of creating a single version, a customer or client 360, if you will, and we went through that exercise at IBM. And those are all very consistent things, right? They're all pieces that happen the same way in every company, regardless of the industry, and then you get into understanding what the desires of your customer are to do business with you differently. >> So you were talking before about the Chief Digital Officer, Chief Data Officer, Chief Analytics Officer as a change agent, making people feel a little bit uncomfortable. Explore that a little bit. What's that about, asking them questions that intuitively they know they need to have the answer to, but they don't, through data? What did you mean by that? >> Yeah, so here's the conversation that usually happens, right? You go and you talk to your peers in the organization, and you start having conversations with them about what decisions they're trying to make. And you're the Chief Data Officer, you're responsible for that, and inevitably the conversation goes something like this, and I'm going to paraphrase: give me the data I need to support my preconceived notions. >> (laughing) Yeah. >> Right? >> Right. >> And that's what they want. (voice covers voice) >> Here's the answer, give me the data that-- >> That's right.
So I want a dashboard that helps me support this. And the discomfort comes in a couple of places there. It's getting them to let go of that and allow the data to provide some inkling of things they didn't know were going on, that's one piece. The other is, then you start leveraging machine learning, or AI, to actually help start driving some decisions, so limiting the scope from infinity down to two or three things, surfacing those two or three things, and telling people in your business: your choices are one of these three things, right? That starts to make people feel uncomfortable, and it really is a challenge for that cultural change, getting people used to trusting the machine, or in some instances even trusting the machine to make the decision for you, or part of the decision for you. >> That's got to be one of the biggest cultural challenges, because you've got somebody who, let's say they run a big business, it's a profitable business, it's the engine of cashflow at the company, and you're saying, well, that's not what the data says. And you say, okay, here's a future path-- >> Yeah. >> For success, but it's going to be disruptive, there's going to be a change, and I can see people not wanting to go there. >> Yeah, and to the point about even businesses, or parts of a business, that are making the most money: if you look at what the business journals say, when you start leveraging data and AI you get double-digit increases in your productivity and in differentiation from your competitors. That happens inside of businesses too. So the conversation, even with the most profitable parts of the business, or the parts contributing the most revenue, is really: what could we do better, right?
You could get better margins on the revenue you're driving. That's the whole point: to get better at leveraging data and AI to increase your margins and increase your revenue. And then there are things like moving to as a Service from a single-point transaction. That's a whole different business model, and it takes you from getting revenue once every two, three, or five years to getting revenue every month, right? That's highly profitable for companies because you don't have to send your sales force in every time to sell something; they buy something once, and they continue to pay as long as you keep 'em happy. >> But I can see that scaring people, because the incentives have to shift away from, you know, pay all up front, and there are so many parts of the organization that have to align with that in order for that culture change to actually occur. So can you give some examples of how you've, I mean obviously you ran through that at IBM, you saw-- >> Yeah. >> I'm sure a lot of that, got a lot of learnings and then took that to clients. Maybe some examples of client successes that you've had, or even not-so-successful ones that you've learned from. >> Yeah, so in terms of client success, I think many of our clients are just beginning this journey, certainly the ones I work with, so it's hard for me to say client X has successfully done this. But I can certainly talk about how we've gone in, and some of the use cases we've done-- >> Great. >> With certain clients to think about how they transformed their business. So maybe the biggest bang-for-the-buck one is in the oil and gas industry. So ExxonMobil was on stage with me at Think, talking about-- >> Great. >> Some of the work that we've done with them in their upstream business, right? So every time they drop a well it costs them not thousands of dollars, but hundreds of millions of dollars.
And in the oil and gas industry you're talking massive data, right, tens or hundreds of petabytes of data that constantly changes. And no one in that industry really had a data platform that could handle this dynamically. And it takes them months to get, to even start to be able to make a decision. So they really want us to help them figure out, well, how do we build a data platform on this massive scale that enables us to be able to make decisions more rapidly? And so the aim was really to cut this down from 90 days to less than a month. And through leveraging some of our tools, as well as some open-source technology, and teaching them new ways of working, we were able to lay down this foundation. Now this is before, we haven't even started thinking about helping them with AI, oil and gas industry has been doing this type of thing for decades, but they really were struggling with this platform. So that's a big success where, at least for the pilot, which was a small subset of their fields, we were able to help them reduce that timeframe by a lot to be able to start making a decision. >> So an example of a decision might be where to drill next? >> That's exactly the decision they're trying to make. >> Because for years, in that industry, it was boop, oh, no oil, boop, oh, no oil. >> Yeah, well. >> And they got more sophisticated, they started to use data, but I think what you're saying is, the time it took for that analysis was quite long. >> So the time it took to even overlay things like seismic data, topography data, what's happened in wells, and core as they've drilled around that, was really protracted just to pull the data together, right? And then once they got the data together there were some really, really smart people looking at it going, well, my experience says here, and it was driven by the data, but it was not driven by an algorithm. >> A little bit of art. >> True, a lot of art, right, and it still is. 
So now they want some AI, or some machine learning, to help guide those geophysicists to help determine where, based on the data, they should be dropping wells. And these are hundred million and billion dollar decisions they're making so it's really about how do we help them. >> And that's just one example, I mean-- >> Yeah. >> Every industry has it's own use cases, or-- >> Yeah, and so that's on the front end, right, about the data foundation, and then if you go to a company that was really advanced in leveraging analytics, or machine learning, JPMorgan Chase, in their, they have a division, and also they were on stage with me at, Think, that they had, basically everything is driven by a model, so they give traders a series of models and they make decisions. And now they need to monitor those models, those hundreds of models they have for misuse of those models, right? And so they needed to build a series of models to manage, to monitor their models. >> Right. >> And this was a tremendous deep-learning use case and they had just bought a power AI box from us so they wanted to start leveraging GPUs. And we really helped them figure out how do you navigate and what's the difference between building a model leveraging GPUs, compared to CPUs? How do you use it to accelerate the output, and again, this was really a cost-avoidance play because if people misuse these models they can get in a lot of trouble. But they also need to make these decisions very quickly because a trader goes to make a trade they need to make a decision, was this used properly or not before that trade is kicked off and milliseconds make a difference in the stock market so they needed a model. And one of the things about, you know, when you start leveraging GPUs and deep learning is sometimes you need these GPUs to do training and sometimes you need 'em to do training and scoring. 
And this was a case where you need to also build a pipeline that can leverage the GPUs for scoring as well which is actually quite complicated and not as straight forward as you might think. In near real time, in real time. >> Pretty close to real time. >> You can't get much more real time then those things, potentially to stop a trade before it occurs to protect the firm. >> Yeah. >> Right, or RELug it. >> Yeah, and don't quote, I think this is right, I think they actually don't do trades until it's confirmed and so-- >> Right. >> Or that's the desire as to not (voice covers voice). >> Well, and then now you're in a competitive situation where, you know. >> Yeah, I mean people put these trading floors as close to the stock exchange as they can-- >> Physically. >> Physically to (voice covers voice)-- >> To the speed of light right? >> Right, so every millisecond counts. >> Yeah, read Flash Boys-- >> Right, yeah. >> So, what's the biggest challenge you're finding, both at IBM and in your clients, in terms of operationalizing AI. Is it technology? Is it culture? Is it process? Is it-- >> Yeah, so culture is always hard, but I think as we start getting to really think about integrating AI and data into our operations, right? As you look at what software development did with this whole concept of DevOps, right, and really rapidly iterating, but getting things into a production-ready pipeline, looking at continuous integration, continuous development, what does that mean for data and AI? And these concept of DataOps and AIOps, right? And I think DataOps is very similar to DevOps in that things don't change that rapidly, right? You build your data pipeline, you build your data assets, you integrate them. They may change on the weeks, or months timeframe, but they're not changing on the hours, or days timeframe. 
As you get into some of these AI models some of them need to be retrained within a day, right, because the data changes, they fall out of parameters, or the parameters are very narrow and you need to keep 'em in there, what does that mean? How do you integrate this for your, into your CI/CD pipeline? How do you know when you need to do regression testing on the whole thing again? Does your data science and AI pipeline even allow for you to integrate into your current CI/CD pipeline? So this is actually an IBM-wide effort that my team is leading to start thinking about, how do we incorporate what we're doing into people's CI/CD pipeline so we can enable AIOps, if you will, or MLOps, and really, really IBM is the only company that's positioned to do that for so many reasons. One is, we're the only one with an end-to-end toolchain. So we do everything from data, feature development, feature engineering, generating models, whether selecting models, whether it's auto AI, or hand coding or visual modeling into things like trust and transparency. And so we're the only one with that entire toolchain. Secondly, we've got IBM research, we've got decades of industry experience, we've got our IBM Services Organization, all of us have been tackling with this with large enterprises so we're uniquely positioned to really be able to tackle this in a very enterprised-grade manner. >> Well, and the leverage that you can get within IBM and for your customers. >> And leveraging our clients, right? >> It's off the charts. >> We have six clients that are our most advanced clients that are working with us on this so it's not just us in a box, it's us with our clients working on this. >> So what are you hoping to have happen today? We're just about to get started with the keynotes. >> Yeah. >> We're going to take a break and then come back after the keynotes and we've got some great guests, but what are you hoping to get out of today? 
>> Yeah, so I've been with IBM for 2 1/2 years and I, and this is my eighth CEO Summit, so I've been to many more of these than I've been at IBM. And I went to these religiously before I joined IBM really for two reasons. One, there's no sales pitch, right, it's not a trade show. The second is it's the only place where I get the opportunity to listen to my peers and really have open and candid conversations about the challenges they're facing and how they're addressing them and really giving me insights into what other industries are doing and being able to benchmark me and my organization against the leading edge of what's going on in this space. >> I love it and that's why I love coming to these events. It's practitioners talking to practitioners. Seth Dobrin thanks so much for coming to theCUBE. >> Yeah, thanks always, Dave. >> Always a pleasure. All right, keep it right there everybody we'll be right back right after this short break. You're watching, theCUBE, live from San Francisco. Be right back.
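The MLOps concern Dobrin raises, models falling "out of parameters" within a day as the data changes, comes down to detecting drift and gating retraining inside the CI/CD pipeline. A minimal sketch of such a drift gate follows; the mean-shift statistic and the three-standard-deviation threshold are illustrative assumptions, not a description of IBM's tooling:

```python
import random
from statistics import mean, stdev

def drift_score(baseline, current):
    """Crude drift measure: shift of the current mean away from the
    baseline mean, in units of baseline standard deviation."""
    return abs(mean(current) - mean(baseline)) / stdev(baseline)

def needs_retraining(baseline, current, threshold=3.0):
    """Flag a model for retraining when incoming data has drifted
    more than `threshold` baseline standard deviations."""
    return drift_score(baseline, current) > threshold

# Simulated feature distributions: training-time data, a stable batch,
# and a batch whose distribution has clearly moved.
random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(1000)]
stable = [random.gauss(0.1, 1.0) for _ in range(1000)]
drifted = [random.gauss(5.0, 1.0) for _ in range(1000)]

print(needs_retraining(baseline, stable))   # expect False
print(needs_retraining(baseline, drifted))  # expect True
```

In the AIOps setup described above, a check like this would run on each batch of incoming scoring data, kicking off a retraining job and the associated regression tests whenever it fires.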

Published Date : Jun 24 2019

Sreesha Rao, Niagara Bottling & Seth Dobrin, IBM | Change The Game: Winning With AI 2018


 

>> Live, from Times Square, in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI. Brought to you by IBM. >> Welcome back to the Big Apple, everybody. I'm Dave Vellante, and you're watching theCUBE, the leader in live tech coverage, and we're here covering a special presentation of IBM's Change the Game: Winning with AI. IBM's got an analyst event going on here at the Westin today in the theater district. They've got 50-60 analysts here. They've got a partner summit going on, and then tonight, at Terminal 5 on the West Side Highway, they've got a customer event, a lot of customers there. We've talked earlier today about the hard news. Seth Dobrin is here. He's the Chief Data Officer of IBM Analytics, and he's joined by Shreesha Rao, who is the Senior Manager of IT Applications at California-based Niagara Bottling. Gentlemen, welcome to theCUBE. Thanks so much for coming on. >> Thank you, Dave. >> Well, thanks Dave for having us. >> Yes, always a pleasure, Seth. We've known each other for a while now. I think we met in that snowstorm in Boston that sparked something a couple years ago. >> Yep. When we were both trapped there. >> Yep, and at that time, we spent a lot of time talking about your internal role as the Chief Data Officer, working closely with Inderpal Bhandari, and what you guys are doing inside of IBM. I want to talk a little bit more about your other half, which is working with clients and the Data Science Elite Team. We'll get into what you're doing with Niagara Bottling, but let's start there: in terms of that side of your role, give us the update. >> Yeah, like you said, we spent a lot of time talking about how IBM is implementing the CDO role.
While we were doing that internally, I spent quite a bit of time flying around the world, talking to our clients over the last 18 months since I joined IBM, and we found a consistent theme with all the clients, in that, they needed help learning how to implement data science, AI, machine learning, whatever you want to call it, in their enterprise. There's a fundamental difference between doing these things at a university or as part of a Kaggle competition than in an enterprise, so we felt really strongly that it was important for the future of IBM that all of our clients become successful at it because what we don't want to do is we don't want in two years for them to go "Oh my God, this whole data science thing was a scam. We haven't made any money from it." And it's not because the data science thing is a scam. It's because the way they're doing it is not conducive to business, and so we set up this team we call the Data Science Elite Team, and what this team does is we sit with clients around a specific use case for 30, 60, 90 days, it's really about 3 or 4 sprints, depending on the material, the client, and how long it takes, and we help them learn through this use case, how to use Python, R, Scala in our platform obviously, because we're here to make money too, to implement these projects in their enterprise. Now, because it's written in completely open-source, if they're not happy with what the product looks like, they can take their toys and go home afterwards. It's on us to prove the value as part of this, but there's a key point here. My team is not measured on sales. They're measured on adoption of AI in the enterprise, and so it creates a different behavior for them. So they're really about "Make the enterprise successful," right, not "Sell this software." >> Yeah, compensation drives behavior. >> Yeah, yeah. >> So, at this point, I ask, "Well, do you have any examples?" so Shreesha, let's turn to you. 
(laughing softly) Niagara Bottling -- >> As a matter of fact, Dave, we do. (laughing) >> Yeah, so you're not a bank with a trillion dollars in assets under management. Tell us about Niagara Bottling and your role. >> Well, Niagara Bottling is the biggest private label bottled water manufacturing company in the U.S. We make bottled water for Costcos, Walmarts, major national grocery retailers. These are our customers whom we service, and as with all large customers, they're demanding, and we provide bottled water at relatively low cost and high quality. >> Yeah, so I used to have a CIO consultancy. We worked with every CIO up and down the East Coast. I always observed, really got into a lot of organizations. I was always observed that it was really the heads of Application that drove AI because they were the glue between the business and IT, and that's really where you sit in the organization, right? >> Yes. My role is to support the business and business analytics as well as I support some of the distribution technologies and planning technologies at Niagara Bottling. >> So take us the through the project if you will. What were the drivers? What were the outcomes you envisioned? And we can kind of go through the case study. >> So the current project that we leveraged IBM's help was with a stretch wrapper project. Each pallet that we produce--- we produce obviously cases of bottled water. These are stacked into pallets and then shrink wrapped or stretch wrapped with a stretch wrapper, and this project is to be able to save money by trying to optimize the amount of stretch wrap that goes around a pallet. We need to be able to maintain the structural stability of the pallet while it's transported from the manufacturing location to our customer's location where it's unwrapped and then the cases are used. >> And over breakfast we were talking. You guys produce 2833 bottles of water per second. >> Wow. (everyone laughs) >> It's enormous. 
The manufacturing line is a high speed manufacturing line, and we have a lights-out policy where everything runs in an automated fashion with raw materials coming in from one end and the finished goods, pallets of water, going out. It's called pellets to pallets. Pellets of plastic coming in through one end and pallets of water going out through the other end. >> Are you sitting on top of an aquifer? Or are you guys using sort of some other techniques? >> Yes, in fact, we do bore wells and extract water from the aquifer. >> Okay, so the goal was to minimize the amount of material that you used but maintain its stability? Is that right? >> Yes, during transportation, yes. So if we use too much plastic, we're not optimally, I mean, we're wasting material, and cost goes up. We produce almost 16 million pallets of water every single year, so that's a lot of shrink wrap that goes around those, so what we can save in terms of maybe 15-20% of shrink wrap costs will amount to quite a bit. >> So, how does machine learning fit into all of this? >> So, machine learning is way to understand what kind of profile, if we can measure what is happening as we wrap the pallets, whether we are wrapping it too tight or by stretching it, that results in either a conservative way of wrapping the pallets or an aggressive way of wrapping the pallets. >> I.e. too much material, right? >> Too much material is conservative, and aggressive is too little material, and so we can achieve some savings if we were to alternate between the profiles. >> So, too little material means you lose product, right? >> Yes, and there's a risk of breakage, so essentially, while the pallet is being wrapped, if you are stretching it too much there's a breakage, and then it interrupts production, so we want to try and avoid that. We want a continuous production, at the same time, we want the pallet to be stable while saving material costs. 
>> Okay, so you're trying to find that ideal balance, and how much variability is in there? Is it a function of distance and how many touches it has? Maybe you can share with that. >> Yes, so each pallet takes about 16-18 wraps of the stretch wrapper going around it, and that's how much material is laid out. About 250 grams of plastic that goes on there. So we're trying to optimize the gram weight which is the amount of plastic that goes around each of the pallet. >> So it's about predicting how much plastic is enough without having breakage and disrupting your line. So they had labeled data that was, "if we stretch it this much, it breaks. If we don't stretch it this much, it doesn't break, but then it was about predicting what's good enough, avoiding both of those extremes, right? >> Yes. >> So it's a truly predictive and iterative model that we've built with them. >> And, you're obviously injecting data in terms of the trip to the store as well, right? You're taking that into consideration in the model, right? >> Yeah that's mainly to make sure that the pallets are stable during transportation. >> Right. >> And that is already determined how much containment force is required when your stretch and wrap each pallet. So that's one of the variables that is measured, but the inputs and outputs are-- the input is the amount of material that is being used in terms of gram weight. We are trying to minimize that. So that's what the whole machine learning exercise was. >> And the data comes from where? Is it observation, maybe instrumented? >> Yeah, the instruments. Our stretch-wrapper machines have an ignition platform, which is a Scada platform that allows us to measure all of these variables. We would be able to get machine variable information from those machines and then be able to hopefully, one day, automate that process, so the feedback loop that says "On this profile, we've not had any breaks. 
We can continue," or if there have been frequent breaks on a certain profile or machine setting, then we can change that dynamically as the product is moving through the manufacturing process. >> Yeah, so think of it as, it's kind of a traditional manufacturing production line optimization and prediction problem right? It's minimizing waste, right, while maximizing the output and then throughput of the production line. When you optimize a production line, the first step is to predict what's going to go wrong, and then the next step would be to include precision optimization to say "How do we maximize? Using the constraints that the predictive models give us, how do we maximize the output of the production line?" This is not a unique situation. It's a unique material that we haven't really worked with, but they had some really good data on this material, how it behaves, and that's key, as you know, Dave, and probable most of the people watching this know, labeled data is the hardest part of doing machine learning, and building those features from that labeled data, and they had some great data for us to start with. >> Okay, so you're collecting data at the edge essentially, then you're using that to feed the models, which is running, I don't know, where's it running, your data center? Your cloud? >> Yeah, in our data center, there's an instance of DSX Local. >> Okay. >> That we stood up. Most of the data is running through that. We build the models there. And then our goal is to be able to deploy to the edge where we can complete the loop in terms of the feedback that happens. >> And iterate. (Shreesha nods) >> And DSX Local, is Data Science Experience Local? >> Yes. >> Slash Watson Studio, so they're the same thing. >> Okay now, what role did IBM and the Data Science Elite Team play? You could take us through that. >> So, as we discussed earlier, adopting data science is not that easy. It requires subject matter, expertise. 
It requires understanding of data science itself, the tools and techniques, and IBM brought that as a part of the Data Science Elite Team. They brought both the tools and the expertise so that we could get on that journey towards AI. >> And it's not a "do the work for them." It's a "teach to fish," and so my team sat side by side with the Niagara Bottling team, and we walked them through the process, so it's not a consulting engagement in the traditional sense. It's how do we help them learn how to do it? So it's side by side with their team. Our team sat there and walked them through it. >> For how many weeks? >> We've had about two sprints already, and we're entering the third sprint. It's been about 30-45 days between sprints. >> And you have your own data science team. >> Yes. Our team is coming up to speed using this project. They've been trained but they needed help with people who have done this, been there, and have handled some of the challenges of modeling and data science. >> So it accelerates that time to --- >> Value. >> Outcome and value and is a knowledge transfer component -- >> Yes, absolutely. >> It's occurring now, and I guess it's ongoing, right? >> Yes. The engagement is unique in the sense that IBM's team came to our factory, understood what that process, the stretch-wrap process looks like so they had an understanding of the physical process and how it's modeled with the help of the variables and understand the data science modeling piece as well. Once they know both side of the equation, they can help put the physical problem and the digital equivalent together, and then be able to correlate why things are happening with the appropriate data that supports the behavior. >> Yeah and then the constraints of the one use case and up to 90 days, there's no charge for those two. Like I said, it's paramount that our clients like Niagara know how to do this successfully in their enterprise. >> It's a freebie? >> No, it's no charge. 
Free makes it sound too cheap. (everybody laughs) >> But it's part of obviously a broader arrangement with buying hardware and software, or whatever it is. >> Yeah, its a strategy for us to help make sure our clients are successful, and I want it to minimize the activation energy to do that, so there's no charge, and the only requirements from the client is it's a real use case, they at least match the resources I put on the ground, and they sit with us and do things like this and act as a reference and talk about the team and our offerings and their experiences. >> So you've got to have skin in the game obviously, an IBM customer. There's got to be some commitment for some kind of business relationship. How big was the collective team for each, if you will? >> So IBM had 2-3 data scientists. (Dave takes notes) Niagara matched that, 2-3 analysts. There were some working with the machines who were familiar with the machines and others who were more familiar with the data acquisition and data modeling. >> So each of these engagements, they cost us about $250,000 all in, so they're quite an investment we're making in our clients. >> I bet. I mean, 2-3 weeks over many, many weeks of super geeks time. So you're bringing in hardcore data scientists, math wizzes, stat wiz, data hackers, developer--- >> Data viz people, yeah, the whole stack. >> And the level of skills that Niagara has? >> We've got actual employees who are responsible for production, our manufacturing analysts who help aid in troubleshooting problems. If there are breakages, they go analyze why that's happening. Now they have data to tell them what to do about it, and that's the whole journey that we are in, in trying to quantify with the help of data, and be able to connect our systems with data, systems and models that help us analyze what happened and why it happened and what to do before it happens. >> Your team must love this because they're sort of elevating their skills. 
They're working with rock star data scientists. >> Yes. >> And we've talked about this before. A point that was made here is that it's really important in these projects to have people acting as product owners if you will, subject matter experts, that are on the front line, that do this everyday, not just for the subject matter expertise. I'm sure there's executives that understand it, but when you're done with the model, bringing it to the floor, and talking to their peers about it, there's no better way to drive this cultural change of adopting these things and having one of your peers that you respect talk about it instead of some guy or lady sitting up in the ivory tower saying "thou shalt." >> Now you don't know the outcome yet. It's still early days, but you've got a model built that you've got confidence in, and then you can iterate that model. What's your expectation for the outcome? >> We're hoping that preliminary results help us get up the learning curve of data science and how to leverage data to be able to make decisions. So that's our idea. There are obviously optimal settings that we can use, but it's going to be a trial and error process. And through that, as we collect data, we can understand what settings are optimal and what should we be using in each of the plants. And if the plants decide, hey they have a subjective preference for one profile versus another with the data we are capturing we can measure when they deviated from what we specified. We have a lot of learning coming from the approach that we're taking. You can't control things if you don't measure it first. >> Well, your objectives are to transcend this one project and to do the same thing across. >> And to do the same thing across, yes. >> Essentially pay for it, with a quick return. That's the way to do things these days, right? >> Yes. 
>> You've got more narrow, small projects that'll give you a quick hit, and then leverage that expertise across the organization to drive more value. >> Yes. >> Love it. What a great story, guys. Thanks so much for coming to theCUBE and sharing. >> Thank you. >> Congratulations. You must be really excited. >> No. It's a fun project. I appreciate it. >> Thanks for having us, Dave. I appreciate it. >> Pleasure, Seth. Always great talking to you, and keep it right there everybody. You're watching theCUBE. We're live from New York City here at the Westin Hotel. cubenyc #cubenyc Check out the ibm.com/winwithai Change the Game: Winning with AI Tonight. We'll be right back after a short break. (minimal upbeat music)
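The stretch-wrapper use case discussed above amounts to a constrained optimization: find the lightest wrap profile whose predicted break risk is still acceptable. A rough sketch follows; the logistic break model and its coefficients are hypothetical stand-ins for a model fitted on Niagara's labeled break/no-break data, not the actual model the teams built:

```python
import math

def break_probability(gram_weight, a=40.0, b=-0.2):
    """Hypothetical logistic model of pallet-break probability as a
    function of wrap gram weight: less plastic means higher break risk.
    Coefficients a and b are made-up stand-ins for values fitted on
    labeled break/no-break data."""
    return 1.0 / (1.0 + math.exp(-(a + b * gram_weight)))

def lightest_safe_wrap(candidates, max_break_prob=0.01):
    """Pick the lowest gram weight whose predicted break probability
    stays under the tolerance: material savings without breakage."""
    for g in sorted(candidates):
        if break_probability(g) <= max_break_prob:
            return g
    return max(candidates)  # fall back to the most conservative profile

profiles = [180, 200, 220, 240, 250]  # grams of film per pallet
print(lightest_safe_wrap(profiles))   # the lightest profile under tolerance
```

With the feedback loop Rao describes, each observed break or clean run becomes a new labeled example, and the fitted coefficients (made up here) would be re-estimated as production data accumulates.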

Published Date : Sep 13 2018

SUMMARY :

Brought to you by IBM. In this segment from IBM's "Change the Game: Winning with AI" event at the Westin Hotel in New York City, Dave Vellante talks with Seth Dobrin of IBM and Sreesha Rao of Niagara Bottling, the biggest private-label supplier of bottled water. They discuss Niagara's "pellets to pallets" engagement with IBM's Data Science Elite Team: predictive models that optimize how much stretch film is applied when wrapping each pallet of bottled water, since too little film risks damaged product while too much wastes plastic. Working in sprints alongside Niagara's own analysts and domain experts, at no charge for the Elite Team's time, the project targets material savings of roughly 15-20%, worth about $250,000, and Niagara plans to leverage the expertise across other use cases in the organization.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Shreesha Rao | PERSON | 0.99+
Seth Dobern | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
Walmarts | ORGANIZATION | 0.99+
Costcos | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
30 | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
New York City | LOCATION | 0.99+
California | LOCATION | 0.99+
Seth Dobrin | PERSON | 0.99+
60 | QUANTITY | 0.99+
Niagara | ORGANIZATION | 0.99+
Seth | PERSON | 0.99+
Shreesha | PERSON | 0.99+
U.S. | LOCATION | 0.99+
Sreesha Rao | PERSON | 0.99+
third sprint | QUANTITY | 0.99+
90 days | QUANTITY | 0.99+
two | QUANTITY | 0.99+
first step | QUANTITY | 0.99+
Inderpal Bhandari | PERSON | 0.99+
Niagara Bottling | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
both | QUANTITY | 0.99+
tonight | DATE | 0.99+
ibm.com/winwithai | OTHER | 0.99+
one | QUANTITY | 0.99+
Terminal 5 | LOCATION | 0.99+
two years | QUANTITY | 0.99+
about $250,000 | QUANTITY | 0.98+
Times Square | LOCATION | 0.98+
Scala | TITLE | 0.98+
2018 | DATE | 0.98+
15-20% | QUANTITY | 0.98+
IBM Analytics | ORGANIZATION | 0.98+
each | QUANTITY | 0.98+
today | DATE | 0.98+
each pallet | QUANTITY | 0.98+
Kaggle | ORGANIZATION | 0.98+
West Side Highway | LOCATION | 0.97+
Each pallet | QUANTITY | 0.97+
4 sprints | QUANTITY | 0.97+
About 250 grams | QUANTITY | 0.97+
both side | QUANTITY | 0.96+
Data Science Elite Team | ORGANIZATION | 0.96+
one day | QUANTITY | 0.95+
every single year | QUANTITY | 0.95+
Niagara Bottling | PERSON | 0.93+
about two sprints | QUANTITY | 0.93+
one end | QUANTITY | 0.93+
R | TITLE | 0.92+
2-3 weeks | QUANTITY | 0.91+
one profile | QUANTITY | 0.91+
50-60 analysts | QUANTITY | 0.91+
trillion dollars | QUANTITY | 0.9+
2-3 data scientists | QUANTITY | 0.9+
about 30-45 days | QUANTITY | 0.88+
almost 16 million pallets of water | QUANTITY | 0.88+
Big Apple | LOCATION | 0.87+
couple years ago | DATE | 0.87+
last 18 months | DATE | 0.87+
Westin Hotel | ORGANIZATION | 0.83+
pallet | QUANTITY | 0.83+
#cubenyc | LOCATION | 0.82+
2833 bottles of water per second | QUANTITY | 0.82+
the Game: Winning with AI | TITLE | 0.81+

Seth Dobrin, IBM & Asim Tewary, Verizon | IBM CDO Summit Spring 2018


 

>> Narrator: Live from downtown San Francisco, it's The Cube, covering IBM chief data officer strategy summit 2018, brought to you by IBM. (playful music) >> Welcome back to the IBM chief data officer strategy summit in San Francisco. We're here at the Parc 55. My name is Dave Vellante, and you're watching The Cube, the leader in live tech coverage, #IBMCDO. Seth Dobrin is here. He's the chief data officer for IBM analytics. Seth, good to see you again. >> Good to see you again, Dave. >> Many time Cube alum; thanks for coming back on. Asim Tewary, Tewary? Tewary; sorry. >> Tewary, yes. >> Asim Tewary; I can't read my own writing. Head of data science and advanced analytics at Verizon, and from Jersey. Two east coast boys, three east coast boys. >> Three east coast boys. >> Yeah. >> Welcome, gentlemen. >> Thank you. >> Asim, you guys had a panel earlier today. Let's start with you. What's your role? I mean, we talked you're the defacto chief data officer at Verizon. >> Yes, I'm responsible for all the data ingestion platform, big data, and the data science for Verizon, for wireless, wire line, and enterprise businesses. >> It's a relatively new role at Verizon? You were saying previously you were CDO at a financial services organization. Common that a financial service organization would have a chief data officer. How did the role come about at Verizon? Are you Verizon's first CDO or-- >> I was actually brought in to really pull together the analytics and data across the enterprise, because there was a realization that data only creates value when you're able to get it from all the difference sources. We had separate teams in the past. My role was to bring it all together, to have a common platform, common data science team to drive revenue across the businesses. >> Seth, this is a big challenge, obviously. We heard Caitlyn this morning, talking about the organizational challenges. You got data in silos. Inderpal and your team are basically, I call it dog-fooding. 
You're drinking your own champagne. >> Champagne-ing, yeah. >> Yeah, okay, but you have a similar challenge. You have a big company, complex, a lot of data silos coming. >> Yeah, I mean, IBM is really, think of it as five companies, right? Any one of them would be a Fortune 500 company in and of themselves. Even within each of those, there were silos, and then Inderpal trying to bring them across, you know, the data from across all of them is really challenging. Honestly, the technology part, the bringing it together is the easy part. It's the cultural change that goes along with it that's really, really hard, to get people to think about it as IBM's or Verizon's data, and not their data. That's really how you start getting value from it. >> That's a cultural challenge you face is, "Okay, I've got my data; I don't want to share." How do you address that? >> Absolutely. Governance and ownership of data, having clear roles and responsibilities, ensuring there's this culture where people realize that data is an asset of the firm. It is not your data or my data; it is the firm's data, and the value you create for the business is from that data. It is a transformation. It's changing the people culture aspect, so there's a lot of education. You know, you have to be an evangelist. You wear multiple hats to show people the value, why they should do it. Obviously, I had an advantage because coming in, Verizon management was completely sold to the idea that the data has to be managed as an enterprise asset. Business was ready and willing to own data as an enterprise asset, and so it was relatively easier. However, it was a journey to try to get everyone on the same page in terms of ensuring that it wasn't the siloed mentality. This was an enterprise asset that we need to manage together. >> A lot of organizations tell me that, first of all, you got to have top-down buy-in.
Clearly, you had that, but a lot of the times I hear that the C-suite says, "Okay, we're going to do this," but the middle management is sort of, they've got a P&L, they've got to make their plan, and it takes them longer to catch up. Did you face that challenge, and how do you ... How were you addressing it? >> Absolutely. What we had to do was really make sure that we were not trying to boil the ocean, that we were trying to show the values. We found champions. For example, finance, you know, was a good champion for us, where we used the data and analytics to really actually launch some very critical initiatives for the firm, asset-backed securities. For the first time, Verizon launched ABS, and we actually enabled that. That created the momentum, if you will, as to, "Okay, there's value in this." That then created the opportunity for all the other businesses to jump on and start leveraging data. Then we all are willing to help and be part of the journey. >> Seth, before you joined IBM, obviously the company was embarking on this cognitive journey. You know, Watson, the evolution of Watson, the kind of betting a lot on cognitive, but internally you must have said, "Well, if we're going to market this externally, "we'd better become a cognitive enterprise." One of the questions that came up on the panel was, "What is a cognitive enterprise?" You guys, have you defined it? Love to ask Asim the same question. >> Yeah, so I mean, a cognitive enterprise is really about an enterprise that uses data and analytics, and cognition to run their business, right? You can't just jump to being a cognitive enterprise, right? It's a journey or a ladder, right? Where you got to get that foundation data in order. Then you've got to start even being able to do basic analytics. Then you can start doing things like machine learning, and deep learning, and then you can get into cognition.
It's not a, just jump to the top of the ladder, because there's just a lot of work that's required to do it. You can do that within a business unit. The whole company doesn't need to get there, and in fact, you'll see within a company, different part of the company will be at different stages. Kind of to Asim's point about partnering with finance, and that's my experience both at IBM and before I joined. You find a partner that's going to be a champion for you. You make them immensely successful, and everyone else will follow because of shame, because they don't want to be out-competed by their peers. >> So, similar definition of a cognitive enterprise? >> Absolutely. In fact, what I would say is cognitive is a spectrum, right? Where most companies are at the low end of that spectrum where using data for decision-making, but those are reports, BI reports, and stuff like that. As you evolve to become smarter and more AI machine learning, that's when you get into predictive, where you're using the data to predict what might happen based on prior historical information. Then that evolution goes all the way to being prescriptive, where you're not only looking back and being able to predict, but you're actually able to recommend action that you want to take. Obviously, with the human involvement, because governance is an important aspect to all of this, right? Completely agree that the cognitive is really covering the spectrum of prescriptive, predictive, and using data for all your decision making. >> This actually gets into a good point, right? I mean, I think Asim has implemented some deep learning models at Verizon, but you really need to think about what's the right technology or the right, you know, the right use case for that. There's some use cases where descriptive analytics is the right answer, right? There's no reason to apply machine learning or deep learning. You just need to put that in front of someone. 
Then there are use cases where you do want deep learning, either because the problem is so complex, or because the accuracy needs to be there. I go into a lot of companies to talk to senior executives, and they're like, "We want to do deep learning." You ask them what the use case is, and you're like, "Really, that's rules," right? It gets back to Occam's razor, right? The simplest solution is always the answer, is always the best answer. Really understanding from your perspective, having done this at a couple of companies now, kind of when do you know when to use deep learning versus machine learning, versus just basic statistics? >> How about that? >> Yeah. >> How do you parse that? >> Absolutely. You know, like anything else, it's very important to understand what problem you're trying to solve. When you have a hammer, everything looks like a nail, and deep learning might be one of those hammers. What we do is make sure that any problem that requires explain-ability, interpret-ability, you cannot use deep learning, because you cannot explain when you're using deep learning. It's a multi-layered neural network algorithm. You can't really explain why the outcome was what it was. For that, you have to use more simpler algorithms, like decision tree, like regression, classification. By the way, 70 to 80% of the problem that you have in the company, can be solved by those algorithms. You don't always use deep learning, but deep learning is a great use case algorithm to use when you're solving complex problems. For example, when you're looking at doing friction analysis as to customer journey path analysis, that tends to be very noisy. You know, you have billions of data points that you have to go through for an algorithm. 
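Asim's contrast here, that a regression or decision tree can explain its outcome while a deep network cannot, can be sketched in a few lines of Python. Everything below (the features, coefficients, and churn scenario) is hypothetical, invented purely to show how an additive model decomposes a prediction:

```python
# Illustrative only: an additive (linear) model is explainable because each
# feature's contribution to the score is just coefficient * value.
# Coefficients and features below are invented, not from any real model.

coefficients = {
    "dropped_calls_per_week": 0.6,
    "months_as_customer": -0.02,
    "support_tickets_last_90d": 0.3,
}
intercept = -1.0

def explain_score(features):
    """Return the raw score and each feature's additive contribution."""
    contributions = {
        name: coefficients[name] * value for name, value in features.items()
    }
    return intercept + sum(contributions.values()), contributions

customer = {
    "dropped_calls_per_week": 3,
    "months_as_customer": 24,
    "support_tickets_last_90d": 2,
}
score, parts = explain_score(customer)
print(round(score, 2))  # 0.92
print(parts)  # the per-feature breakdown an auditor could inspect
```

Because the score is a sum of per-feature terms, the same breakdown that produces the prediction doubles as its explanation; a multi-layered neural network offers no such direct decomposition.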
That is, you know, good for deep learning, so we're using that today, but you know, those are a narrow set of use cases where it is required, so it's important to understand what problem you're trying to solve and where you want to use deep learning. >> To use deep learning, you need a lot of label data, right? >> Yes. >> And that's-- >> A lot of what? Label data? >> Label data. So, and that's often a hurdle to companies using deep learning, even when they have a legitimate deep learning use cases. Just the massive amount of label data you need for that use case. >> As well as scale, right? >> Yeah. >> The whole idea is that when you have massive amounts of data with a lot of different variables, you need deep learning to be able to make that decision. That means you've got to have scale and real time capability within the platform, that has the elasticity and compute, to be able to crunch all that data. >> Yeah. >> Initially, when we started on this journey, our infrastructure was not able to handle that. You know, we had a lot of failures, and so obviously we had to enhance our infrastructure to-- >> You spoke to Samit Gupta and Ed earlier, about, you know, GPUs, and flash storage, and the need for those types of things to do these complex, you know, deep learning problems. We struggled with that even inside of IBM when we first started building this platform as, how do we get the best performance of ingesting the data, getting it labeled, and putting it into these models, these deep learning models, and some of the instance we use that. >> Yeah, my takeaway is that infrastructure for AI has to be flexible, you got to be great granularity. It's got to not only be elastic, but it's got to be, sometimes we call it plastic. It's got to sometimes retain its form. >> Yes. >> Right? Then when you bring in some new unknown workload, you've got to be able to adjust it without ripping down the entire infrastructure. 
You have to purpose built a whole next set of infrastructure, which is kind of how we built IT over the years. >> Exactly. >> I think, Dave, too, When you and I first spoke four or five years ago, it was all about commodity hardware, right? It was going to Hadoop ecosystem, minimizing, you know, getting onto commodity hardware, and now you're seeing a shift away from commodity hardware, in some instances, toward specialized hardware, because you need it for these use cases. So we're kind of making that. We shifted to one extreme, and now we're kind of shifting, and I think we're going to get to a good equilibrium where it's a balance of commodity and specialized hardware for big data, as much as I hate that word, and advanced analytics. >> Well, yeah, even your cloud guys, all the big cloud guys, they used to, you know, five, six years ago, say, "Oh, it's all commodity stuff," and now it's a lot of custom, because they're solving problems that you can't solve with a commodity. I want to ask you guys about this notion of digital business. To us, the difference between a business and a digital business is how you use data. As you become a digital business, which is essentially what you're doing with cognitive and AI, historically, you may have organized around, I don't know, your network, and certain you've got human skills that are involved, and your customers. I mean, IBM in your case, it's your products, your services, your portfolio, your clients. Increasingly, you're organizing around your data, aren't you? Which brings back to cultural change, but what about the data model? I presume you're trying to get to a data model where the customer service, and the sales, and the marketing aren't separate entities. I don't have to deal with them when I talk to Verizon. I deal with just Verizon, right? That's not easy when the data's all inside. How are you dealing with that challenge? >> Customer is at the center of the business model. 
Our motto and our goal is to provide the best products to the customers, but even more important, provide the best experience. It is all about the customer, agnostic of the channel, which channel the customer is interacting with. The customer, for the customer, it's one Verizon. The way we are organizing our data platform is, first of all, breaking all the silos. You know, we need to have data from all interactions with the customer, that is all digital, that's coming through, and creating one unified model, essentially, that essentially teaches all the journeys, and all the information about the customer, their events, their behavior, their propensities, and stuff like that. Then that information, using algorithms, like predictive, prescriptive, and all of that, make it available in all channels of engagement. Essentially, you have common intelligence that is made available across all channels. Whether the customer goes to point of sale in a retail store, or calls a call center, talks to a rep, or is on the digital channel, it is the same intelligence driving the experience. Whether a customer is trying to buy a phone, or has an issue with a service-related aspect of it, and that's the key, which is centralized intelligence from a common data lake, and then deliver a seamless experience across all channels for that customer-- >> Independent of where I bought that phone, for example, right? >> Exactly. Maintaining the context is critical. If you went to the store and you know, you're looking for a phone, and you know, you didn't find what you're looking for, you want to do some research, if you go to the digital channel, you should be able to have a seamless experience where we should know that you went, that you're looking for the phone, or you called care and you asked the agent about something.
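The "one Verizon" idea described here can be illustrated with a toy sketch (this is not Verizon's actual architecture; the store, channels, and events are all hypothetical): every channel reads and writes one shared per-customer event history, so context follows the customer.

```python
# Toy sketch, not a real architecture: one shared event history per customer
# that every channel (retail, care, digital) reads and writes, so context
# follows the customer across channels.
from collections import defaultdict

context_store = defaultdict(list)  # customer_id -> ordered event history

def record_event(customer_id, channel, event):
    context_store[customer_id].append({"channel": channel, "event": event})

def get_context(customer_id):
    """Any channel can retrieve the full cross-channel history."""
    return context_store[customer_id]

# The customer asks about an upgrade in a store, then opens the app: the
# digital channel sees the retail interaction and can resume the journey.
record_event("cust-42", "retail", "asked about phone upgrade")
record_event("cust-42", "digital", "opened upgrade page")

history = get_context("cust-42")
print([e["channel"] for e in history])  # ['retail', 'digital']
```

The design point is simply that intelligence and context are keyed by customer, not by channel, which is what makes the experience seamless when the customer switches channels.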
Having that context be transferred across channels and be available, so the customer feels that we know who the customer is, and provide them with a good experience, is the key. >> We have limited time, but I want to talk about skills. It's hard to come by; we talked about that. It's number five on Inderpal's sort of, list of things you've got to do as a CDO. Sometimes you can do M&A, like buying The Weather Company. You've got a lot of skills, but that's not always so practical. How have you been dealing with the skills gap? >> Look, skill is hard to find, data scientists are hard to find. The way we are envisioning our talent management is two things we need to take care of. One, we need solid big data engineers, because having a solid platform that has real-time streaming capability is very critical. Second, data scientists, it's hard to get. However, our plan is to really take the domain experts, who really understand the business, who understand the business process and the data, and give them the tools, automation tools for data science, that essentially, you know, will put it in a box for them, in terms of which algorithm to use, and enable them to create more value. While we will continue to hire specialized data scientists who are going to work on much more of the complex problems, the skill will come from empowering and enabling the domain experts with data science capabilities that automate model development and algorithm selection. >> Presumably grooming people in house, right? >> Grooming people in house, and I actually break it down a little more granular. I even say there's data engineers, there's machine learning engineers, there's optimization engineers, then there's data journalists. They're the ones that tell the story. I think we were talking earlier, Asim, about you know, it's not just PhDs, right? You're not just looking for PhDs to fill these roles anymore.
You're looking for people with masters degrees, and even in some cases, bachelors degrees. With IBM's new collar job initiative, we're even bringing on some, what we call P-TECH students, which are five year high school students, and we're building a data science program for them. We're building apprenticeships, which is, you know, you've had a couple years of college, building a data science program, and people look at me like I'm crazy when I say that, but the bulk of the work of a data science program, of executing data science, is not implementing machine learning models. It's engineering features, it's cleaning data. With basic Python skills, this is something that you can very easily teach these people to do, and then under the supervision of a principal data scientist or someone with a PhD or a masters degree, they can start learning how to implement models, but they can start contributing right away with just some basic Python skills. >> Then five, seven years in, they're-- >> Yeah. >> domain experts. All right, guys, got to jump, but thanks very much, Asim, for coming on and sharing your story. Seth, always a pleasure. >> Yeah, good to see you again, Dave. >> All right. >> Thank you, Dave. >> You're welcome. Keep it right there, buddy. >> Thanks. >> We'll be back with our next guest. This is The Cube, live from IBM CDO strategy summit in San Francisco. We'll be right back. (playful music) (phone dialing)
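Seth's point that the bulk of day-to-day data science is cleaning data and engineering features, and that basic Python is enough to start contributing, can be sketched like this (the records, field names, and imputation rules are invented for illustration):

```python
# Invented records and rules, purely to illustrate the point: most of the
# work is normalizing messy fields and deriving features, which takes only
# basic Python.

raw_records = [
    {"plan": " Unlimited ", "monthly_spend": "75.50", "tenure_months": "18"},
    {"plan": "prepaid", "monthly_spend": "n/a", "tenure_months": "3"},
]

def clean_and_engineer(record):
    plan = record["plan"].strip().lower()    # normalize labels
    try:
        spend = float(record["monthly_spend"])
    except ValueError:
        spend = 0.0                           # crude imputation for bad values
    tenure = int(record["tenure_months"])
    return {
        "plan": plan,
        "monthly_spend": spend,
        "tenure_months": tenure,
        # engineered feature: a rough lifetime-spend estimate
        "estimated_lifetime_spend": spend * tenure,
    }

features = [clean_and_engineer(r) for r in raw_records]
print(features[0]["estimated_lifetime_spend"])  # 1359.0
```

Work like this can be done by apprentices under the supervision of a principal data scientist, exactly as described above, before they ever touch model training.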

Published Date : May 1 2018

SUMMARY :

Brought to you by IBM. Live from the Parc 55 in San Francisco at the IBM Chief Data Officer Strategy Summit, Dave Vellante talks with Seth Dobrin, chief data officer of IBM Analytics, and Asim Tewary, who leads data science and advanced analytics at Verizon. They cover treating data as an enterprise asset and the cultural change that demands, finding champions such as finance to build momentum, the ladder toward becoming a cognitive enterprise, choosing deep learning only where the problem warrants it and explainability can be sacrificed, the infrastructure that AI workloads require, organizing data around the customer for a seamless cross-channel experience, and closing the skills gap by pairing specialized data scientists with tool-enabled domain experts.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave | PERSON | 0.99+
David | PERSON | 0.99+
Michael | PERSON | 0.99+
Marc Lemire | PERSON | 0.99+
Chris O'Brien | PERSON | 0.99+
Verizon | ORGANIZATION | 0.99+
Hilary | PERSON | 0.99+
Mark | PERSON | 0.99+
Dave Vellante | PERSON | 0.99+
Ildiko Vancsa | PERSON | 0.99+
John | PERSON | 0.99+
Alan Cohen | PERSON | 0.99+
Lisa Martin | PERSON | 0.99+
John Troyer | PERSON | 0.99+
Rajiv | PERSON | 0.99+
Europe | LOCATION | 0.99+
Stefan Renner | PERSON | 0.99+
Ildiko | PERSON | 0.99+
Mark Lohmeyer | PERSON | 0.99+
JJ Davis | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Beth | PERSON | 0.99+
Jon Bakke | PERSON | 0.99+
John Farrier | PERSON | 0.99+
Boeing | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Dave Nicholson | PERSON | 0.99+
Cassandra Garber | PERSON | 0.99+
Peter McKay | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Dave Brown | PERSON | 0.99+
Beth Cohen | PERSON | 0.99+
Stu Miniman | PERSON | 0.99+
John Walls | PERSON | 0.99+
Seth Dobrin | PERSON | 0.99+
Seattle | LOCATION | 0.99+
5 | QUANTITY | 0.99+
Hal Varian | PERSON | 0.99+
JJ | PERSON | 0.99+
Jen Saavedra | PERSON | 0.99+
Michael Loomis | PERSON | 0.99+
Lisa | PERSON | 0.99+
Jon | PERSON | 0.99+
Rajiv Ramaswami | PERSON | 0.99+
Stefan | PERSON | 0.99+

Seth Dobrin, IBM | Big Data SV 2018


 

>> Announcer: Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners. >> Welcome back to theCUBE's continuing coverage of our own event, Big Data SV. I'm Lisa Martin, with my cohost Dave Vellante. We're in downtown San Jose at this really cool place, Forager Eatery. Come by, check us out. We're here tomorrow as well. We're joined next by one of our CUBE alumni, Seth Dobrin, the Vice President and Chief Data Officer at IBM Analytics. Hey, Seth, welcome back to theCUBE. >> Hey, thanks for having me again. Always fun being with you guys. >> Good to see you, Seth. >> Good to see you. >> Yeah, so last time you were chatting with Dave and company was about in the fall at the Chief Data Officers Summit. What's kind of new with you in IBM Analytics since then? >> Yeah, so the Chief Data Officers Summit, I was talking with one of the data governance people from TD Bank and we spent a lot of time talking about governance. Still doing a lot with governance, especially with GDPR coming up. But really started to ramp up my team to focus on data science, machine learning. How do you do data science in the enterprise? How is it different from doing a Kaggle competition, or someone getting their PhD or Masters in Data Science? >> Just quickly, who is your team composed of in IBM Analytics? >> So IBM Analytics represents, think of it as our software umbrella, so it's everything that's not pure cloud or Watson or services. So it's all of our software franchise. >> But in terms of roles and responsibilities, data scientists, analysts. What's the mixture of-- >> Yeah. So on my team I have a small group of people that do governance, and so they're really managing our GDPR readiness inside of IBM in our business unit. And then the rest of my team is really focused on this data science space.
And so this is set up from the perspective of we have machine-learning engineers, we have predictive-analytics engineers, we have data engineers, and we have data journalists. And that's really focus on helping IBM and other companies do data science in the enterprise. >> So what's the dynamic amongst those roles that you just mentioned? Is it really a team sport? I mean, initially it was the data science on a pedestal. Have you been able to attack that problem? >> So I know a total of two people that can do that all themselves. So I think it absolutely is a team sport. And it really takes a data engineer or someone with deep expertise in there, that also understands machine-learning, to really build out the data assets, engineer the features appropriately, provide access to the model, and ultimately to what you're going to deploy, right? Because the way you do it as a research project or an activity is different than using it in real life, right? And so you need to make sure the data pipes are there. And when I look for people, I actually look for a differentiation between machine-learning engineers and optimization. I don't even post for data scientists because then you get a lot of data scientists, right? People who aren't really data scientists, and so if you're specific and ask for machine-learning engineers or decision optimization, OR-type people, you really get a whole different crowd in. But the interplay is really important because most machine-learning use cases you want to be able to give information about what you should do next. What's the next best action? And to do that, you need decision optimization. >> So in the early days of when we, I mean, data science has been around forever, right? We always hear that. But in the, sort of, more modern use of the term, you never heard much about machine learning. It was more like stats, math, some programming, data hacking, creativity. And then now, machine learning sounds fundamental. 
Is that a new skillset that the data scientists had to learn? Did they get them from other parts of the organization? >> I mean, when we talk about math and stats, what we call machine learning today is what we've been doing with statistics for years, right? I mean, a lot of the same things we apply in what we call machine learning today I did during my PhD 20 years ago, right? It was just with a different perspective. And you applied those types of, they were more static, right? So I would build a model to predict something, and it was only for that. It really didn't apply beyond that, so it was very static. Now, when we're talking about machine learning, I want to understand Dave, right? And I want to be able to predict Dave's behavior in the future, and learn how you're changing your behavior over time, right? So one of the things that a lot of people don't realize, especially senior executives, is that machine learning creates a self-fulfilling prophecy. You're going to drive a behavior so your data is going to change, right? So your model needs to change. And so that's really the difference between what you think of as stats and what we think of as machine learning today. So what we were looking at years ago is all the same; we just described it a little differently. >> So how fine is the line between a statistician and a data scientist? >> I think any good statistician can really become a data scientist. There's some issues around data engineering and things like that but if it's a team sport, I think any really good, pure mathematician or statistician could certainly become a data scientist. Or machine-learning engineer. Sorry. >> I'm interested in it from a skillset standpoint. You were saying how you're advertising to bring on these roles. I was at the Women in Data Science Conference with theCUBE just a couple of days ago, and we hear so much excitement about the role of data scientists. It's so horizontal.
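Seth's "self-fulfilling prophecy" point above, that the model drives behavior, the data shifts, and the model must then change, can be sketched as a toy drift check (the feature, numbers, and threshold are illustrative only, not any production system):

```python
# Toy drift check, illustrative numbers only: compare the mean of a feature
# in live data against the training baseline and flag when the model should
# be refit because behavior has shifted.

def mean(values):
    return sum(values) / len(values)

def needs_retraining(training_sample, live_sample, threshold=0.25):
    """Flag drift when the live mean moves more than `threshold` (as a
    fraction of the training mean) away from the baseline."""
    baseline = mean(training_sample)
    drift = abs(mean(live_sample) - baseline) / abs(baseline)
    return drift > threshold

# Hypothetical feature: weekly purchases per customer.
training = [2.0, 3.0, 2.5, 3.5]    # baseline mean = 2.75
live_stable = [2.6, 2.9, 2.8]      # close to the baseline
live_shifted = [4.5, 5.0, 4.8]     # behavior has changed

print(needs_retraining(training, live_stable))   # False
print(needs_retraining(training, live_shifted))  # True
```

Real drift monitoring uses proper statistical tests over whole feature distributions, but the loop is the same: the model changes behavior, the data moves, and retraining follows.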
People have the opportunity to make an impact in policy change, healthcare, etc. So the hard skills, the soft skills, mathematician, what are some of the other elements that you would look for or that companies, enterprises that need to learn how to embrace data science, should look for? Someone that's not just a mathematician but someone that has communication skills, collaboration, empathy, what are some of those, openness, to not lead data down a certain, what do you see as the right mix there of a data scientist? >> Yeah, so I think that's a really good point, right? It's not just the hard skills. When my team goes out, because part of what we do is we go out and sit with clients and teach them our philosophy on how you should integrate data science in the enterprise. A good part of that is sitting down and understanding the use case. And working with people to tease out, how do you get to this ultimate use case because any problem worth solving is not one model, any use case is not one model, it's many models. How do you work with the people in the business to understand, okay, what's the most important thing for us to deliver first? And it's almost a negotiation, right? Talking them back. Okay, we can't solve the whole problem. We need to break it down into discrete pieces. Even when we break it down into discrete pieces, there's going to be a series of sprints to deliver that. Right? And so having these soft skills to be able to tease that in a way, and really help people understand that their way of thinking about this may or may not be right. And doing that in a way that's not offensive. And there's a lot of really smart people that can say that, but they can come across as being offensive, so those soft skills are really important. >> I'm going to talk about GDPR in the time we have remaining. We've talked about it in the past: the clock's ticking, and in May the fines go into effect.
The relationship between data science, machine learning, GDPR, is it going to help us solve this problem? This is a nightmare for people. And many organizations aren't ready. Your thoughts. >> Yeah, so I think there's some aspects that we've talked about before. How important it's going to be to apply machine learning to your data to get ready for GDPR. But I think there's some aspects that we haven't talked about before here, and that's around what impact does GDPR have on being able to do data science, and being able to implement data science. So one of the aspects of the GDPR is this concept of consent, right? So it really requires consent to be understandable and very explicit. And it allows people to be able to retract that consent at any time. And so what does that mean when you build a model that's trained on someone's data? If you haven't anonymized it properly, do I have to rebuild the model without their data? And then it also brings up some points around explainability. So you need to be able to explain your decision, how you used analytics, how you got to that decision, to someone if they request it. To an auditor if they request it. Traditional machine learning, that's not too much of a problem. You can look at the features and say these features, this contributed 20%, this contributed 50%. But as you get into things like deep learning, this concept of explainable or XAI becomes really, really important. And there were some talks earlier today at Strata about how you apply machine learning, traditional machine learning to interpret your deep learning or black box AI. So that's really going to be important, those two things, in terms of how they affect data science. >> Well, you mentioned the black box. I mean, do you think we'll ever resolve the black box challenge? Or is it really that people are just going to be comfortable that what happens inside the box, how you got to that decision is okay? >> So I'm inherently both cynical and optimistic.
(chuckles) But I think there are a lot of things we looked at five years ago and said there's no way we'll ever be able to do them, that we can do today. And so while I don't know how we're going to get to be able to explain this black box with XAI, I'm fairly confident that in five years, this won't even be a conversation anymore. >> Yeah, I kind of agree. I mean, somebody said to me the other day, well, it's really hard to explain how you know it's a dog. >> Seth: Right (chuckles). But you know it's a dog. >> But you know it's a dog. And so, we'll get over this. >> Yeah. >> I love that you just brought up dogs as we're ending. That's my favorite thing in the world, thank you. Yes, you knew that. Well, Seth, I wish we had more time, and thanks so much for stopping by theCUBE and sharing some of your insights. Look forward to the next update in the next few months from you. >> Yeah, thanks for having me. Good seeing you again. >> Pleasure. >> Nice meeting you. >> Likewise. We want to thank you for watching theCUBE live from our event Big Data SV down the street from the Strata Data Conference. I'm Lisa Martin, for Dave Vellante. Thanks for watching, stick around, we'll be right back after a short break.
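The "this contributed 20%, this contributed 50%" explainability described above for traditional machine learning can be sketched in a few lines. This is a minimal illustration, not IBM's tooling: a linear model is assumed precisely because its per-feature contributions can be read off directly, and the feature names and weights are invented.

```python
# Minimal sketch of per-feature contributions for a traditional
# (linear) model, per the discussion above. For a prediction
# y = sum(w_i * x_i), each feature contributes w_i * x_i, and its
# share is that contribution over the absolute total.
# Feature names, weights, and values are hypothetical.

def contribution_shares(weights, values):
    """Return each feature's share of the total absolute contribution."""
    contribs = {f: weights[f] * values[f] for f in weights}
    total = sum(abs(c) for c in contribs.values())
    return {f: abs(c) / total for f, c in contribs.items()}

weights = {"payment_history": 0.5, "income": 0.3, "tenure": 0.2}
values = {"payment_history": 1.0, "income": 1.0, "tenure": 1.0}

shares = contribution_shares(weights, values)
print(shares["payment_history"])  # 0.5 -- "this feature contributed 50%"
```

For deep learning, no such direct read-off exists, which is why the conversation above turns to XAI techniques such as interpreting a black-box model with a simpler surrogate.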

Published Date : Mar 8 2018


Seth Dobrin & Jennifer Gibbs | IBM CDO Strategy Summit 2017


 

>> Live from Boston, Massachusetts. It's The Cube! Covering IBM Chief Data Officer's Summit. Brought to you by IBM. (techno music) >> Welcome back to The Cube's live coverage of the IBM CDO Strategy Summit here in Boston, Massachusetts. I'm your host Rebecca Knight along with my co-host Dave Vellante. We're joined by Jennifer Gibbs, VP of Enterprise Data Management at TD Bank, and Seth Dobrin, who is VP and Chief Data Officer of IBM Analytics. Thanks for joining us Seth and Jennifer. >> Thanks for having us. >> Thank you. >> So Jennifer, I want to start with you. Can you tell our viewers a little about TD Bank, America's Most Convenient Bank. Based, of course, in Toronto. (laughs). >> Go figure. (laughs) >> So tell us a little bit about your business. >> So TD is a, um, very old bank, headquartered in Toronto. We do have, ah, a lot of business as well in the U.S. Through acquisition we've built quite a big business on the Eastern seaboard of the United States. We've got about 85 thousand employees and we're servicing 42 lines of business when it comes to our Data Management and our Analytics programs, bank wide. >> So talk about your Data Management and Analytics programs a little bit. Tell our viewers a little bit about those. >> So, we split out our office of the Chief Data Officer about 3 to 4 years ago, and so we've been maturing. >> That's relatively new. >> Relatively new, probably, not unlike peers of ours as well. We started off with a strong focus on Data Governance. Setting up roles and responsibilities, a data stewardship organization, and councils from which we can drive consensus and discussion. And then we started rolling out some of our Data Management programs with a focus on Data Quality Management and Metadata Management across the business. So setting standards and policies and supporting business processes and tooling for those programs. >> Seth, when we first met, now you're a long-timer at IBM. (laughs) When we first met you were a newbie.
But we heard today about how it used to be that the Data Warehouse was king, but now Process is king. Can you unpack that a little bit? What does that mean? >> So, you know, to make value of data, it's more than just having it in one place, right? It's what you do with the data, how you ingest the data, how you make it available for other uses. And so it's really, you know, data is not for the sake of data. Data is not a digital dropping of applications, right? The whole purpose of having and collecting data is to use it to generate new value for the company. And that new value could be cost savings, it could be a cost avoidance, or it could be net new revenue. Um, and so, to do that right, you need processes. And the processes are everything from business processes, to technical processes, to implementation processes. And so it's the whole, you need all of it. >> And so Jennifer, I don't know if you've seen kind of a similar evolution from data warehouse to data everywhere, I'm sure you have. >> Yeah. >> But the data quality problem was hard enough when you had this sort of central master data management approach. How are you dealing with it? Is there less of a single version of the truth now than there ever was, and how do you deal with the data quality challenge? >> I think it's important to scope out the work effort in a way that you can get the business moving in the right direction without being overwhelming, focusing on the areas that are most important to the bank. So, we've identified and scoped out what we call critical data. So each line of business has to identify what's critical to them. That relates very strongly to what Seth said around what are your core business processes and what data are you leveraging to provide value to that, to the bank. So, um, data quality for us is about a consistent approach, to ensure the most critical elements of data that are used for business processes are where they need to be from a quality perspective.
>> You can go down a huge rabbit hole with data quality too, right? >> Yeah. >> Data quality is about what's good enough, and defining, you know. >> Right. >> Mm-hmm (affirmative) >> It's not, I liked what someone said, I think you said it: it's not about data quality for its own sake, it's that you've got to understand what good enough is. It's really about understanding the state of the data, right? More than it is about perfection. There are some cases, especially in banking, where you need perfection, but there are tons of cases where you don't. And you shouldn't spend a lot of resources on something that's not value added. And I think it's important to do even things like data quality around a specific use case, so that you do it right. >> And what you were saying too, is that it's good enough, but then that standard is changing too, all the time. >> Yeah, and that changes over time, and it's, you know, you drive it by use case and don't take this boil-the-ocean kind of approach where all data needs to be perfect. And all data will never be perfect. And back to your question about processes, usually a data quality issue is not a data issue, it's a process issue. You get bad data quality because a process is broken, or it's not working for a business, or it's changed and no one's documented it so there's a workaround, right? And so that's really where your data quality issues come from. Um, and I think that's important to remember. >> Yeah, and I think also coming out of the data quality efforts that we're making, to your point, whether it's centralized or cross-business, it's really driving important conversations around who's the producer of this data, who's the consumer of this data? What does data quality mean to you?
So it's really generating a lot of conversation across lines of business so that we can start talking about data in more of a shared way versus more of a business by business point of view. So those conversations are important by-products I would say of the individual data quality efforts that we're doing across the bank. >> Well, and of course, you're in a regulated business so you can have the big hammer of hey, we've got regulations, so if somebody spins up a Hadoop Cluster in some line of business you can reel 'em in, presumably, more easily, maybe not always. Seth, you operate in an unregulated business. You consult with clients that are in unregulated businesses, is that a bigger challenge for you to reel in? >> So, I think, um, I think that's changing. >> Mm-hmm (affirmative) >> You know, there's new regulations coming out in Europe that basically have global impact, right? This whole GDPR thing. It's not just if you're based in Europe. It's if you have a subject in Europe and that's an employee, a contractor, a customer. And so everyone is subject to regulations now, whether they like it or not. And, in fact, there was some level of regulation even in the U.S., which is kind of the wild, wild west when it comes to regulations. But I think, um, you should, even doing it because of regulation is not the right answer. I mean it's a great stick to hold up. It's great to be able to go to your board and say, "Hey, if we don't do this, we need to spend this money 'cause it's going to cost us, in the case of GDPR, four percent of our revenue per instance." Yikes, right? But really it's about what's the value and how do you use that information to drive value. A lot of these regulations are about lineage, right? Understanding where your data came from, how it's being processed, who's doing what with it. A lot of it is around quality, right? >> Yep. >> And so these are all good things, even if you're not in a regulated industry.
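The "good enough, scoped by use case" approach discussed above can be sketched as a threshold table keyed by data element and consuming use case. The elements, use cases, and numbers below are invented for illustration; they are not TD Bank's actual standards.

```python
# Sketch of use-case-scoped data quality, per the discussion above:
# the same critical data element carries a different "good enough"
# bar depending on the business process consuming it. Regulatory
# work may demand near-perfection; a marketing use may not.
# All names and thresholds are hypothetical.

QUALITY_BARS = {
    ("customer_address", "regulatory_reporting"): 0.99,
    ("customer_address", "marketing_campaign"): 0.90,
}

def good_enough(element, use_case, measured_quality):
    """True if a measured quality score clears the bar for this use case."""
    return measured_quality >= QUALITY_BARS[(element, use_case)]

# A 93%-complete address field is fine for marketing, not for regulators:
print(good_enough("customer_address", "marketing_campaign", 0.93))    # True
print(good_enough("customer_address", "regulatory_reporting", 0.93))  # False
```

The point of the table is exactly the one made above: quality spend is directed by use case rather than by a boil-the-ocean pursuit of perfect data.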
And they help you build a better connection with your customer, right? I think lots of people are scared of GDPR. I think it's a really good thing because it forces companies to build a personal relationship with each of their clients. Because you need to get consent to do things with their data, very explicitly. No more of these 30 pages, two-point font, you know ... >> Click a box. >> Click a box. >> Yeah. >> It's, I am going to use your data for X. Are you okay with that? Yes or no. >> So I'm interested to hear from both of you: what are you hearing from customers on this? Because this is such a sensitive topic, and in particular financial data, which is so private. >> Um, I think for customers, especially in our industry, and for us as a bank, our relationship with our customer is top priority, and so maintaining that trust and confidence is always a top priority. So whenever we leverage data or look for use cases to leverage data, making sure that that trust will not be compromised is critically important. So finding that balance between innovating with data while also maintaining that trust, and frankly being very transparent with customers around what we're using it for, why we're using it, and what value it brings to them, is something that we're focused on with all of our data initiatives. >> So, a big part of your job is understanding how data can affect and contribute to the monetization, you know, of your businesses. Um, at the simplest level, two ways: cut costs, increase revenue. Where do you each see the emphasis? I'm sure both, but is there a greater emphasis on cutting costs, 'cause you're both established, you know, businesses, with hundreds of thousands, well in your case, 85 thousand employees. Where do you see the emphasis? Is it greater on cutting costs or not necessarily? >> I think for us, I don't necessarily separate the two.
Anything we can do to drive more efficiency within our business processes is going to help us focus our efforts on innovative use of data, innovative ways to interact with our customers, innovative ways to understand more about our customers. So, I don't see them as mutually exclusive, I see them as contributing to each other. >> Mm-hmm (affirmative) >> So our business cases tend to have an efficiency slant to them or a productivity slant to them, and that helps us redirect effort to other things that provide extra value to our clients. So I'd say it's a mix. >> I mean I think, I think you have to do the cost savings and cost avoidance ones first. Um, you learn a lot about your data when you do that. You learn a lot about the gaps. You learn about how would I even think about bringing external data in to generate that new revenue if I don't understand my own data? How am I going to tie 'em all together? Um, and there's a whole lot of cultural change that needs to happen before you can even start generating revenue from data. And you kind of cut your teeth on that by doing the really simple cost savings, cost avoidance ones first, right? Inevitably, maybe not in the bank, but inevitably most companies' supply chain. Let's go find money we can take out of your supply chain. Most companies, if you take out one percent of the supply chain budget, you're talking a lot of money for the company, right? And so you can generate a lot of money to free up to spend on some of these other things.
>> And then there's gain share there 'cause we're going to put that thing there. >> And then there's a gain share and then other people are like, "Well, how do I do that?". And how do I do that, and how do I do that? And it kind of picks up. >> Mm-hmm (affirmative) But I don't think you can jump just to making new revenue. You got to kind of get there iteratively. >> And it becomes a virtuous circle. >> It becomes a virtuous circle and you kind of change the culture as you do it. But you got to start with, I don't, I don't think they're mutually exclusive, but I think you got to start with the cost avoidance and cost savings. >> Mm-hmm (affirmative) >> Great. Well, Seth, Jennifer thanks so much for coming on The Cube. We've had a great conversation. >> Thanks for having us. >> Thanks. >> Thanks you guys. >> We will have more from the IBM CDO Summit in Boston, Massachusetts, just after this. (techno music)
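The explicit, purpose-scoped, retractable consent both guests describe ("I am going to use your data for X. Are you okay with that? Yes or no.") can be sketched as a small registry. The class, method names, and purposes below are hypothetical, for illustration only; they are not a real GDPR compliance product.

```python
# Minimal sketch of GDPR-style consent, per the discussion above:
# consent is granted per subject *and* per purpose, and can be
# retracted at any time. All identifiers are invented.

class ConsentRegistry:
    def __init__(self):
        self._grants = set()  # (subject_id, purpose) pairs

    def grant(self, subject_id, purpose):
        self._grants.add((subject_id, purpose))

    def retract(self, subject_id, purpose):
        # GDPR lets the subject withdraw consent at any time.
        self._grants.discard((subject_id, purpose))

    def allowed(self, subject_id, purpose):
        return (subject_id, purpose) in self._grants

reg = ConsentRegistry()
reg.grant("cust-42", "credit_scoring")
print(reg.allowed("cust-42", "credit_scoring"))  # True
print(reg.allowed("cust-42", "marketing"))       # False: consent is per purpose
reg.retract("cust-42", "credit_scoring")
print(reg.allowed("cust-42", "credit_scoring"))  # False after retraction
```

A check like `allowed()` is the gate a data science pipeline would consult before training on a subject's data, which is where the rebuild-the-model question raised earlier comes from.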

Published Date : Oct 25 2017


Seth Dobrin, IBM Analytics - IBM Fast Track Your Data 2017


 

>> Announcer: Live from Munich, Germany; it's The Cube. Covering IBM; fast-track your data. Brought to you by IBM. (upbeat techno music) >> For you here at the show, generally; and specifically, what are you doing here today? >> There's really three things going on at the show, three high level things. One is we're talking about our new... How we're repositioning our hybrid data management portfolio, specifically some announcements around DB2 in a hybrid environment, and some highly transactional offerings around DB2. We're talking about our unified governance portfolio; so actually delivering a platform for unified governance that allows our clients to interact with governance and data management kind of products in a more streamlined way, and help them actually solve a problem instead of just offering products. The third is really around data science and machine learning. Specifically we're talking about our machine learning hub that we're launching here in Germany. Prior to this we had a machine learning hub in San Francisco, Toronto, one in Asia, and now we're launching one here in Europe. >> Seth, can you describe what this hub is all about? This is a data center where you're hosting machine learning services, or is it something else? >> Yeah, so this is where clients can come and learn how to do data science. They can bring their problems, bring their data to our facilities, learn how to solve a data science problem in a more team oriented way; interacting with data scientists, machine learning engineers, basically, data engineers, developers, to solve a problem for their business around data science. These previous hubs have been completely booked, so we wanted to launch them in other areas to try and expand the capacity of them. >> You're hosting a round table today, right, on the main tent? >> Yep. >> And you got a customer on, you guys going to be talking about sort of applying practices and financial and other areas. Maybe describe that a little bit. 
>> We have a customer on from ING, Heinrich, who's the chief architect for ING. ING, IBM, and Hortonworks have a consortium, if you would, or a framework that we're doing around Apache Atlas and Ranger, as the kind of open-source operating system for our unified governance platform. So much as IBM has positioned Spark as a unified, kind of open-source operating system for analytics, for a unified governance platform... For a governance platform to be truly unified, you need to be able to integrate metadata. The biggest challenge about connecting your data environments, if you're an enterprise that was not internet born, or cloud born, is that you have proprietary metadata platforms that all want to be the master. When everyone wants to be the master, you can't really get anything done. So what we're doing around Apache Atlas is we are setting up Apache Atlas as kind of a virtual translator, if you would, or a dictionary between all the different proprietary metadata platforms so that you can get a single unified view of your data environment across hybrid clouds, on premise, in the cloud, and across different proprietary vendor platforms. Because it's open-sourced, there are these connectors that can go in and out of the proprietary platforms. >> So Seth, you seem like you're pretty tuned in to the portfolio within the analytics group. How are you spending your time as the Chief Data Officer? How do you balance it between customer visits, maybe talking about some of the products, and then your sort of day job? >> I actually have three day jobs. My job's actually split into kind of three pieces. My primary mission is really around transforming IBM's internal business unit, internal business workings, to use data and analytics to run our business. So kind of internal business unit transformation. Part of that business unit transformation is also making sure that we're compliant with regulations like GDPR and other regulations.
Another third is really around kind of rethinking our offerings from a CDO perspective. As a CDO, and as you, Dave, I've only been with IBM for seven months. As a former client recently, and as a CDO, what is it that I want to see from IBM's offerings? We kind of hit on it a little bit with the unified governance platform, where I think IBM makes fantastic products. But as a client, if a salesperson shows up to me, I don't want them selling me a product, 'cause if I want an MDM solution, I'll call you up and say, "Hey, I need an MDM solution. Give me a quote." What I want is them showing up and saying, "I have a solution that's going to solve your governance problem across your portfolio." Or, "I'm going to solve your data science problem." Or, "I'm going to help you master your data, and manage your data across all these different environments." So really working with the offering management and the Dev teams to define what are these three or four kind of business platforms that we want to settle on. We know three of them at least, right? We know that we have hybrid data management. We have unified governance. We have data science and machine learning, and you could think of the Z franchise as a fourth platform. >> Seth, can you net out how governance relates to data science? 'Cause there is governance of the statistical models, machine learning, and so forth, version control. I mean, in an end to end machine learning pipeline, there are various versions of various artifacts that have to be managed in a structured way. Does your unified governance bundle, or portfolio, address those requirements? Or just the data governance? >> Yeah, so the unified governance platform really kind of focuses today on data governance and how good data governance can be an enabler of rapid data science.
So if you have your data all pre-governed, it makes it much quicker to get access to data and understand what you can and can't do with data; especially being here in Europe, in the context of the EU GDPR. You need to make sure that your data scientists are doing things that are approved by the user, because basically your data, you have to give explicit consent to allow things to be done with it. But long term vision is that... essentially the output of models is data, right? And how you use and deploy those models also need to be governed. So the long term vision is that we will have a governance platform for all those things, as well. I think it makes more sense for those things to be governed in the data science platform, if you would. And we... >> We often hear separate from GDPR and all that, is something called algorithmic accountability; that more is being discussed in policy circles, in government circles around the world, as strongly related to everything you're describing. Being able to trace the lineage of any algorithmic decision back to the data, the metadata, and so forth, and the machine learning models that might have driven it. Is that where IBM's going with this portfolio? >> I think that's the natural extension of it. We're thinking really in the context of them as two different pieces, but if you solve them both and you connect them together, then you have that problem. But I think you're absolutely right. As we're leveraging machine learning and artificial intelligence, in general, we need to be able to understand how we got to a decision, and that includes the model, the data, how the data was gathered, how the data was used and processed. So it is that entire pipeline, 'cause it is a pipeline. You're not doing machine learning or AI in a vacuum. You're doing it in the context of the data, and you're doing it in the context about the individuals or the organizations that you're trying to influence with the output of those models. 
>> I call it Dev ops for data science. >> Seth, in the early Hadoop days, the real headwind was complexity. It still is, by the way. We know that. Companies like IBM are trying to reduce that complexity. Spark helps a little bit So the technology will evolve, we get that. It seems like one of the other big headwinds right now is that most companies don't have a great understanding of how they can take data and monetize it, turn it into value. Most companies, many anyway, make the mistake of, "Well, I don't really want to sell my data," or, "I'm not really a data supplier." And they're kind of thinking about it, maybe not in the right way. But we seem to be entering a next wave here, where people are beginning to understand I can cut costs, I can do predictive maintenance, I can maybe not sell the data, but I can enhance what I'm doing and increase my revenue, maybe my customer retention. They seem to be tuning, more so; largely, I think 'cause of the chief data officer roles, helping them think that through. I wonder if you would give us your point of view on that narrative. >> I think what you're describing is kind of the digital transformation journey. I think the end game, as enterprises go through a digital transformation, the end game is how do I sell services, outcomes, those types of things. How do I sell an outcome to my end user? That's really the end game of a digital transformation in my mind. But before you can get to that, before you transform your business's objectives, there's a couple of intermediary steps that are required for that. The first is what you're describing, is those kind of data transformations. Enterprises need to really get a handle on their data and become data driven, and start then transforming their current business model; so how do I accelerate my current business leveraging data and analytics? I kind of frame that, that's like the data science kind of transformation aspect of the digital journey. 
Then the next aspect of it is how do I transform my business and change my business objectives? Part of that first step is in fact, how do I optimize my supply chain? How do I optimize my workforce? How do I optimize my goals? How do I get to my current, you know, the things that Wall Street cares about for business; how do I accelerate those, make those faster, make those better, and really put my company out in front? 'Cause really in the grand scheme of things, there's two types of companies today; there's the company that's going to be the disruptor, and there's companies that's going to get disrupted. Most companies want to be the disruptors, and it's a process to do that. >> So the accounting industry doesn't have standards around valuing data as an asset, and many of us feel as though waiting for that is a mistake. You can't wait for that. You've got to figure out on your own. But again, it seems to be somewhat of a headwind because it puts data and data value in this fuzzy category. But there are clearly the data haves and the data have-nots. What are you seeing in that regard? >> I think the first... When I was in my former role, my former company went through an exercise of valuing our data and our decisions. I'm actually doing that same exercise at IBM right now. We're going through IBM, at least in the analytics business unit, the part I'm responsible for, and going to all the leaders and saying, "What decisions are you making?" "Help me understand the decisions that you're making." "Help me understand the data you need "to make those decisions." And that does two things. Number one, it does get to the point of, how can we value the decisions? 'Cause each one of those decisions has a specific value to the company. You can assign a dollar amount to it. But it also helps you change how people in the enterprise think. 
Because the first time you go through and ask these questions, they talk about the dashboards they want to help them make their preconceived decisions, validated by data. They have a preconceived notion of the decision they want to make. They want the data to back it up. So they want a dashboard to help them do that. So when you come in and start having this conversation, you kind of stop them and say, "Okay, what you're describing is a dashboard. That's not a decision. Let's talk about the decision that you want to make, and let's understand the real value of that decision." So you're doing two things: you're building a portfolio of decisions that then becomes, to your point, Jim, about Dev ops for data science, your backlog for your data scientists in the long run. You then connect those decisions to the data that's required to make them, and you can apportion the value of each decision to the component pieces of data that make it up. So you can group your data logically within an enterprise; customer, product, talent, location, things like that, and you can assign a value to those based on the decisions they support. >> Jim: So... >> Dave: Go ahead, please. >> As a CDO, following on that, are you also, as part of that exercise, trying to assess the value of not just the data, but of data science as a capability? Or particular data science assets, like machine learning models? In the overall scheme of things, that kind of valuation can then drive IBM's decision to ramp up their internal data science initiatives, or redeploy it, or, give me a... >> That's exactly what happened. As you build this portfolio of decisions, each decision has a value. So I am now assigning a value to the data science models that my team will build. As CDOs, CDOs are a relatively new role in many organizations. When money gets tight, they say, "What's this guy doing?" (Dave laughing) Having a portfolio of decisions that's saying, "Here's real value I'm adding..."
So, number one, "Here's the value I can add in the future," and as you check off those boxes, you can kind of go and say, "Here's value I've added. Here's where I've changed how the company's operating. Here's where I've generated X billions of dollars of new revenue, or cost savings, or cost avoidance, for the enterprise." >> When you went through these exercises at your previous company, and now at IBM, are you using standardized valuation methodologies? Did you kind of develop your own, or come up with a scoring system? How'd you do that? >> I think there's some things around, like net promoter score, where there's pretty good standards on how to assign value to increases in net promoter score, or decreases in net promoter score, for certain aspects of your business. In other ways, you need to kind of decide as an enterprise, how do we value our assets? Do we use a three-year, five-year, ten-year NPV? Do we use some other metric? You need to frame it in terms your CFO is used to, so that it's in the context the company is used to talking about. For most companies, it's net present value. >> Okay, and you're measuring that on an ongoing basis. >> Seth: Yep. >> And fine-tuning as you go along. Seth, we're out of time. Thanks so much for coming back in The Cube. It was great to see you. >> Seth: Yeah, thanks for having me. >> You're welcome, good luck this afternoon. >> Seth: Alright. >> Keep it right there, buddy. We'll be back. Actually, let me run down the day here for you, just take a second to do that. We're going to end our Cube interviews for the morning, and then we're going to cut over to the main tent. So in about an hour, Rob Thomas is going to kick off the main tent here with a keynote, talking about where data goes next. Hilary Mason's going to be on. There's a session with Dez Blanchfield on data science as a team sport. Then the big session on changing regulations, GDPR.
Seth, you've got some customers that you're going to bring on and talk about these issues. And then, sort of balancing act, the balancing act of hybrid data. Then we're going to come back to The Cube and finish up our Cube interviews for the afternoon. There's also going to be two breakout sessions; one with Hilary Mason, and one on GDPR. You got to go to IBMgo.com and log in and register. It's all free to see those breakout sessions. Everything else is open. You don't even have to register or log in to see that. So keep it right here, everybody. Check out the main tent. Check out siliconangle.com, and of course IBMgo.com for all the action here. Fast track your data. We're live from Munich, Germany; and we'll see you a little later. (upbeat techno music)
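The net-present-value framing from the valuation discussion above can be sketched in a few lines of Python. Everything below is invented for illustration, not IBM's numbers: the decisions, their cash flows, and the 10% discount rate are stand-ins for whatever a real portfolio of decisions would contain.

```python
# A sketch of the decision-valuation idea: tie each decision to the yearly
# value it unlocks, then discount it the way the CFO expects (NPV).
# All decisions, cash flows, and the 10% rate below are invented examples.

def npv(rate, cash_flows):
    """Net present value: cash_flows[0] happens now, [1] in a year, etc."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Portfolio of decisions (the data-science backlog), values in $M:
# an upfront build cost, then the yearly savings or revenue it supports.
decisions = {
    "optimize supply chain": [-2.0, 1.5, 1.5, 1.5],
    "reduce customer churn": [-0.5, 0.8, 0.8],
}

portfolio = {name: round(npv(0.10, cfs), 2) for name, cfs in decisions.items()}
print(portfolio)  # → {'optimize supply chain': 1.73, 'reduce customer churn': 0.89}
```

Grouping data domains (customer, product, talent, location) under the decisions they support, as described above, is then a matter of attributing each decision's NPV back to the data it consumes.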

Published Date : Jun 24 2017


Seth Dobrin, IBM - IBM CDO Strategy Summit - #IBMCDO - #theCUBE


 

>> (lively music) >> [Narrator] Live, from Fisherman's Wharf in San Francisco, it's theCUBE. Covering IBM Chief Data Officers Strategy Summit Spring 2017. Brought to you by IBM. >> Hey, welcome back everybody. Jeff Frick here with theCUBE alongside Peter Burris, our chief research officer from Wikibon. We're at the IBM Chief Data Officers Strategy Summit Spring 2017. It's a mouthful but it's an important event. There's 170-plus CDO's here sharing information, really binding their community, sharing best practices and of course, IBM is sharing their journey, which is pretty interesting 'cause they're taking their own transformational journey, writing up a blueprint and going to deliver it in October. Drinking their own champagne, as they like to say. We're really excited to have CUBE alumni, many-time visitor Seth Dobrin. He is the chief data officer of IBM Analytics. Seth, welcome. >> Yeah, thanks for having me again. >> Absolutely, so again, these events are interesting. There's a series of them. They're in multiple cities. They're now going to go to multiple countries. And it's really intended, I believe, or tell me, it's a learning experience in this great, little, tight community for this very specific role. >> Yeah, so these events are actually really good. I've been participating in these since the second one, so since the first one in Boston about 2 1/2 years ago. They're really great events because it's an opportunity for CDO's or de facto CDO's in organizations to have in-depth conversations with their peers about struggles, challenges, successes. It really helps to, kind of, one piece is you can benchmark yourself: how are we doing as an organization, how am I doing as a CDO, and where do I fit within the bigger community or within your industry? >> How have you seen it evolve?
Not just the role, per se, but some of the specific challenges or implementation issues that these people have had in trying to deliver value inside their company. >> Yeah, so when they started, three years ago, there really were not a whole lot of tools that CDO's could use to solve your data science problems, to solve your cloud problems, to solve your governance problem. We're starting to get to a place in the world where there are actual tools out there that help you do these things. So you don't struggle to figure out how do I find talent that can build the tools internally and deploy 'em. It's now getting the talent to actually start implementing things that already exist. >> Is the CDO job well enough defined at this point in time? Do you think that you can actually start thinking about tools as opposed to the challenges of the business? In other words, is every CDO different, or are the practices and the conventions becoming a little bit better understood and stable, so you can do a better job of practicing the CDO role? >> Yeah, I think today the CDO role is still very ill-defined. It's really industry by industry, and company by company even, CDO's play different roles within each of those. I've only been with IBM for the last four months. I've been spending a lot of that time talking to our clients. Financial services, manufacturing, all over the board, and really, the CDO's in those places are all industry-specific; they're in different places, and even company by company, they're in different places. It really depends on where the companies are on their data and digital journey what role the CDO has. Is it really a defensive play to make sure we're not going to violate any regulations, or is it an offensive play, and how do we disrupt our industry instead of being disrupted? Because, really, every industry is in a place where you're either going to be the disruptor or you're going to be the disruptee.
And so, that's the scope, the breadth of, I think, the role the CDO plays. >> Do you see it all eventually converging to a common point? 'Cause, obviously, the CFO and the CMO, those are pretty well standardized functions now; that wasn't always the way. >> Well, I sure hope it does. I think CDO's are becoming pretty pervasive. When this started, the first one I went to, there were literally 35 people, and only half of them were called CDO's. We've progressed now to where we've got over 170-some-odd people that are here that are CDO's. Most of them have the CDO title even. The fact that that title is much more pervasive says that we're heading that way. I think industry by industry you'll start seeing similar responsibilities for CDO's, but I don't think you'll start seeing it across the board like a CFO, where a CFO does the same thing regardless of the industry. I don't think you'll see that in a CDO for quite some time. >> Well, one of the things we certainly find interesting is the role that data is playing in business. And part of the CDO's job is to explain to his or her peers, at that chief level, how using data is going to change the way that they do things, the way that their function works. And that's part of the reason, I think, why you're suggesting that on a vertical basis the CDO's job is different. 'Cause different industries are being impacted by data differently. So as you think about the job that you're performing and the job the CDO's are performing, what part is technical? What part is organizational? What part is political? Et cetera. >> I think a lot of the role of a CDO is political. Most of the CDO's that I know have built their careers on stomping on people's toes. How do I drive change by infringing on other people's turf, effectively? >> Peter: In a nice way. >> Well, it depends. In the appropriate way, right? >> Peter: In a productive way.
>> In the appropriate way. It could be nice, it could not be nice, depending on the politics and the culture of the organization. I think a lot of the role of a CDO, it's almost like chief disruption officer as much as it is data officer. I think it's a lot about using data, but, I think, more importantly, it's about using analytics. So how do you use analytics to actually drive insights and next best action from the data? I think just looking at data and still using gut, based on data, is not good enough. For chief data officers to really have an impact and really be successful, it's how do you use analytics on that data, whether it's machine learning, deep learning, operations research, to really change how the business operates? Because as chief data officers, you need to justify your existence a lot. The way you do that is you tie real value to decisions that your company is making, and the data and the analytics that are needed for those decisions. That's really the role of a CDO in my mind: how do I tie the value of data to decisions, and how do I use analytics to make those decisions more effective? >> Were the early days more defensive and now shifting to offensive? It sounds like it. That's a typical case where you use technology, initially, often to save money before you start to use it to create new value, new revenue streams. Is that consistent here? By answering that, you say they have to defend themselves sometimes, when you would think it'd be patently obvious that if you're not getting on a data, software-defined train, you're going to be left behind. >> I think there's two types. There's CDO's that are there to protect freedom to operate, and that's what I think of as defensive. And then there's offensive CDO's, and that's really bringing more value out of existing processes. In my mind, every company is on this digital transformation journey, and there's two steps to it.
>> One is this data science transformation, which is where you use data and analytics to accelerate your business's current goals. How do I use data and analytics to accelerate my business's march towards its current goals? Then there's the second stage, which is the true digital transformation, which is how do I use data and analytics to fundamentally change how my industry and my company operate? So, actually, changing the goals of the industry. For example, moving from selling physical products to selling outcomes. You can't do that until you've done this data transformation, till you've started operating on data, till you've started operating on analytics. You can't sell outcomes until you've done that. It's this two-step journey. >> You said this a couple of times and I want to test an idea on you and see what you think. Industry classifications are tied back to assets. So, you look at industries and they have common organization of assets, right? >> Seth: Yep. >> Data, as an asset, has very, very different attributes because it can be shared. It's not scarce; it's something that can be shared. As we become more digital, and as this notion of data as an asset and analytics as an asset becomes more pervasive, does that start to change the notion of industry? Because, now, by using data differently, you can use other assets and deploy other assets differently. >> Yeah, I think it fundamentally changes how business operates and even how businesses are measured, because you hit on this point pretty well, which is data is reusable. And so as I build these data or digital assets, the quality of a company's margins should change. For every dollar of revenue I generate, maybe today I generate 15% profit. As you start moving to being a more digital company built on data and analytics, that percent of profit based on revenue should go up, because reusing these assets that you're building is extremely cheap.
I don't have to build another factory to scale up; I buy a little bit more compute time, or I develop a new machine learning model. And so it's very scalable, unlike building physical products. I think you will see a fundamental shift in how businesses are measured, what standards investors hold businesses to. I think another good point is, a mindset shift that needs to happen for companies is that companies need to stop thinking of data as a digital dropping of applications and start thinking of it as an asset. 'Cause data has value. It's no longer just something that's dropped on the table from applications that I built. We are building to fundamentally create data to drive analytics, to generate value, to build new revenue for the company that doesn't exist today. >> Well, the thing that changes the least, ultimately, is the customer. And so it suggests that companies that have customers can use data to get into new product or new service domains faster than companies who don't think about data as an asset and are locked into how can I take my core setup, my organization, my plant, my machinery and keep stamping out something that's common to it or similar to it. So this notion of customer becomes the driver, increasingly, of what industry you're in or what activities you perform. Does that make sense? >> I think everything needs to be driven from the perspective of the customer. As you become a data-driven or a digital company, everything needs to be shifted in that organization from the perspective of the customer. Even companies that are B to B. B to B companies need to start thinking about what is the ultimate end user. How are they going to use what I'm building for my business partner, my B to B partner? Who is the actual human being that's sitting down using it? How are they going to use it? How are they going to interact with it? It really, fundamentally, changes how businesses approach B to B relationships.
It fundamentally changes the type of information that, if I'm a B to B company, how do I get more information about the end users, and how do I connect? Even if I don't come in direct contact with them, how do I understand how they're using my product better? That's fundamental, just like you need to stop thinking of data as a digital dropping. Every question needs to come from how is the end user, ultimately, going to use this? How do I better deploy that? >> So the utility that the customer gets, capturing data about the use of that, the generation of that utility, and drive it all the way back. Does the CDO have to take a more explicit role in getting people to see that? >> Yes, absolutely. I think that's part of the cultural shift that needs to happen. >> Peter: So how does the CDO do that? >> I think every question needs to start with what impact does this have on the end user? What is the customer perspective on this? Really starting to think about. >> I'm sorry for interrupting. I'd turn that around. I would say it's what impact does the customer have on us? Because you don't know unless you capture data. That notion of the customer impact measurement, which we heard last time, the measurability, and then drive that all the way back. That seems like it's going to become, increasingly, a central design point. >> Yeah, it's a loop, and you got to start using these new methodologies that are out there, these design thinking methodologies. It's not just about building an Uber app. It's not just about building an app. It's about how do I fundamentally shift my business to this design thinking methodology, 'cause that's what design thinking is all about. It's all about how is this going to be used? And every aspect of your business you need to approach that way. >> Seth, I'm afraid they're going to put us in the chafing dish here if we don't get off soon. >> Seth: I think so too, yeah. >> So we're going to leave it there.
It's great to see you again and we look forward to seeing you at the next one of these things. >> Yeah, thanks so much. >> He's Seth, he's Peter, I'm Jeff. You're watching theCUBE from the IBM Chief Data Officers Strategy Summit Spring 2017, I got it all in in a mouthful. We'll be back after lunch which they're >> setting up right now. (laughs) (lively music) (drum beats)

Published Date : Mar 29 2017



Seth Dobrin, IBM - IBM Interconnect 2017 - #ibminterconnect - #theCUBE


 

>> Announcer: Live from Las Vegas, it's theCUBE, covering InterConnect 2017. Brought to you by IBM. >> Okay, welcome back everyone. We are here live in Las Vegas from Mandalay Bay for IBM InterConnect 2017. This is theCUBE's three-day coverage of IBM InterConnect. I'm John Furrier with my co-host Dave Vellante. Our next guest is Seth Dobrin, Vice President and Chief Data Officer for IBM Analytics. Welcome to theCUBE, welcome back. >> Yeah, thanks for having me again. I love sittin' down and chattin' with you guys. >> You're a CDO, Chief Data Officer, and that's a really pivotal role because you've got to look, as a chief, over all of the data with IBM Analytics. Also you have customers you're delivering a lot of solutions to, and it's cutting edge. I like the keynote on day one here. You had Chris Moody at Twitter. He's a data guy. >> Seth: Yep. >> I mean you guys have a deal with Twitter so he got more data. You've got The Weather Company, you got that data set. You have IBM customer data. You guys are full with data right now. >> We're first seat at the scenes with data, and that's a good thing. >> So what's the strategy, what are you guys working on, and what are the key points that you guys are honing in on? Obviously, Cognitive to the Core is Rometty's theme. How are you guys making data work for IBM and your customers? >> If you think about IBM Analytics, we're really focusing on five key areas, five things that we think, if we get right, will help our clients learn how to drive their business and data strategies right. One is around how do I manage data across hybrid environments? So what's my hybrid data management strategy? It used to be how do I get to public cloud, but really what it is, it's a conversation about every enterprise has their business critical assets, what people call legacy. If we call them business critical and we think about-- These are how companies got here today. This is what they make their money on today.
The real challenge is how do we help them tie those business critical assets to their future state cloud, whether it's public cloud, private cloud, or something in between, a hybrid cloud. One of the key strategies for us is hybrid data management. Another one is around unified governance. If you look at governance in the past, governance in the past was an inhibitor. It was something that people went (groan) "Governance, so I have to do it." >> John: Barbed wire. >> Right, you know. When I've been at companies before, and thought about building a data strategy, we spent the first six months building data strategy trying to figure out how to avoid data governance, or the word data governance, and really, we need to embrace data governance as an enabler. If you do it right, if you do it upfront, if you wrap things that include model management, how do I make sure that my data scientists can get to the data they need upfront by classifying data ahead of time, understanding entitlements, understanding what the intent was when people gave consent. You also take out of the developers' hands the need to worry about governance, because now, in a unified governance platform, it's all API-driven. Just like our applications are all API-driven, how do we make our governance platform API-driven? If I'm an application developer, by the way, I'm not, I can now call an API to manage governance for me, so I don't need to worry about am I giving away the shop. Am I going to get the company sued? Am I going to get fired? Now I'm calling an API. That's only two of them, right? The third one is really around data science and machine learning. So how do we make machine learning pervasive across enterprises, with things like the Data Science Experience, Watson, and IBM machine learning? We're now bringing that machine-learning capability to the private cloud, right, because 90% of data that exists can't be Googled, so it's behind firewalls. How do we bring machine learning to that? >> One more!
>> One more! That's around, God, I gave you quite a list-- >> Hybrid data management, you defined governance, data science and machine learning-- >> Oh, the other one is Open Source, our commitment to Open Source. Our commitment to Open Source, like Hadoop, Spark, as we think about unified governance, a truly unified governed platform needs to be built on top of Open Source, so IBM is doubling down on our commitment to Apache Spark as a framework backbone, a metadata framework for our unified governed platform. >> What's the biggest para >> Wait, did we miss one? Hybrid data management, unified governance, data science machine learning (talking over another), pervasive, and open source. >> That's four. >> I thought it was five. >> No. >> Machine learning and data science are two, so typically five. >> There's only four. If I said five, there's only four. >> Cover the data governance thing because this unification is interesting to me because one of the things we see in the marketplace, people hungry for data ops. Like what data ops was for cloud was a whole application developer model developing where as a new developer persona emerging where I want to code and I want to just tap data handled by brilliant people who are cognitive engines that just serve me up what I need like a routine or a procedure, or a subroutine, whatever you want to call it, that's a data DevOps model kind of thing. How will you guys do it? Do you agree with that and how does that play out? >> That's a combination, in my mind, that's a combination of an enterprise creating data assets, so treating data as the asset it is and not a digital dropping of applications, and it's that combined with metadata. It gets back to the Apache Atlas conversation. If you want to understand your data and know where it is, it's a metadata problem. 
What's the data; what's the lineage; where is it; where does it live; how do I get to it; what can I, can't I do with it, and so that just reinforces the need for an Open Source ubiquitous metadata catalog, a single catalog, and then a single catalog of policies associated with that all driven in a composable way through API. >> That's a fundamental, cultural thinking shift because you're saying, "I don't want to just take exhaust "from apps, which is just how people have been dealing with data." You're saying, "Get holistic and say you need to create an asset class or layer or something that is designed." >> If an enterprises are going to be successful with data, now we're getting to five things, right, so there's five things. They need to treat data as an asset. It's got to be a first-class citizen, not a digital dropping, and they need a strategy around it. So what are, conceptually, what are the pieces of data that I care about? My customers, my products, my talent, my finances, what are the limited number of things. What is my data science strategy? How do I build deployable data science assets? I can't be developing machine-learning models and deploying them in Excel spreadsheets. They have to be integrated into My Processes. I have to have a cloud strategy so am I going to be on premise? Am I going to be off premise? Am I going to be something in between? I have to get back to unified governance. I have to govern it, right? Governing in a single place is hard enough, let alone multiple places, and then my talent disappears. >> Could you peg a progress bar of the industry where these would be, what you just said, because, I think-- >> Dave: Again, we only got through four. >> No talent was the last one. >> Talent, sorry, missed it. 
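Dobrin's "now I'm calling an API" picture of governance, where classification and consent are declared up front and application code asks one question instead of re-implementing policy, might look something like the minimal sketch below. The policy model, dataset names, classifications, and intents are all invented for illustration; this is not a real IBM API.

```python
# Hypothetical sketch of "governance as an API": datasets are classified
# and consent intents recorded up front, and application code asks one
# function whether an access is allowed instead of re-implementing policy.
# Every dataset, classification, and intent below is invented.

POLICIES = {
    # dataset -> (classification, intents end users consented to)
    "customer_profiles": ("PII", {"support", "billing"}),
    "clickstream": ("internal", {"analytics", "support"}),
}

def access_allowed(dataset, intent):
    """Allow access only when the requested intent matches the consent
    recorded for the dataset; unknown datasets are denied by default."""
    entry = POLICIES.get(dataset)
    if entry is None:
        return False
    _classification, consented_intents = entry
    return intent in consented_intents

print(access_allowed("customer_profiles", "billing"))    # → True
print(access_allowed("customer_profiles", "analytics"))  # → False, consent never given
```

The point of the sketch is the shape of the call, not the policy logic: the developer never sees the classification or the consent records, only an allow/deny answer.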
In the progress bar of work, how are the enterprises doing right now? 'Cause actually the big conversation on the cloud side is enterprise-readiness, enterprise-grade; that's kind of an ongoing conversation. But now, if you take your premise, which I think is accurate, that I've got to have a centralized data strategy and platform, not a data (mumbles), more than that, software, et cetera, where's the progress bar? Where are people beginning? >> I think they are all over the map. I've only been with IBM for four months, and I've been spending much of that time literally traveling around the world talking to clients, and clients are all over the map. Last week I spent a week in South America with a media company, a cable company down there. Before setting up the meeting, the guy was like, "Well, you know, we're not that far along down this journey," and I was like, "Oh, my God, you guys are like so far ahead of everyone else! That's not even funny!" And then I'm sitting down with big banks that think they're like way out there, and they haven't even started on the journey. So it's really literally all over the place, and it's even within industry. There's financial companies that are also way out there. There's another bank in Brazil that uses biometrics to access ATMs; you don't need a PIN anymore. They have analytics that drive all that. That's crazy. We don't have anything like that here. >> Are you meeting with CDOs? >> Yeah, mostly CDOs, or kind of de facto, like we talked about before this show. Mostly CDOs. >> So you may be unique in the sense that you are working for a technology company, so a lot of your time is outward-focused, but when you travel around and meet with the CDOs, how much of their time is inward-focused versus outward-focused? >> My time is actually split between inward and outward focus, because part of my time is transforming our own business using data and analytics, because IBM is a company and we got to figure out how to do that.
Is it correct that yours is probably a higher percentage outward? >> Mine's probably a higher percentage outward than most CDOs, yeah. So I think most CDOs are 70%, 80% inward-focused and 20% outward-focused, and a lot of that outward focus is just trying to understand what other people are doing. >> I guess it's okay for now, but will that change over time? >> I think that's about right. It gets back to the other conversation we had before the show about your monetization strategy. I think if a company progresses to where it's no longer about how do I change my processes and use data to monetize my internal process, if I'm going to start figuring out how I sell data, then CDOs need to get a more external-- >> But you're supporting the business in that role, and that's largely going to be an internal function of data quality, governance, and the like, like you say, the data science strategy. >> Yeah, and I think it's important, when I talk about data governance, that the things we used to call data management are all part of data governance. Data governance is not just controlling. It's all of that. It's how do I understand my data, how do I provide access to my data. It's all those things you need to enable your business to thrive on data. >> My question for you is a personal one. How did you get to be a CDO? Do you go to a class? I'm going to be a CDO someday. Not that you do that, I'm just-- >> CDO school. >> CDO school. >> Seth: I was staying in a Holiday Inn Express last night. (laughing) >> Tongue in cheek aside, people are getting into CDO roles from interesting vectors, right? Anthropology, science, art, I mean, it's really interesting; math geeks certainly love it, they thrive there, but there's not one, I haven't yet seen, one sweet spot. Take us through how you got into it and what-- >> I'm not going to fit any preconceived notion of what a CDO is, especially in a technology company. My background is in molecular and statistical genetics.
>> Dave: Well, that explains it. >> I'm a geneticist. >> Data has properties that could be kind of biological. >> And actually, if you think about the roots of big data and data science, two of the probably fundamental drivers of the concept of big data were genetics and astrophysics. So 20 years ago when I was getting my PhD, we were dealing with tens and hundreds of gigabyte-sized files. We were trying to figure out how do we get stuff out of 15 Excel files, because they weren't big enough, into a single CSV file. Millions of rows, and crude by today's standards, but it was still, how do we do this? And so 20 years ago I was learning to be a data scientist. I didn't know it. I stopped doing that field and I started managing labs for a while, and then in my last role, we kind of transformed how the research group within that company, in the agricultural space, handled and managed data, and I was simultaneously the biggest critic and biggest advocate for IT, and they said, "Hey, come over and help us figure out how to transform the company the way we've transformed this group." >> It looks like, when you talk about your PhD experience, it's almost like you were so stuck in the mud, not having the compute power or the sort of tooling. It's like a hungry man saying, "Oh, it's an unlimited abundance of compute, oh, I love what's going on." So you almost gravitate, get pulled into that, right? >> It was funny, I was doing a demo upstairs today; one of the sales guys was doing a demo with some clients, and in one line of code, they had expressed what was part of my dissertation. It was a single line of code in a script, and it was like, that was someone's entire four-year career 20 years ago. >> Great story, and I think that's consistent with people who are just attracted to it, and they end up being captains of industry. This is a hot field. You guys have a CDO event happening in San Francisco.
We'll be doing some live streaming there. What's the agenda? Because this is a rapidly accelerating field. You mentioned now dealing practically with compliance and governance, which you'd run in the other direction from in the old days; now you're embracing that. It's got to get (mumbles) and discipline in management. What's going to go on at the CDO Summit, or do you know? >> At the CDO Summit next week, I think we're going to focus on three key areas, right? What does a cloud journey look like? Maybe four key areas, right. So a cloud journey, how do you monetize data and what does that even mean, and talent, so at all these CDO Summits, the IBM CDO Summits have been going on for three or four years now, every one of them has a talent conversation, and then governance. I think those are four key concepts, and not surprisingly, they were four of the five on my list. I think that's what we're really going to talk about. >> The unified governance, tell us how that happens in your vision, because that's something that, you hear unified identity, we hear blockchain looking at a whole new disruptive way of dealing with value digitally. How do you see the data governance thing unifying? >> Well, I think again, it's around... IBM did a great job of figuring out how to take an open source product that was Spark, and make it the heart of our products. It's going to be the same thing with governance, where you're going to see Apache Atlas, which is in its infancy right now, having that open backbone so that people can get in and out of it easily. If you're going to have a unified governance platform, it's going to be open by definition, because I need to get other people's products on there. I can't go to an enterprise and say we're going to sell you a unified governance platform, but you've got to buy all IBM, or you've got to spend two years doing development work to get it on there.
So open is the framework, and composable, API-driven, and proactive are, I think, the key pieces of it. >> So we all remember the client-server days, where it took a decade and a half to realize, "Oh, my gosh, this is out of control and we need to bring it back in." And the Wild West days of big data, it feels like enterprises have nipped that governance issue in the bud at least; maybe they don't have it under control yet, but they understand the need to get it under control. Is that a fair statement? >> I think they understand the need. The data is so big and grows so fast that another component, one I didn't mention, though maybe it was implied a little bit, is automation. You need to be able to capture metadata in an automated fashion. We were talking to a client earlier who sees 400 terabytes a day of data changes, not even counting what new data they are ingesting; how do they keep track of that? It's got to be automated. For this unified governance, you need to capture this metadata in as automated a fashion as possible. Master data needs to be automated when you think about-- >> And make it available in real time, low-latency, because otherwise it becomes a data swamp. >> Right, it's got to be proactive, real-time, on-demand. >> Another thing I wanted to ask you, Seth, and get your opinion on: in the mid-2000s, when the federal rules of civil procedure changed and electronic documents and records became admissible, it was always about how do I get rid of data, and that's changed. Everybody wants to keep data and analyze it, and so forth, so what about that balance? And one of the challenges back then was data classification. I can't scale my governance, I can't eliminate and defensively delete data, unless I can classify it.
Is the analog true, where with data as an opportunity, I can't do a good job, or a good enough job, analyzing my data and keeping my data under control without some kind of automated classification? And has the industry solved that? >> I don't think the industry has completely solved it yet, but I think with cognitive tools, there are tools out there, that we have and that other people have, that can automatically, if you give them parameters and train them, classify the data for you, and I think classification is one of the keys. You need to understand how the data's classified so you understand who can access it and how long you should keep it, so it's key, and that's got to be automated also. I think we've done a fair job as an industry of doing that. There's still a whole lot of work, especially as you get into the more specialized sectors, and so I think that's a key, and we've got to do a better job of helping companies train those things so that they work. I'm a big proponent of don't give your data away to IT companies. It's your asset. Don't let them train their models with your data and sell it to other people, but there are some caveats. There are some core areas where industries need to get together and let IT companies, whether it's IBM or someone else, train models for things just like that, for classification, because if someone gets it wrong, it can bring the whole industry down. >> It's almost an open (talking over each other) source paradigm almost. It's like open source software. Share some data, but I-- >> Right, and there are some key things that aren't differentiating that, as an industry, you should get together and share. >> You guys are making, IBM is making a big deal out of this, and I think it's super important. I think it's probably the top thing that CDOs and CIOs need to think about right now: if I really own my data and that data is needed to train my big data models, who owns the models and how do I protect my IP?
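The automated classification Dobrin describes above can be illustrated with a small sketch. This is a minimal, rule-based stand-in for the trained cognitive tools he mentions, not IBM's implementation; the patterns, level names, and function names are hypothetical, chosen only to show how records could be bucketed into public, proprietary, and confidential so that access and retention policies can be applied automatically.

```python
import re

# Hypothetical rules mapping content signals to classification levels;
# a production system would use trained models, not two regexes.
RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "confidential"),  # SSN-like IDs
    (re.compile(r"[\w.]+@[\w.]+\.\w+"), "proprietary"),      # email addresses
]

LEVELS = {"public": 0, "proprietary": 1, "confidential": 2}

def classify(record: str) -> str:
    """Return the most restrictive classification any rule triggers."""
    result = "public"
    for pattern, level in RULES:
        if pattern.search(record) and LEVELS[level] > LEVELS[result]:
            result = level
    return result

print(classify("SSN 123-45-6789 on file"))    # confidential
print(classify("Contact jane@example.com"))   # proprietary
print(classify("Quarterly event schedule"))   # public
```

Once every record carries a level like this, downstream governance (who can access it, how long to keep it, whether it can leave the enterprise) becomes a lookup rather than a manual review.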
>> And are you selling it to my competitors? Are you going down the street and taking away my IP, my differentiating IP, and giving it to my competitor? >> So do I own the model? 'Cause the data and models are coming together, and that's what IBM's telling me. >> Seth: Absolutely. >> I own the data and the models that it informs, is that correct? >> Yeah, that's absolutely correct. You guys made the point earlier about IBM bursting at the seams on data. That's really the driver for it. We need to do a key set of training. We need to train our models with content for industries, bring those trained models to companies, and let them train specific versions for their company with their data, which, unless they tell us there's a reason to do otherwise, is never going to leave their company. >> I think that's a great point about you being full of data, because a lot of people who are building solutions and scaffolding for data, aka software, have never been data-full. The typical, "Oh, I'm going to be a software company," and they build something that they don't (mumbles) for. You're data-full, so you know the problem. You're living it every day. It's opportunity. >> Yeah, and that's why, when a startup comes to you and says, "Hey, we have this great AI algorithm, give us your data," they want to resell that model, because they don't have access to the content. If you look at what IBM's done with Watson, right? That's why there are specialized verticals that we're focusing Watson on, Watson Health, Watson Financial, because we are investing in data in those areas; you can look at the acquisitions we've done, right. We're investing in data to train those models. >> We should follow up on this, because this brings up the whole scale point. If you look at all the innovators of the past decade, even two decades, Yahoo, Google, Facebook, these are companies that were webscalers before there was anything that they could buy. They built their own because they had their own problem at scale.
>> At scale. >> And data at scale is a whole other mind-blowing issue. Do you agree? >> Absolutely. >> We're going to put that on the agenda for the CDO Summit in San Francisco next week. Seth, thanks so much for joining us on theCUBE. Appreciate it. Chief data officer, this is going to be a hot field. The CDO is going to be a very important opportunity for anyone watching in the data field. These are going to be new opportunities: get that data, get it controlled, taming the data, making it valuable. This is theCUBE, taming all of the content here at InterConnect. I'm John Furrier with Dave Vellante. More content coming. Stay with us. Day Two coverage continues. (innovative music tones)
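The automated metadata capture discussed in this interview, knowing where all your data is and how it was derived, can be sketched in miniature. This is an in-memory illustration only, assuming a single catalog keyed by dataset name; the field names and datasets are hypothetical, and a real system (such as the Apache Atlas backbone mentioned above) automates this at vastly larger scale.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One dataset in a single, unified catalog; fields are illustrative."""
    name: str
    location: str            # e.g. "private-cloud" or "public-cloud"
    classification: str
    derived_from: list = field(default_factory=list)  # lineage links

catalog: dict = {}

def register(entry: CatalogEntry) -> None:
    catalog[entry.name] = entry

def lineage(name: str) -> list:
    """Walk provenance links back to the root datasets."""
    chain = [name]
    for parent in catalog[name].derived_from:
        chain.extend(lineage(parent))
    return chain

register(CatalogEntry("sales_raw", "private-cloud", "proprietary"))
register(CatalogEntry("sales_features", "public-cloud", "proprietary",
                      derived_from=["sales_raw"]))
print(lineage("sales_features"))  # ['sales_features', 'sales_raw']
```

Because every derived dataset records its parents, questions like "where did this come from?" or "what must I delete to remove this record everywhere?" become a graph walk instead of a manual hunt.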

Published Date : Mar 22 2017



Seth Dobrin, IBM Analytics - Spark Summit East 2017 - #sparksummit - #theCUBE


 

>> Narrator: Live from Boston, Massachusetts, this is theCUBE! Covering Spark Summit East 2017. Brought to you by Databricks. Now, here are your hosts, Dave Vellante and George Gilbert. >> Welcome back to Boston, everybody, Seth Dobrin is here, he's the vice president and chief data officer of the IBM Analytics organization. Great to see you, Seth, thanks for coming on. >> Great to be back, thanks for having me again. >> You're welcome, so chief data officer is the hot title. It was predicted to be the hot title, and now it really is. There are many more of you around the world, and IBM's got an interesting sort of structure of chief data officers; can you explain that? >> Yeah, so there's a global chief data officer, that's Inderpal Bhandari, and he's been on this podcast or videocast a few times. Then he's set up structures within each of the business units in IBM, where each of the major business units has a chief data officer, also. And so I'm the chief data officer for the analytics business unit. >> So one of Inderpal's things, when I've interviewed him, is culture. The data culture, you've got to drive that in. And he talks about the five things that chief data officers really need to do to be successful. Maybe you could give us your perspective on how that flows down through the organization, and what are the key critical success factors for you, and how are you implementing them? >> I agree, there are five key things, and maybe I frame them a little differently than Inderpal does. There's this whole cloud migration, so every chief data officer needs to understand what their cloud migration strategy is. Every chief data officer needs to have a good understanding of what their data science strategy is. So how are they going to build composable data science assets? So not data science assets that are delivered through spreadsheets. Every chief data officer needs to understand what their approach to unified governance is.
So how do I govern all of my platforms in a way that enables that last point about data science? And then there's a piece around people. How do I build a pipeline for me today and in the future? >> So the people piece is both the skills, and it's presumably a relationship with the line of business as well. There's sort of two vectors there, right? >> Yeah, the people piece, when I think of it, is really about skills. There's a whole cultural component that goes across all of those five pieces that I laid out. Finding the right people, with the right skill set, where you need them, is hard. >> Can you talk about cloud migration, why that's so critical and so hard? >> If you look at kind of where the industry's been, the IT industry, it's been this race to the public cloud. I think that's been a little misguided all along. If you look at how business is run, right? Today, enterprises that are not internet-born make their money from what's running their businesses today, these business-critical assets. And just thinking that you can pick those up and move them to the cloud and take advantage of cloud is not realistic. So the race, really, is to a hybrid cloud. Our futures really lie in how do I connect these business-critical assets to the cloud, and how do I migrate those things to the cloud? >> So Seth, the CIO might say to you, "Okay, let's go there for a minute, I kind of agree with what you're saying, I can't just shift everything into the cloud. But what can I do in a hybrid cloud that I can't do in a public cloud?" >> Well, there's some drivers for that. I think one driver for hybrid cloud is what I just said. You can't just pick everything up and move it overnight; it's a journey. And it's not a six-month journey, it's probably not a one-year journey, it's probably a multi-year journey. >> Dave: So you can actually keep running your business? >> So you can actually keep running your business. And the other piece is there are new regulations coming up.
And these regulations, the EU GDPR is the biggest example of them right now, carry very stiff fines for violations. And the party that's responsible for paying those fines is the party the consumer engaged with. It's you, it's whoever owns the business. And as a business leader, I don't know that I would very willingly give up, trust just any third party to manage that for me. And so there are certain types of data that some enterprises may never want to move to the cloud, because they're not going to trust a third party to manage that risk for them. >> So it's more transparent from a government standpoint. It's not opaque. >> Seth: Yup. >> You feel like you're in control? >> Yeah, you feel like you're in control, and if something goes wrong, it's my fault. It's not something that I got penalized for because someone else did something wrong. >> So at the data layer, help us abstract one layer up, to the applications. How would you partition the applications? The ones that are managing that critical data have to stay on premises. What would you build, potentially, to complement them in the public cloud? >> I don't think you need to partition applications. The way you build modern applications today, it's all API-driven. You can reduce some of the cost of latency through design. So you don't really need to partition the applications, per se. >> I'm thinking more along the lines that the systems of record are not going to be torn out, and those are probably the last ones, if ever, to go to the public cloud. But other applications leverage them. If that's not the right way of looking at it, where do you add value in the public cloud versus what stays on premise? >> So some of the system of record data, there's no reason you can't replicate some of it to the cloud.
So if it's not personal information or highly regulated information, there's no reason that you can't replicate some of that to the cloud. And I think we get caught up in "we can't replicate data, we can't replicate data." I don't think that's the right answer. I think the right answer is to replicate the data if you need to, or, if the data in the system of record is not in the right structure for what I need to do, then let's put the data in the right structure. Let's not have the conversation about how I can't replicate data. Let's have the conversation about where's the right place for the data, where does it make the most sense, and what's the right structure for it? And if that means you've got 10 copies of a certain type of data, then you've got 10 copies of a certain type of data. >> Would it typically be other parts of the systems of record that you might have in the public cloud, or would they be new apps, sort of greenfield apps? >> Seth: Yes. >> George: Okay. >> Seth: I think both. And that's part of, I think in my mind, that's kind of how you build... that question you just asked right there is one of the things that guides how you build your cloud migration strategy. So we said you can't just pick everything up and move it. So how do you prioritize? You look at what you need to build to run your business differently, and you start there, and you start thinking about how do I migrate information to support those to the cloud. And maybe you start by building a local private cloud, so that everything's close together until you kind of master it. And then, once you get enough critical mass of data and applications around it, you start moving stuff to the cloud. >> We talked earlier off camera about reframing governance, Seth. I used to head a CIO consultancy, and we worked with a number of CIOs that were within legal IT, for example, and were worried about compliance and governance and things of that nature.
And their ROI was always "scare the board." But the holy grail was, can we turn governance into something of value for the organization? Can we? >> I think in the world we live in today, with ever-increasing regulations, and with a need to be agile, and with everyone needing and wanting to apply data science at scale, you need to reframe governance, right? Governance needs to be reframed from something that is seen as a roadblock to something that is truly an enabler, and not just giving it lip service. And what do I mean by that? For governance to be an enabler, you really have to think about, how do I, up front, classify my data so that all data in my organization is bucketed into some version of public, proprietary, and confidential? Different enterprises may have 30 levels and some may only have two, or some may have one. And so you do that up front, so you know what can be done with data, when it can be done, and who it can be done with. You need to capture intent. So what are the allowed intended uses of data, and as a data scientist, what am I intending to do with this data? So that you can then mesh those two things together. 'Cause that's important in these new regulations I talked about: people give you access to their personal data for an intended purpose. And then you need to be able to apply these governance policies actively. So it's not passive, after the fact, or you've got to stop and you've got to wait; it's leveraging services, leveraging APIs, and building a composable system of policies that are delivered through APIs. So if I want to create a sandbox to run some analytics on, I'm going to call an API to get that data. That API is going to call a policy API that's going to say, "Okay, does Seth have permission to see this data? Can Seth use this data for this intended purpose?" If yes, the sandbox is created. If not, there's a conversation about really why does Seth need access to this data?
It's really moving governance to actively enable me to do things. And it changes the conversation from "hey, it's your data, can I have it?" to there being really solid reasons as to why I can and can't have data. >> And then some potential automation around a sandbox that creates value. >> Seth: Absolutely. >> But still, the example you gave, public, proprietary, or confidential, is still very governance-like, whereas I was hoping you were going, with the data classification, and I think you referenced this: can I extend that schema, that nomenclature, to include other attributes of value? And can I do it, automate it, at the point of creation or use, and scale it? >> Absolutely, that is exactly what I mean. I just used those three 'cause they were the three that are easy to understand. >> So I can give you, as a business owner, some areas that I would like to see in a classification schema, and then you could automate that for me at scale? In theory? >> In theory, that's where we're hoping to go, to be able to automate it. And it's going to be different based on what industry vertical you're in and what risk profile your business is willing to take. So that classification scheme is going to look very different for a bank than it will for a pharmaceutical company, or for a research organization. >> Dave: Well, if I can then defensively delete data, that's of real value to an organization. >> With new regulations, you need to be able to delete data. And you need to be able to know where all of your data is, so that you can delete it. Today, most organizations don't know where all their data is. >> And that problem is solved with math and data science, or? >> I think that problem is solved with a combination of governance... >> Dave: Sure. >> And technology. Right? >> Yeah, technology kind of got us into this problem. We'll say technology can get us out.
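The sandbox flow Dobrin describes, a data API that calls a policy API to check both permission and declared intent, can be sketched as follows. This is a minimal in-memory sketch, not IBM's implementation; the dataset, user, and intent names are all hypothetical, and the policy store stands in for whatever catalog a real governance platform would consult.

```python
# Hypothetical in-memory policy store: (dataset, user) -> approved intents.
POLICIES = {
    ("customer_transactions", "seth"): {"churn_model"},
}

def create_sandbox(dataset: str, user: str, intent: str) -> dict:
    """Stand-in for the policy API: grant a sandbox only if the user's
    declared intent is an approved use of the dataset."""
    approved = POLICIES.get((dataset, user), set())
    if intent not in approved:
        # A denial starts a conversation about why access is needed,
        # rather than silently handing the data over.
        raise PermissionError(
            f"{user} is not approved to use {dataset} for {intent}")
    return {"dataset": dataset, "user": user, "intent": intent}

sandbox = create_sandbox("customer_transactions", "seth", "churn_model")
print(sandbox["intent"])  # churn_model
```

The key design point is that intent is part of the request: the same user asking for the same data with an unapproved purpose is refused, which is what makes the governance active rather than a passive, after-the-fact audit.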
>> On the technology subject, it seems like, with the explosion of data, not just in volume but also in the many copies of the truth, you would need some sort of curation and catalog system that goes beyond what you had in a data warehouse. How do you address that challenge? >> Seth: Yeah, and that gets into, when you guys asked me about CDOs, what do they care about? One of the things is unified governance. And the first piece of unified governance is having a catalog of your data. That is all of your data. And it's a single catalog for your data, whether it's one of your business-critical systems that's running your business today, whether it's a public cloud, or it's a private cloud, or some combination of both. You need to know where all your data is. You also need to have a policy catalog that's single for both of those. Catalogs like this fall apart by entropy, and the more you have, the more likely they are to fall apart. And so if you have one, and you have a lot of automation around it, so you have automation that allows you to go through your data and discover what data is where, and keep track of lineage in an automated fashion, keep track of provenance in an automated fashion, then we start getting into a system of truly unified governance that's active, like I said before. >> There's a lot of talk about digital transformations. Of course, digital equals data. If it ain't data, it ain't digital. So one of the things, in the early days of the whole big data theme, you'd hear people say: "You have to figure out how to monetize the data." And that seems to have changed and morphed into: you have to understand how your organization gets value from data. If you're a for-profit company, it's monetizing something and figuring out how data contributes to that monetization; if you're a health care organization, maybe it's different.
I wonder if you could talk about that in terms of the importance to the CDO specifically of understanding how an organization makes money. >> I think you bring up a good point. Monetization of data and analytics is often interpreted differently. If you're a CFO, you're going to say, "You're going to create new value for me, I'm going to start getting new revenue streams." And that may or may not be what you mean. >> Dave: Sell the data, it's not always so easy. >> It's not always so easy, and it's hard to demonstrate value for data, to sell it. There are certain types, like IBM owns The Weather Company; clearly, people want to buy weather data, it's important. But if you're talking about how do you transform a business unit, it's not necessarily about creating new revenue streams, it's how do I leverage data and analytics to run my business differently, and maybe even, what are new business models that I could never do before I had data and data science. >> Would it be fair to say that, as Dave was saying, there's the data side, and people were talking about monetizing that, but when you talk about analytics, and increasingly machine learning specifically, it's a fusion of the data and the model, and a feedback loop? Is that something that becomes a critical asset? >> I would actually say that you really can't generate a tremendous amount of value from just data. You need to apply something like machine learning to it, and machine learning has no value without good data. You need to be able to apply machine learning at scale. You need to build the deployable data science assets that run your business differently. So, for example, I could run a report that shows me how my business did last quarter, how my sales team did last quarter, or how my marketing team did last quarter. That's not really creating value. That's giving me a retrospective look at how I did. Where you can create value is in how do I run my marketing team differently.
So what data do I have, and what types of learning can I get from that data, that will tell my marketing team what they should be doing? >> George: And the ongoing process. >> And the ongoing process. And part of actually doing this, cataloging your data and understanding your data, is that you find data quality issues. And data quality issues are not necessarily an issue with the data itself or the people; they're usually process issues. And by discovering those data quality issues, you may discover processes that need to be changed, and in changing those processes you can create efficiencies. >> So it sounds like you guys have got a pretty good framework. Having talked to Inderpal a couple of times, what you're saying makes sense. Do you have nightmares about IoT? (laughing) >> Do I have nightmares about IoT? I don't think I have nightmares about IoT. IoT is really just a series of connected devices; that's really what it is. In my talk tomorrow, I'm going to talk about hybrid cloud, and a connected car is actually one of the things I'm going to talk about. And really, in a connected car, you just have a bunch of connected devices, a private cloud that's on wheels. I'm less concerned about IoT than I am about people manually changing data. With IoT you get data, you can track it; if something goes wrong, you know what happened. I would say no, I don't have nightmares about IoT. If you do security wrong, that's a whole other conversation. >> But it sounds like you're doing security right, sounds like you've got a good handle on governance. Obviously scale is a key part of that. It could break the whole thing if you can't scale. And you're comfortable with the state of technology being able to support that? At least with IBM? >> I think at least with IBM I am. Like I said, a connected car is basically a bunch of IoT devices, a private cloud. How do we connect that private cloud to other private clouds or to a public cloud? There's tons of technologies out there to do that.
Spark, Kafka. Those two things together allow you to do things that we could never do before. >> Can you elaborate? Like in a connected car environment, or some other scenario? Other people have called it a data center on wheels; think of it as a private cloud, that's a wonderful analogy. How do Spark and Kafka on that very, very smart device cooperate with something on the edge, like the cities, the buildings, versus in the cloud? >> If you're a connected car, and you're this private cloud on wheels, you can't drive the car just on that information. You can't drive it just on the LIDAR and knowing how well the wheels are in contact; you need weather information. You need information about other cars around you. You need information about pedestrians. You need information about traffic. All of this information you get from that connection, and the way you do that is leveraging Spark and Kafka. Kafka's a messaging system; you could leverage Kafka to send the car messages, or send pedestrian messages: "This car is coming, you shouldn't cross." Or vice versa, get a car to stop because there's a pedestrian in the way, before even the systems on the car can see it. So if you can get that kind of messaging system in near real time... If I'm the pedestrian, I'm 300 feet away, the half a second that it would take for that to go through isn't that big of a deal, because you'll be stopped before you get there. >> What about, again, the intelligence between not just the data, but the advanced analytics, where some of that would live in the car and some in the cloud? Is it just that you're making real-time decisions in the car and you're retraining the models in the cloud, or how does that work? >> No, I think some of those decisions would be done through Spark, in transit. And so one of the nice things about Spark is, we can do machine learning transformations on data. Think ETL, but ETL where you can apply machine learning as part of that ETL.
So I'm transferring all this weather data and positioning data, and I'm applying a machine learning algorithm for a given purpose in that car. So the purpose is navigation, or making sure I'm not running into a building. That's happening in real time as it's streaming to the car. >> That's the prediction aspect that's happening in real time. >> Seth: Yes. >> But at the same time, you want to be learning from all the cars in your fleet. >> That would happen up in the cloud. I don't think that needs to happen on the edge. Maybe it does, but I don't think it needs to happen on the edge. And today, while I said a car is a data center, a private cloud on wheels, there's a cost to the computation you can have on that car. And I don't think the cost is quite low enough yet where it makes sense to do all that computation on the edge. So some of it you would want to do in the cloud. Plus you would want to have all the information from as many cars in the area as possible. >> Dave: We're out of time, but some closing thoughts. They say may you live in interesting times, and that sums up the changes going on in the business. Dell buys EMC, IBM buys The Weather Company, and that gave you a huge injection of data scientists. Which, talk about data culture. Just last thoughts on that, in terms of the acquisition and how that's affected your role. >> I've only been at IBM since November, so all that happened before my role. >> Dave: So you inherited it? >> So from my perspective it's a great thing. Before I got there, the culture was starting to change. Like we talked about before we went on air, that's the hardest part about any kind of data science transformation: the cultural aspects. >> Seth, thanks very much for coming back on theCUBE. Good to have you. >> Yeah, thanks for having me again. >> You're welcome. All right, keep it right there everybody, we'll be back with our next guest.
This is theCUBE, we're live from Spark Summit in Boston. Right back. (soft rock music)
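The Spark-plus-Kafka pattern Dobrin describes above, a model applied to messages in transit so a car or a pedestrian can be warned in near real time, can be sketched in a few lines of plain Python. This is an illustrative sketch only: the message fields, the 300-foot threshold, and the in-memory queue standing in for a real Kafka topic are all assumptions made for the example, not details of any actual system discussed in the interview.

```python
from collections import deque, namedtuple

# An in-memory stand-in for a Kafka topic; a real system would use a broker.
Alert = namedtuple("Alert", ["target", "message"])

def proximity_model(car_pos, ped_pos, threshold_ft=300):
    """Toy 'ML in transit' step: flag pedestrians within threshold feet.

    Positions are 1-D distances in feet, purely for illustration.
    """
    return abs(car_pos - ped_pos) <= threshold_ft

def process_stream(events, topic):
    """Consume position events, apply the model, publish alerts to the topic."""
    for car_pos, ped_pos in events:
        if proximity_model(car_pos, ped_pos):
            topic.append(Alert("pedestrian", "Car approaching: do not cross"))
            topic.append(Alert("car", "Pedestrian ahead: stop"))

topic = deque()
# (car position, pedestrian position) pairs in feet
events = [(0, 1000), (0, 250)]
process_stream(events, topic)
for alert in topic:
    print(alert.target, "<-", alert.message)
```

In a production version of this pattern, each `topic.append` would be a producer send to a named Kafka topic, and `process_stream` would be a streaming job (for example, Spark applying the model as part of its transformation step) consuming from one topic and publishing to another; the in-memory deque just keeps the sketch self-contained and runnable.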

Published Date : Feb 8 2017



Gene Kolker, IBM & Seth Dobrin, Monsanto - IBM Chief Data Officer Strategy Summit 2016 - #IBMCDO


 

>> Live from Boston, Massachusetts, it's theCUBE, covering the IBM Chief Data Officer Strategy Summit, brought to you by IBM. Now, here are your hosts, Dave Vellante and Stu Miniman. >> Welcome back to Boston, everybody. This is theCUBE, the worldwide leader in live tech coverage. Stu and I are pleased to have Gene Kolker, a CUBE alum; he's IBM vice president and chief data officer of the Global Technology Services division. And Seth Dobrin, who's the Director of Digital Strategies at Monsanto. You may have seen them in the news lately. Gentlemen, welcome to theCUBE. Gene, welcome back. Good to see you guys again. Thanks. Thank you. So let's start with the customer. Seth, tell us about what you're doing here, and then we'll get into your role. >> Yes. So, you know, the CDO summit has been going on for a couple of years now, and I've been lucky enough to be participating for a year and a half or so. And really, the nice thing about the summit is the interaction with peers, and the networking with people who are facing similar challenges from a similar perspective. >> Gene, it's a relatively new role and topic, one that's evolved. We talked about this before, but now you've come from industry into a non-regulated environment. What's that been like? >> So I think the deal is that we were developing some approaches, and getting some successes, in a regulated environment, right? And we had been a client of IBM for years, using their technologies and approaches, right? So now I feel it's time for me personally to move on to something different and try to serve and empower IBM clients, irrespective of industry, starting from healthcare. Because the approaches, you know, and what IBM can do for clients go across the different industries, right? And doing that at scale is very beneficial, I think, for clients.
So Monsanto, obviously you guys do a lot of stuff in the physical world, and you're the head of digital strategy. So what does that entail? What is Monsanto doing for digital? >> Yes, so as head of digital strategies for Monsanto, my role is, number one, to help Monsanto internally reposition itself so that we behave and act like a digital company: leveraging data and analytics, and also the cultural shifts associated with being more digital, which is that whole customer-first approach you started this conversation with. So what is the real impact to our customers of what we're doing, and driving that; and then, based on those things, how can we create new business opportunities for us as a company? And how can we even create new adjacent markets or new revenues in adjacent areas based on technologies and things we already have existing within the company? >> Is the scope of it analytics, customer engagement, digital experiences, all of the above? >> So the scope is really looking at our portfolio across the gamut and seeing how we can better serve our customers and society leveraging what we're doing today. So it's really leveraging the reuse factor of the whole digital concept, right? We have analytics for geospatial, right? A big part of agriculture is geospatial. Are there other adjacent areas where we could apply some of that technology, some of that learning? Can we monetize those data, monetize the outputs of those models, or is there just a whole new way of doing business as a company, because we're in this digital era? >> We've talked about how a lot of the companies that have CDOs today are highly regulated. What are you learning from them? What's different? You know, it might be an opportunity for you that they don't have. And do you have a CDO yet, or is that something you're planning on having?
>> Yes, so we don't have a CDO. We do have someone who acts as a de facto CDO; he has all of the data organizations on his team. It's very recent for Monsanto. And in terms of what we can learn from the regulated side: here it's about half financial people, half non-financial people, about half heavily regulated industries. On the surface you would think there was not a lot of overlap, but the level of rigor that needs to go into governance in a financial institution, that same thought process, can really be used as a way to enable more R&D-driven, more growth-centered companies to use data more broadly. And so think of governance not as a roadblock or inhibitor, but really think about governance as an enabler. How does it enable us to be more agile? How does it enable us to be more innovative? Right? If there's data that people in the company can get access to by a known process under known conditions, good, bad, or ugly, as long as people know, they can do things more quickly, because the data is there, it's available, it's curated. And if they shouldn't have access to it under their current situation, what do they need to do to be able to access that data? So if I'm a data scientist and I want to access data about my customers, what can and can't I do with that data? Number one, does it have to be de-anonymized, right? Or if I want to access it in its current form, what steps do I need to go through, and what types of approval do I need, to access that data? So it's really about removing roadblocks through governance instead of putting them in place. >> Gene, I'm curious. We've been digging into this; IBM has a very multifaceted role here. How much of this is platforms? How much of it is education and services?
How much of it is, you know, being part of the data that your customers are using? >> So I think there are different approaches to these issues. My take is basically this: we're in the cognitive era, right, and data is the new natural resource worldwide. So data as a service, cognitive as a service, I think this is where IBM is coming from. And IBM traditionally was not like that, but it's under a lot of transformation as we speak. A lot of new people coming in, a lot of innovation happening as we speak, along these lines, because cognitive is something really new, right, and it's just getting started. Data as a service is really new; it's just getting started. So there's a lot to do. And my role specifically: Global Technology Services is, you know, the largest unit by revenue at IBM, 30-plus billion dollars, okay? And we support a lot of different industries. Basically, going across all the different types of industries, how to transition from offerings to new business offerings, integrated services, I think that's the key for us. >> Just curious, where's Monsanto with the adoption of cognitive? Where are you in that journey? >> So we are actually fairly advanced in the journey in terms of using analytics. I wouldn't say that we're using cognitive per se. We do use a lot of machine learning. We have some applications that on the back end run on AI, so some form of formal artificial intelligence beyond machine learning. We haven't really gotten into what IBM defines as cognitive, in terms of systems that you can interact with in a natural way, by voice, and that you spend a whole lot of time constantly teaching. But we do use, like I said, artificial intelligence. >> Gene, I'm interested in the organizational aspects. So we had Inderpal on before.
He's the global CDO; you're a divisional CDO, so you've got a matrix into your leadership within the Global Services division as well as into the chief data officer for all of IBM. Okay, sounds reasonable. He laid out for us a really excellent framework, if you will. This was Inderpal: understand your data strategy, identify your data sources, make those data sources trusted, and those are sequential activities. And in parallel, you have to partner with the line of business, and then you've got to get into the human resource planning and development piece, which has to start right away. So that's the framework, a sensible framework; a lot of thought, I'm sure, went into it, and a lot of depth and meaning behind it. How does that framework translate into the division? Is it sort of plug and play, or are there divisional goals that create dissonance? Can you
That's kind of new type off business offerings which we need to work on how to make it truly, you know, once a sense, you know, automated another sense, you know, cognitive and deliver to our clients some you value and on value compared to what was done up until recently. What >> do you mean by cognitive delivery? Explained that. >> Yeah, so basically in in plain English. So what's right now happening? Usually when you have a large systems  computer IT system, which are basically supporting lot of in this is a lot of organizations corporations, right? You know, it's really done like this. So it's people run technology assistant, okay? And you know what Of decisions off course being made by people, But some of the decisions can be, you know, simple decisions. Right? Decisions, which can be automated, can standardize, normalize can be done now by technology, okay and people going to be used for more complex decisions, right? It's basically you're going toe. It turned from people around technology assisted toa technology to technology around people assisted. OK, that's very different. Very proposition, right? So, again, it's not about eliminating jobs, it's very different. It's taken off, you know, routine and automata ble part off the business right to technology and given options and, you know, basically options to choose for more complex decision making to people. That's kind of I would say approach. >> It's about scale and the scale to, of course, IBM. When when Gerstner made the decision, Tio so organized as a services company, IBM came became a global leader, if not the global leader but a services business. Hard to scale. You could scare with bodies, and the bigger it gets, the more complicated it gets, the more expensive it gets. So you saying, If I understand correctly, the IBM is using cognitive and software essentially to scale its services business where possible, assisted by humans. >> So that's exactly the deal. So and this is very different. 
Very proposition, toe say, compared what was happening recently or earlier? Always. You know other. You know, players. We're not building your shiny and much more powerful and cognitive, you know, empowered mouse trap. No, we're trying to become trusted broker, OK, and how to do that at scale. That's an open, interesting question, but we think that this transition from you know people around technology assisted Teo technology around people assisted. That's the way to go. >> So what does that mean to you? How does that resonate? >> Yeah, you know, I think it brings up a good point actually, you know, if you think of the whole litany of the scope of of analytics, you have everything from kind of describing what happened in the past All that to cognitive. Um, and I think you need to I understand the power of each of those and what they shouldn't should be used for. A lot of people talk. You talk. People talk a lot about predictive analytics, right? And when you hear predictive analytics, that's really where you start doing things that fully automate processes that really enable you to replace decisions that people make right, I think. But those air mohr transactional type decisions, right? More binary type decisions. As you get into things where you can apply binary or I'm sorry, you can apply cognitive. You're moving away from those mohr binary decisions. There's more transactional decisions, and you're moving mohr towards a situation where, yes, the system, the silicon brain right, is giving you some advice on the types of decisions that you should make, based on the amount of information that it could absorb that you can't even fathom absorbing. But they're still needs really some human judgment involved, right? Some some understanding of the contacts outside of what? The computer, Khun Gay. And I think that's really where something like cognitive comes in. And so you talk about, you know, in this in this move to have, you know, computer run, human assisted right. 
There's a whole lot of descriptive and predictive and even prescriptive analytics that are going on before you get to that cognitive decision but enables the people to make more value added decisions, right? So really enabling the people to truly add value toe. What the data and the analytics have said instead of thinking about it, is replacing people because you're never going to replace you. Never gonna replace people. You know, I think I've heard people at some of these conferences talking about, Well, no cognitive and a I is going to get rid of data scientist. I don't I don't buy that. I think it's really gonna enable data scientist to do more valuable, more incredible things >> than they could do today way. Talked about this a lot to do. I mean, machines, through the course of history, have always replaced human tasks, right, and it's all about you know, what's next for the human and I mean, you know, with physical labor, you know, driving stakes or whatever it is. You know, we've seen that. But now, for the first time ever, you're seeing cognitive, cognitive assisted, you know, functions come into play and it's it's new. It's a new innovation curve. It's not Moore's law anymore. That's driving innovation. It's how we interact with systems and cognitive systems one >> tonight. And I think, you know, I think you hit on a good point there when you said in driving innovation, you know, I've run, you know, large scale, automated process is where the goal was to reduce the number of people involved. And those were like you said, physical task that people are doing we're talking about here is replacing intellectual tasks, right or not replacing but freeing up the intellectual capacity that is going into solving intellectual tasks to enable that capacity to focus on more innovative things, right? We can teach a computer, Teo, explain ah, an area to us or give us some advice on something. 
I don't know that in the next 10 years, we're gonna be able to teach a computer to innovate, and we can free up the smart minds today that are focusing on How do we make a decision? Two. How do we be more innovative in leveraging this decision and applying this decision? That's a huge win, and it's not about replacing that person. It's about freeing their time up to do more valuable things. >> Yes, sure. So, for example, from my previous experience writing healthcare So physicians, right now you know, basically, it's basically impossible for human individuals, right to keep up with spaced of changes and innovations happening in health care and and by medical areas. Right? So in a few years it looks like there was some numbers that estimate that in three days you're going to, you know, have much more information for several years produced during three days. What was done by several years prior to that point. So it's basically becomes inhuman to keep up with all these innovations, right? Because of that decision is going to be not, you know, optimal decisions. So what we'd like to be doing right toe empower individuals make this decision more, you know, correctly, it was alternatives, right? That's about empowering people. It's not about just taken, which is can be done through this process is all this information and get in the routine stuff out of their plate, which is completely full. >> There was a stat. I think it was last year at IBM Insight. Exact numbers, but it's something like a physician would have to read 1,500 periodic ALS a week just to keep up with the new data innovations. I mean, that's virtually impossible. That something that you're obviously pointing, pointing Watson that, I mean, But there are mundane examples, right? So you go to the airport now, you don't need a person that the agent to give you. Ah, boarding pass. It's on your phone already. You get there. 
Okay, so that's that's That's a mundane example we're talking about set significantly more complicated things. And so what's The gate is the gate. Creativity is it is an education, you know, because these are step functions in value creation. >> You know, I think that's ah, what? The gate is a question I haven't really thought too much about. You know, when I approach it, you know the thinking Mohr from you know, not so much. What's the gate? But where? Where can this ad the most value um So maybe maybe I have thought about it. And the gate is value, um, and and its value both in terms of, you know, like the physician example where, you know, physicians, looking at images. And I mean, I don't even know what the error rate is when someone evaluates and memory or something. And I probably don't want Oh, right. So, getting some advice there, the value may not be monetary, but to me, it's a lot more than monetary, right. If I'm a patient on DH, there's a lot of examples like that. And other places, you know, that are in various industries. That I think that's that's the gate >> is why the value you just hit on you because you are a heat seeking value missile inside of your organisation. What? So what skill sets do you have? Where did you come from? That you have this capability? Was your experience, your education, your fortitude, >> While the answer's yes, tell all of them. Um, you know, I'm a scientist by training my backgrounds in statistical genetics. Um, and I've kind of worked through the business. I came up through the RND organization with him on Santo over the last. Almost exactly 10 years now, Andi, I've had lots of opportunities to leverage. Um, you know, Data and analytics have changed how the company operates on. I'm lucky because I'm in a company right now. That is extremely science driven, right? Monsanto is a science based company. And so being in a company like that, you don't face to your question about financial industry. 
I don't think you face the same barriers and Monsanto about using data and analytics in the same way you may in a financial types that you've got company >> within my experience. 50% of diagnosis being proven incorrect. Okay, so 50% 05 0/2 summation. You go to your physician twice. Once you on average, you get in wrong diagnosis. We don't know which one, by the way. Definitely need some someone. Garrett A cz Individuals as humans, we do need some help. Us cognitive, and it goes across different industries. Right, technologist? So if your server is down, you know you shouldn't worry about it because there is like system, you know, Abbas system enough, right? So think about how you can do that scale, and then, you know start imagined future, which going to be very empowering. >> So I used to get a second opinion, and now the opinion comprises thousands, millions, maybe tens of millions of opinions. Is that right? >> It's a try exactly and scale ofthe data accumulation, which you're going to help us to solve. This problem is enormous. So we need to keep up with that scale, you know, and do it properly exactly for business. Very proposition. >> Let's talk about the role of the CDO and where you see that evolving how it relates to the role of the CIA. We've had this conversation frequently, but is I'm wondering if the narratives changing right? Because it was. It's been fuzzy when we first met a couple years ago that that was still a hot topic. When I first started covering this. This this topic, it was really fuzzy. Has it come in two more clarity lately in terms of the role of the CDO versus the CIA over the CTO, its chief digital officer, we starting to see these roles? Are they more than just sort of buzzwords or grey? You know, areas. >> I think there's some clarity happening already. So, for example, there is much more acceptance for cheap date. Office of Chief Analytics Officer Teo, Chief Digital officer. Right, in addition to CEO. 
So basically station similar to what was with Serious 20 plus years ago and CEO Row in one sentence from my viewpoint would be How you going using leverage in it. Empower your business. Very proposition with CDO is the same was data how using data leverage and data, your date and your client's data. You, Khun, bring new value to your clients and businesses. That's kind ofthe I would say differential >> last word, you know, And you think you know I'm not a CDO. But if you think about the concept of establishing a role like that, I think I think the name is great because that what it demonstrates is support from leadership, that this is important. And I think even if you don't have the name in the organization like it, like in Monsanto, you know, we still have that executive management level support to the data and analytics, our first class citizens and their important, and we're going to run our business that way. I think that's really what's important is are you able to build the culture that enable you to leverage the maximum capability Data and analytics. That's really what matters. >> All right, We'll leave it there. Seth Gene, thank you very much for coming that you really appreciate your time. Thank you. Alright. Keep it right there, Buddy Stew and I'll be back. This is the IBM Chief Data Officer Summit. We're live from Boston right back.

Published Date : Oct 4 2016



Influencer Panel | IBM CDO Summit 2019


 

>> Live from San Francisco, California, it's theCUBE covering the IBM Chief Data Officers Summit, brought to you by IBM. >> Welcome back to San Francisco everybody. I'm Dave Vellante and you're watching theCUBE, the leader in live tech coverage. This is the end of the day panel at the IBM Chief Data Officer Summit. This is the 10th CDO event that IBM has held and we love to gather these panels. This is a data all-star panel and I've recruited Seth Dobrin who is the CDO of the analytics group at IBM. Seth, thank you for agreeing to chip in and be my co-host in this segment. >> Yeah, thanks Dave. Like I said before we started, I don't know if this is a promotion or a demotion. (Dave laughing) >> We'll let you know after the segment. So, the data all-star panel and the data all-star awards that you guys are giving out a little later in the event here, what's that all about? >> Yeah so this is our 10th CDO Summit. So two a year, so we've been doing this for 5 years. The data all-stars are those people that have been to at least four of the ten. And so these are five of the 16 people that got the award. And so thank you all for participating and I attended these like I said earlier, before I joined IBM they were immensely valuable to me and I was glad to see 16 other people that think it's valuable too. >> That is awesome. Thank you guys for coming on. So, here's the format. I'm going to introduce each of you individually and then ask you to talk about your role in your organization. What role you play, how you're using data, however you want to frame that. And the first question I want to ask is, what's a good day in the life of a data person? Or if you want to answer what's a bad day, that's fine too, you choose. So let's start with Lucia Mendoza-Ronquillo. Welcome, she's the Senior Vice President and the Head of BI and Data Governance at Wells Fargo. You told us that you work within the line of business group, right?
So introduce your role and what's a good day for a data person? >> Okay, so my role basically is again business intelligence so I support what's called cards and retail services within Wells Fargo. And I also am responsible for data governance within the business. We roll up into what's called a data governance enterprise. So we comply with all the enterprise policies and my role is to make sure our line of business complies with data governance policies for the enterprise. >> Okay, good day? What's a good day for you? >> A good day for me is really when I don't get a call that the regulators are knocking on our doors. (group laughs) Asking for additional reports or having questions on the data and so that would be a good day. >> Yeah, especially in your business. Okay, great. Parag Shrivastava is the Director of Data Architecture at McKesson, welcome. Thanks so much for coming on. So we got a healthcare, couple of healthcare examples here. But, Parag, introduce yourself, your role, and then what's a good day or if you want to choose a bad day, be fun to mix that up. >> Yeah, sounds good. Yeah, so mainly I'm responsible for the data strategy and architecture at McKesson. What that means is McKesson has a lot of data around the pharmaceutical supply chain, around one-third of the world's pharmaceutical supply chain, clinical data, also around pharmacy automation data, and we want to leverage it for the better engagement of the patients and better engagement of our customers. And my team, which includes the data product owners, and data architects, we are all responsible for looking at the data holistically and creating the data foundation layer. So I lead the team across North America. So that's my current role. And going back to the question around what's a good day, I'll start with the good day. It's really looking at when the data improves the business.
And the first thing that comes to my mind is sort of like an example: McKesson did an acquisition of an eight billion dollar pharmaceutical company in Europe and we were creating the synergy solution which was based around the analytics and data. And actually IBM was one of the partners in implementing that solution. When the solution got really implemented, I mean that was a big deal for me to see that all the effort that we did in plumbing the data and doing some analytics is really helping improve the business. I think that is really a good day I would say. I mean I wouldn't say a bad day as such, there are challenges, constant challenges, but I think one of the top priorities that we are having right now is to deal with the demand. As we look at the demand around the data, the role of data has got multiple facets to it now. For example, some of the very foundational, evidentiary, and compliance type of needs as you just talked about and then also profitability and the cost avoidance and those kinds of aspects. So how to balance that demand is the other aspect. >> All right good. And we'll get into a lot of that. So Carl Gold is the Chief Data Scientist at Zuora. Carl, tell us a little bit about Zuora. People might not be as familiar with how you guys do software for billing et cetera. Tell us about your role and what's a good day for a data scientist?
It's split, so I said I'm a chief data scientist and we work about 50% on product features based on data science. Things like churn prediction, or predictive payment retries are product areas where we offer AI-based solutions. And then but because Zuora is a subscription platform, we have an amazing set of data on the actual performance of companies using our product. So a really interesting part of my role has been leading what we call the subscription economy index and subscription economy benchmarks which are reports around best practices for subscription companies. And it's all based off this amazing dataset created from an anonymized data of our customers. So that's a really exciting part of my role. And for me, maybe this speaks to our level of data governance, I might be able to get some tips from some of my co-panelists, but for me a good day is when all the data for me and everyone on my team is where we left it the night before. And no schema changes, no data, you know records that you were depending on finding removed >> Pipeline failures. >> Yeah pipeline failures. And on a bad day is a schema change, some crucial data just went missing and someone on my team is like, "The code's broken." >> And everybody's stressed >> Yeah, so those are bad days. But, data governance issues maybe. >> Great, okay thank you. Jung Park is the COO of Latitude Food Allergy Care. Jung welcome. >> Yeah hi, thanks for having me and the rest of us here. So, I guess my role I like to put it as I'm really the support team. I'm part of the support team really for the medical practice so, Latitude Food Allergy Care is a specialty practice that treats patients with food allergies. So, I don't know if any of you guys have food allergies or maybe have friends, kids, who have food allergies, but, food allergies unfortunately have become a lot more prevalent. 
And what we've been able to do is take research and data really from clinical trials and other research institutions and really use that from the clinical trial setting, back to the clinical care model so that we can now treat patients who have food allergies by using a process called oral immunotherapy. It's fascinating and this is really personal to me because my son has food allergies and he's been to the ER four times.
So I like to think of it as a patient journey and I'm sure you guys all think of it similarly when you talk about your customers, but from a patient's perspective, before they even come in, you have to make sure the data behind the science of whatever you're treating is proper, right? Once that's there, then you have to have the acquisition part. How do you actually work with the community to make sure people are aware of really the services that you're providing? And when they're with you, how do you engage them? How do you make sure that they are compliant with the process? So in healthcare especially, oftentimes patients don't actually succeed all the way through because they don't continue all the way through. So it's that compliance. And then finally, it's really long-term care. And when you get the long-term care, you know that the patient that you've treated is able to really continue on six months, a year from now, and be able to eat the food. >> Great, thank you for that description. Awesome mission. Rolland Ho is the Vice President of Data and Analytics at Clover Health. Tell us a little bit about Clover Health and then your role. >> Yeah, sure. So Clover is a startup Medicare Advantage plan. So we provide Medicare, private Medicare to seniors. And what we do is we're because of the way we run our health plan, we're able to really lower a lot of the copay costs and protect seniors against out of pocket. If you're on regular Medicare, you get cancer, you have some horrible accident, your out of pocket is infinite potentially. Whereas with Medicare Advantage Plan it's limited to like five, $6,000 and you're always protected. One of the things I'm excited about being at Clover is our ability to really look at how can we bring the value of data analytics to healthcare? Something I've been in this industry for close to 20 years at this point and there's a lot of waste in healthcare. 
And there's also a lot of very poor application of preventive measures to the right populations. So one of the things that I'm excited about is that with today's models, if you're able to better identify with precision, the right patients to intervene with, then you fundamentally transform the economics of what can be done. Like if you had to pay $1,000 to intervene, but you were only right 20% of the time, that's very expensive for each success. But, now if your model is 60, 70% right, then now it opens up a whole new world of what you can do. And that's what excites me. In terms of my best day? I'll give you two different angles. One as an MBA, one of my best days was, client calls me up, says, "Hey Rolland, you know, your analytics brought us over $100 million in new revenue last year," and I was like, cha-ching! Excellent! >> Which is my half? >> Yeah right. And then on the data geek side the best day was really, run a model, you train a model, you get a ridiculous AUC score, so area under the curve, and then you expect that to just disintegrate as you go into validation testing and actual live production. But the .98 AUC score held up through production. And it's like holy cow, the model actually works! And literally we could cut out half of the workload because of how good that model was. >> Great, excellent, thank you. Seth, anything you'd add to the good day, bad day, as a CDO? >> So for me, well as a CDO or as CDO at IBM? 'Cause at IBM I spend most of my time traveling. So a good day is a day I'm home. >> Yeah, when you're not in an (group laughing) aluminum tube. >> Yeah. Hurtling through space (laughs). No, but a good day is when GDPR compliance just happened, a good day for me was May 20th of last year when IBM was done and we were, or as done as we needed to be for GDPR so that was a good day for me last year.
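The intervention economics Rolland sketches reduce to a one-line calculation. A minimal illustration using the figures he quotes; the function name and setup here are mine, not anything from Clover:

```python
def cost_per_success(cost_per_intervention: float, precision: float) -> float:
    """Expected spend to achieve one successful intervention, where
    precision is the fraction of targeted patients the model gets right."""
    return cost_per_intervention / precision

# $1,000 per intervention, at 20% vs. 70% model precision
print(cost_per_success(1_000, 0.20))  # 5000.0
print(cost_per_success(1_000, 0.70))  # ~1428.6
```

At 20% precision each success effectively costs $5,000; at 70% it drops below $1,500. That gap is the "whole new world" of interventions that suddenly pencil out.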
This year, really, a good day is when we start implementing some new models to help IBM become a more effective company and increase our bottom line or increase our margins. >> Great, all right so I got a lot of questions as you know and so I want to give you a chance to jump in. >> All right. >> But, I can get it started or have you got something? >> I'll go ahead and get started. So this is the 10th CDO Summit. So five years. I know personally I've had three jobs at two different companies. So over the course of the last five years, how many jobs, how many companies? Lucia? >> One job with one company. >> Oh my gosh you're boring. (group laughing) >> No, but actually, because I support basically the head of the business, we go into various areas. So, we're not just from an analytics perspective and business intelligence perspective and of course data governance, right? It's been a real journey. I mean there's a lot of work to be done. A lot of work has been accomplished and constantly improving the business, which is the first goal, right? Increasing market share through insights and business intelligence, tracking product performance to really helping us respond to regulators (laughs). So it's a variety of areas I've had to be involved in. >> So one company, 50 jobs. >> Exactly. So right now I wear different hats depending on the day. So that's really what's happening. >> So it's a good question, have you guys been jumping around? Sure, I mean I think of it as same company, one company, but two jobs. And I think those two jobs have two different layers. When I started at McKesson I was a solution leader or solution director for business intelligence and I think that's how I started. And over the five years I've seen the complete shift towards machine learning and my new role is actually focused around machine learning and AI.
That's why we created this layer, so our own data product owners who understand the data science side of things and the ongoing business architecture. So, same company, but I've seen a very different shift in data over the last five years. >> Anybody else? >> Sure, I'll say two companies. I'm going on four years at Zuora. I was at a different company for a year before that, although it was kind of the same job, first at the first company, and then at Zuora I was really focused on subscriber analytics and churn for my first couple of years. And then actually I kind of got a new job at Zuora by becoming the subscription economy expert. I became like an economist, even though I don't honestly have a background. My PhD's in biology, but now I'm a subscription economy guru. And a book author, I'm writing a book about my experiences in the area. >> Awesome. That's great. >> All right, I'll give a bit of a riddle. Four, how do you have four jobs, five companies? >> In five years. >> In five years. (group laughing) >> Through a series of acquisition, acquisition, acquisition, acquisition. Exactly, so yeah, I have to really, really count on that one (laughs). >> I've been with three companies over the past five years and I would say I've had seven jobs. But what's interesting is I think it kind of mirrors and kind of mimics what's been going on in the data world. So I started my career in data analytics and business intelligence. But then along with that I had the fortune to work with the IT team. So the IT came under me. And then after that, the opportunity came about in which I was presented to work with compliance. So I became a compliance officer. So in healthcare, it's very interesting because these things are tied together. When you look at the data, and then the IT, and then the regulations as it relates to healthcare, you have to have the proper compliance, both internal compliance, as well as external regulatory compliance.
And then from there I became CIO and then ultimately the chief operating officer. But what's interesting is as I go through this it's all still the same common themes. It's how do you use the data? And if anything it just gets to a level in which you become closer with the business and that is the most important part. If you stand alone as a data scientist, or a data analyst, or the data officer, and you don't incorporate the business, you alienate the folks. There's a math I like to do. It's different from your basic math, right? I believe one plus one is equal to three because when you get the data and the business together, you create that synergy and then that's where the value is created. >> Yeah, I mean if you think about it, data's the only commodity that increases value when you use it correctly. >> Yeah. >> Yeah so then that kind of leads to a question that I had. There's this mantra, the more data the better. Or is it more of an Einstein derivative? Collect as much data as possible but not too much. What are your thoughts? Is more data better? >> I'll take it. So, I would say the curve has shifted over the years. Before it used to be data was the bottleneck. But now especially over the last five to 10 years, I feel like data is no longer oftentimes the bottleneck as much as the use case. The definition of what exactly we're going to apply it to, and how we're going to apply it. Oftentimes once you have that clear, you can go get the data. And then in the case where there is not data, like with Mechanical Turk, you can always set up experiments, gather data, the cost of that is now so cheap to experiment that I think the bottleneck's really around the business understanding the use case. >> Mm-hmm. >> Mm-hmm. >> And I think the wave that we are seeing, I'm seeing this as there are, in some cases, more data is good, in some cases more data is not good. And I'll start with where it is not good.
I think where quality is more required is the area where more data is not good. For example, regulation and compliance. So for example in McKesson's case, we have to report on opioid compliance for different states. How much opioid drugs we are giving to states and making sure we have very, very tight reporting and compliance regulations. There, highest quality of data is important. In our data organization, we have a very, very dedicated focus around maintaining that quality. So, quality is most important, quantity is not if you will, in that case. Having the right data. Now on the other side of things, where we are doing some kind of exploratory analysis. Like what could be the right category management for our stores? Or where the product pricing could be the right one. The product has around 140 attributes. We would like to look at all of them and see what patterns we are finding in our models. So there you could say more data is good. >> Well you could definitely see a lot of cases. But certainly in financial services and a lot of healthcare, particularly in pharmaceutical where you don't want work in process hanging around. >> Yeah. >> Some lawyer could find a smoking gun and say, "Ooh see." And then if that data doesn't get deleted. So, let's see, I would imagine it's a challenge in your business, I've heard people say, "Oh, now we can keep all the data, it's so inexpensive to store." But that's not necessarily such a good thing is it? >> Well, we're required to store data. >> For N number of years, right? >> Yeah, N number of years. But, sometimes they go beyond those number of years when there are legal requirements to comply or to answer questions. So we do keep more than, >> Like a legal hold for example. >> Yeah. So we keep more than seven years for example and seven years is the regulatory requirement. But in the case of more data, I'm a data junkie, so I like more data (laughs). Whenever I'm asked, "Is the data available?"
I always say, "Give me time, I'll find it for you." So that's really how we operate because again, we're the go-to team, we need to be able to respond to regulators and to the business and make sure we understand the data. So that's the other key. I mean more data, but make sure you understand what that means. >> But has that perspective changed? Maybe go back 10 years, maybe 15 years ago, when you didn't have the tooling to be able to say, "Give me more data." "I'll get you the answer." Maybe, "Give me more data." "I'll get you the answer in three years." Whereas today, you're able to, >> I'm going to go get it off the backup tapes (laughs). >> (laughs) Yeah, right, exactly. (group laughing) >> Fortunately for us, Wells Fargo has implemented data warehouses for many years, I think more than 10 years. So we do have that capability. There's certainly a lot of platforms you have to navigate through, but if you are able to navigate, you can get to the data >> Yeah. >> within the required timeline. So it helps that you have the technology and the team behind you. Jung, you want to add something? >> Yeah, so that's an interesting question. So, clearly in healthcare, there is a lot of data and as I've kind of come closer to the business, I also realize that there's a fine line between collecting the data and actually asking our folks, our clinicians, to generate the data. Because if you are focused only on generating data, in the electronic medical records systems for example, there's burnout. You don't want the clinicians to be working to make sure you capture every element because if you do so, yes on the back end you have all kinds of great data, but on the other side, on the business side, it may not be necessarily a productive thing. And so we have to make a fine line judgment as to the data that's generated and who's generating that data and then ultimately how you end up using it. >> And I think there's a bit of a paradox here too, right?
The geneticist in me says, "Don't ever throw anything away." >> Right. >> Right? I want to keep everything. But, the most interesting insights often come from small data which are a subset of that larger, keep everything inclination that we as data geeks have. I think also, as we're moving into kind of the next phase of AI when you can really start doing things like transfer learning, that small data becomes even more valuable because you can take a model trained on one thing or a different domain and move it over to yours to have a starting point where you don't need as much data to get the insight. So, I think in my perspective, the answer is yes. >> Yeah (laughs). >> Okay, go. >> I'll go with that just to run with that question. I think it's a little bit of both 'cause people touched on different definitions of more data. In general, more observations can never hurt you. But, more features, or more types of things associated with those observations actually can if you bring in irrelevant stuff. So going back to Rolland's answer, the first thing that's good is like a good mental model. My PhD is actually in physical science, so I think about physical science, where you actually have a theory of how the thing works and you collect data around that theory. I think the approach of just, oh let's put in 2,000 features and see what sticks, you know you're leaving yourself open to all kinds of problems. >> That's why data science is not democratized, >> Yeah (laughing). >> because (laughing). >> Right, but first Carl, in your world, you don't have to guess anymore right, 'cause you have real data. >> Well yeah, of course, we have real data, but the collection, I mean for example, I've worked on a lot of customer churn problems. It's very easy to predict customer churn if you capture data that pertains to the value customers are receiving.
If you don't capture that data, then you'll never predict churn by counting how many times they login or more crude measures of engagement. >> Right. >> All right guys, we got to go. The keynotes are spilling out. Seth thank you so much. >> That's it? >> Folks, thank you. I know, I'd love to carry on, right? >> Yeah. >> It goes fast. >> Great. >> Yeah. >> Guys, great, great content. >> Yeah, thanks. And congratulations on participating and being data all-stars. >> We'd love to do this again sometime. All right and thank you for watching everybody, it's a wrap from IBM CDOs, Dave Vellante from theCUBE. We'll see you next time. (light music)
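Carl's closing point, that churn is predictable only when you capture data reflecting the value customers receive rather than crude engagement counts, can be sketched on toy data. Everything below (the records, feature names, and thresholds) is hypothetical, not Zuora data:

```python
# Hypothetical subscriber records: churn outcome, monthly logins, and a
# proxy for value received (e.g. how many invoices the product automated).
subscribers = [
    {"churned": False, "logins": 30, "value_events": 120},
    {"churned": False, "logins": 4,  "value_events": 95},
    {"churned": True,  "logins": 25, "value_events": 3},
    {"churned": True,  "logins": 2,  "value_events": 0},
    {"churned": False, "logins": 12, "value_events": 80},
    {"churned": True,  "logins": 18, "value_events": 5},
]

def churn_rate(rows):
    """Fraction of the given subscribers who churned."""
    return sum(r["churned"] for r in rows) / len(rows)

# Split on a crude engagement measure vs. a value-received measure.
low_login = [r for r in subscribers if r["logins"] < 10]
low_value = [r for r in subscribers if r["value_events"] < 10]

print(churn_rate(subscribers))  # 0.5 base rate
print(churn_rate(low_login))    # 0.5 -- logins carry no signal here
print(churn_rate(low_value))    # 1.0 -- low value isolates every churner
```

On this toy set, splitting on login count just reproduces the base churn rate, while splitting on the value proxy isolates every churner. That is exactly the gap Carl is describing between crude engagement measures and value-delivery data.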

Published Date : Jun 25 2019



John Thomas, IBM | IBM CDO Fall Summit 2018


 

>> Live from Boston, it's theCUBE, covering IBM Chief Data Officer Summit, brought to you by IBM. >> Welcome back everyone to theCUBE's live coverage of the IBM CDO Summit here in Boston, Massachusetts. I'm your host Rebecca Knight, and I'm joined by cohost, Paul Gillan. We have a guest today, John Thomas. He is the Distinguished Engineer and Director at IBM. Thank you so much for coming, returning to theCUBE. You're a CUBE veteran, CUBE alum. >> Oh thank you Rebecca, thank you for having me on this. >> So tell our viewers a little bit about, you're a distinguished engineer. There are only 672 in all of IBM. What do you do? What is your role? >> Well that's a good question. Distinguished Engineer is kind of a technical executive role, which is a combination of applying the technology skills, as well as helping shape IBM strategy in a technical way, working with clients, et cetera. So it is a bit of a jack of all trades, but also deep skills in some specific areas, and I love what I do (laughs lightly). So, I get to work with some very talented people, brilliant people, in terms of shaping IBM technology and strategy. Product strategy, that is part of it. We also work very closely with clients, in terms of how to apply that technology in the context of the client's use cases. >> We've heard a lot today about soft skills, the importance of organizational people skills to being a successful Chief Data Officer, but there's still a technical component. How important is the technical side? What is, what are the technical skills that the CDOs need? >> Well, this is a very good question Paul. So, absolutely, so, navigating the organizational structure is important. It's a soft skill. You are absolutely right. And being able to understand the business strategy for the company, and then aligning your data strategy to the business strategy is important, right? But the underlying technical pieces need to be solid.
So for example, how do you deal with large volumes of different types of data spread across a company? How do you manage that data? How do you understand the data? How do you govern that data? How do you then leverage the value of that data in the context of your business? So a deep understanding of the technology of collecting, organizing, and analyzing that data is needed for you to be a successful CDO. >> So in terms of those skillsets that you're looking for, one of the things that Inderpal said earlier in his keynote is that it's a rare individual who truly understands how to collect, store, analyze, curate, and monetize the data, and who also has the soft skills of being able to navigate the organization, being able to be a change agent who is inspiring the rank and file. How do you recruit and retain talent? I mean, this seems to be a major challenge. >> Getting the right expertise in place is essential, and Inderpal talked about it in his keynote: the very first thing he did was bring in talent. Maybe you have the kind of talent that has grown up in your company; maybe you have to go outside. But you've got to bring the right skills together: form the team that understands the technology and the business side of things, and build that team. That is essential for you to be a successful CDO. And to some extent, that's what Inderpal has done. That's what the analytics CDO's office has done. Seth Dobrin, my boss, is the analytics CDO, and he and the analytics CDO team actually hired people with different skills: data engineering skills, data science skills, visualization skills. Then they put this team together, which understands how to collect, govern, curate, and analyze the data, and then apply those skills in specific situations.
>> There's been a lot of talk about AI at this conference, which seems to be finally happening. What do you see in the field, or perhaps in projects that you've worked on, as examples of AI that are really having a meaningful business impact? >> Yeah Paul, that is a very good question, because the term AI is overused a lot, as you can imagine; there's a lot of hype around it. But I think we are past that hype cycle, and people are looking at: how do I implement successful use cases? And I stress the word use case, right? In my experience, "how am I going to transform my business" as one big boil-the-ocean exercise does not work. But if you have a very specific, bounded use case that you can identify, the business tells you it's relevant, the business tells you what the metrics for success are, and then you focus your attention and your efforts on that specific use case with the skills needed for it, then it's successful. So, examples of use cases from across the industries: everything that you can think of. Customer-facing examples, like, how do I read the customer's mind? If I'm a business and I interact with my customers, can I anticipate what the customer is looking for, maybe for a cross-sell opportunity, or maybe to reduce the call handling time when a customer calls into my call center? Or trying to segment my customers so I can run a proper promotion or campaign for that customer. All of these are specific customer-facing examples. There are also examples of applying this internally to improve processes: capacity planning for your infrastructure, can I predict when a system is likely to have an outage, or can I predict the traffic coming into my systems, into my infrastructure, and provision capacity for that on demand? So all of these are interesting applications of AI in the enterprise.
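The last internal example John gives, predicting incoming traffic and provisioning capacity on demand, can be sketched in a few lines. The moving-average forecast and the 20% headroom below are illustrative choices, not anything from IBM tooling:

```python
# Illustrative capacity planning: forecast next period's traffic as a
# moving average of recent periods, then provision with a fixed headroom.
def forecast_next(traffic, window=3):
    """Forecast the next observation as the mean of the last `window` ones."""
    recent = traffic[-window:]
    return sum(recent) / len(recent)

def provision(traffic, headroom=0.2):
    """Capacity to provision: the forecast plus a safety margin."""
    return forecast_next(traffic) * (1 + headroom)

requests_per_min = [900, 1000, 1100, 1200, 1300]
capacity = provision(requests_per_min)  # forecast of 1200, plus 20% headroom
```

A real system would use a proper time-series model and refresh the forecast continuously; the shape of the loop, predict and then provision ahead of demand, is the point.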
>> So one of the things we keep hearing is that we need the data to tell a story. The data needs to be compelling enough so that the data scientists get it, but then also the other kinds of business decision makers get it too. >> Yep. >> So what are some of the best practices that have emerged from your experience, in terms of getting your data to tell the story that you want it to tell? >> Yeah, well, if the pattern doesn't exist in the data, then no amount of fancy algorithms can help, you know? And sometimes it's like searching for a needle in a haystack. But I guess the first step is, like I said, what is the use case? Once you have a clear understanding of your use case and the success metrics for your use case, do you have the data to support that use case? So for example, if it's fraud detection, do you actually have the historical data to support the fraud use case? Sometimes you may have transactional data from your core enterprise systems, but that may not be enough. You may need to augment it with external data, third party data, maybe unstructured data that goes along with your transaction data. So the question is: can you identify the data that is needed to support the use case? And if so, is that data clean? Do you understand the lineage of the data, who has touched and modified the data, who owns the data? Then I can start building predictive models, machine learning and deep learning models, with that data. So: use case; do you have the data to support the use case; do you understand how that data reached you? Then comes the process of applying machine learning algorithms and deep learning algorithms against that data.
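The sequence John lays out, use case first, then check whether the data can support it, can be sketched as a small readiness report. The function and field names here are hypothetical, not from any IBM product:

```python
# Hypothetical pre-modeling checks for a fraud use case: before any
# algorithm runs, confirm that labeled history exists and carries signal.
def readiness_report(records, label_key="is_fraud"):
    """Summarize whether the data can support a supervised use case."""
    labeled = [r for r in records if label_key in r]
    positives = sum(1 for r in labeled if r[label_key])
    return {
        "n_records": len(records),
        "n_labeled": len(labeled),
        "positive_rate": positives / len(labeled) if labeled else 0.0,
        "has_signal": 0 < positives < len(labeled),  # both classes present
    }

transactions = [
    {"amount": 120.0, "is_fraud": False},
    {"amount": 9800.0, "is_fraud": True},
    {"amount": 45.0, "is_fraud": False},
    {"amount": 60.0},  # unlabeled row from a legacy system
]
report = readiness_report(transactions)
```

Checks like this are where the questions about lineage and ownership get asked, before any model is built.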
>> What are the risks of machine learning, and particularly deep learning? I think because it becomes kind of a black box, people can fall into the trap of just believing what comes back, regardless of whether the algorithms are really sound or the data is. What is the responsibility of data scientists to sort of show their work? >> Yeah, Paul, this is fascinating and not a completely solved area, right? So, bias detection: can I explain how my model behaved, can I ensure that the models are fair in their predictions? There is a lot of research, a lot of innovation happening in this space, and IBM is investing a lot into it. We call it trust and transparency. Being able to explain a model has multiple levels to it. You need some level of AI governance itself; just as we talked about data governance, there is the notion of AI governance. Which version of the model was used to make a prediction? What were the inputs that went into that model? What were the features that were used to make a certain prediction? What was the prediction? And how did that match up with ground truth? You need to be able to capture all that information. But beyond that, we have actual mechanisms in place, which IBM Research is developing, to look at bias detection. So pre-processing, during execution, and post-processing: can I look for bias in how my models behave, and do I have mechanisms to mitigate it? One example is the open source Python library called AIF360, which comes from IBM Research and has been contributed to the open source community. There are mechanisms there to look at bias and to provide some level of bias mitigation as part of your model building exercises.
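AIF360 packages dozens of fairness metrics behind its own dataset classes; as a flavor of what such metrics compute, here is a minimal hand-rolled sketch of two common ones. This is not the library's actual API, and the toy data is invented:

```python
# Two classic group-fairness metrics, computed by hand for illustration.
def _favorable_rate(outcomes, groups, privileged, in_privileged):
    """Share of favorable outcomes within one group."""
    pairs = [o for o, g in zip(outcomes, groups) if (g == privileged) == in_privileged]
    return sum(pairs) / len(pairs)

def statistical_parity_difference(outcomes, groups, privileged):
    """P(favorable | unprivileged) - P(favorable | privileged); 0 means parity."""
    return (_favorable_rate(outcomes, groups, privileged, False)
            - _favorable_rate(outcomes, groups, privileged, True))

def disparate_impact(outcomes, groups, privileged):
    """Ratio of the two rates; the '80% rule' flags values below 0.8."""
    return (_favorable_rate(outcomes, groups, privileged, False)
            / _favorable_rate(outcomes, groups, privileged, True))

# 1 = favorable outcome (e.g. loan approved); group "A" is privileged here.
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
spd = statistical_parity_difference(outcomes, groups, privileged="A")
di = disparate_impact(outcomes, groups, privileged="A")
```

In this toy data the privileged group sees a favorable outcome 75% of the time versus 25% for the other, so the parity difference is -0.5 and the disparate impact is about 0.33, well below the 0.8 rule of thumb.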
>> And the bias mitigation, does it have to do with, and I'm going to use an IBM term of art here, the human in the loop? How much are you actually looking at the humans that are part of this process? >> Yeah, at least at this point in time, humans are very much in the loop. This notion of pure AI, where humans are completely outside the loop, is something we're not at yet. So very much, the system can surface a set of recommendations, can provide a set of explanations, and someone who understands the business can look at it and take corrective actions. >> There have been, however, to Rebecca's point, some prominent people, including Bill Gates, who have speculated that AI could ultimately be a negative for humans. What is the responsibility of companies like IBM to ensure that humans are kept in the loop? >> I think, at least at this point, IBM's view is that humans are an essential part of AI. In fact, we don't even use the term artificial intelligence that much; we call it augmented intelligence, where the system is presenting a set of recommendations, expert advice, to the human, who can then make a decision. For example, my team worked with a prominent health care provider on models for predicting patient death in the case of sepsis onset. We are talking literally life and death decisions being made, and this is not something you can just automate, throw into a magic black box, and have a decision be made. So this is absolutely a place where people with deep domain knowledge are supported, are augmented, with AI to make better decisions. That's where I think we are today. As to what will happen five years from now, I can't predict that yet.
>> Well I actually want to- >> But the question- >> bring this up to both of you. So you are helping doctors make these decisions. It's not just, this is what the computer program says about this patient's symptoms; you are really helping the doctor make better decisions. What about the doctor's gut, his or her intuition? What is the role of that in the future? >> I think it goes away. I mean, I think the intuition really will be trumped by data in the long term, because you can't argue with the facts. Some people do these days. (soft laughter) But I don't remember (everyone laughing) >> We have to take a break there for some laughter. >> Interested in your perspective on that: will there, should there, always be a human on the front line who is being supported by the back end? Or would you see a scenario where an AI is making decisions, customer-facing decisions, that really are life and death decisions? >> So I think on the consumer side, I can definitely see AI making decisions on its own. So let's say a recommender system says, you know, John Thomas bought these last five things online; he's likely to buy this other thing, let's make an offer to him. I don't need another human in the loop for that. >> No harm, right? >> Right. >> It's pretty straightforward, it's already happening in a big way. But when it comes to some of these... >> Approving a mortgage, how about that one? >> Yeah. >> Where bias creeps in a lot. >> But that's one big decision. >> Even that, I think, can be automated, if the threshold is set to be what the business is comfortable with. Where it says, okay, above this probability level I don't really need a human to look at this, and if it is below this level, I do want someone to look at it. That is relatively straightforward, right?
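The threshold routing John describes here can be sketched in a few lines. The cutoffs and labels below are illustrative, not from any IBM product; in practice the thresholds are business decisions:

```python
# Confidence-based triage: automate the clear cases, route the gray
# zone to a human reviewer.
AUTO_APPROVE = 0.90  # at or above this approval probability, no human needed
AUTO_DECLINE = 0.10  # at or below this, likewise

def route(probability):
    """Map a model's approval probability to a decision path."""
    if probability >= AUTO_APPROVE:
        return "approve"
    if probability <= AUTO_DECLINE:
        return "decline"
    return "human_review"  # the gray zone stays with a domain expert

decisions = {p: route(p) for p in (0.95, 0.50, 0.05)}
```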
But if it is a decision about, you know, a life or death situation, or something that affects the very fabric of the business that you are in, then you probably want a domain expert to look at it. In most enterprises, cases will lean toward that category. >> These are big questions. These are hard questions. >> These are hard questions, yes. >> Well John, thank you so much for coming >> Oh absolutely, thank you >> on theCUBE, we really had a great time with you. >> No, thank you for having me. >> I'm Rebecca Knight for Paul Gillan; we will have more from theCUBE's live coverage of the IBM CDO Summit, here in Boston, just after this. (Upbeat Music)

Published Date : Nov 15 2018



Daniel Hernandez, Analytics Offering Management | IBM Data Science For All


 

>> Announcer: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome to the Big Apple. John Walls and Dave Vellante here on theCUBE; we are live at IBM's Data Science For All. We're going to be here throughout the day, with a big panel discussion wrapping up our day, so be sure to stick around all day long on theCUBE for that. Dave, always good to be here in New York, is it not? >> Well, you know, it's been kind of the data science weeks, months. Last week we were in Boston at an event with the chief data officer conference. All the Boston Datarati were there. Bring it all down to New York City, getting really hardcore with data science; so it's from chief data officer to the hardcore data scientists. >> The CDO, hot term right now. Daniel Hernandez now joins us as our first guest here at Data Science For All. He's a VP of IBM Analytics. Good to see you, Daniel; thanks for being with us. >> Pleasure. >> Alright, well give us first off your take; let's just step back high level here. Data science has certainly been evolving for decades, if you will. First off, how do you define it today? And then, just from the IBM side of the fence, how do you see it in terms of how businesses should be integrating this into their mindset? >> So the way I describe data science simply to my clients is: it's using the scientific method to answer questions or deliver insights. It's kind of that simple. Or answering questions quantitatively. So it's a methodology, it's a discipline; it's not necessarily tools. That's kind of the way I approach describing what it is. >> Okay, and then from the IBM side of the fence, in terms of how wide of a net you are casting these days, I assume it's as big as you can get your arms out. >> So when you think about any particular problem that's a data science problem, you need certain capabilities, and we happen to deliver those capabilities. You need the ability to collect, store, and manage any and all data.
You need the ability to organize that data so you can discover it and protect it. You've got to be able to analyze it: automate the mundane, explain the past, predict the future. Those are the capabilities you need to do data science, and we deliver a portfolio of them, including, on the analyze part of our portfolio, our data science tools that we would declare as such. >> So data science for all is very aspirational, and when you guys made the announcement of the Watson data platform last fall, one of the things that you focused on was collaboration between data scientists, data engineers, quality engineers, application development, the whole chain. And you made the point that most of the time that data scientists spend is on wrangling data. You're trying to attack that problem, and you're trying to break down the stovepipes between those roles that I just mentioned. All of that has to happen before you can actually have data science for all; I mean, that's just data science for all hardcore data people. Where are we in terms of the progress that your clients have made in that regard? >> So I would say there are two major vectors of progress we've made. If you want data science for all, you need to be able to address people that know how to code and people that don't know how to code. So consider kind of the history of IBM in the data science space, especially in SPSS, which has been around for decades: we're mastering and solving data science problems for non-coders. The data science experience really started with embracing coders, developers that grew up in open source, that live in Jupyter or Python and are more comfortable there. And integration of these is kind of our focus. So that's one aspect: serving the needs of people that know how to code, and those that don't, in the data science role.
And then "for all" means supporting an entire analytics life cycle: from collecting the data you need in order to answer the question that you're trying to answer, to organizing that information once you've collected it so you can discover it inside of tools like our own data science experience and SPSS, and then of course the set of tools around exploratory analytics. All integrated so that you can do that end to end life cycle. So where clients are, I think they're certainly getting much more sophisticated in understanding that. Most people have approached data science as a tool problem, as a data prep problem. It's a life cycle problem, and that's how we're thinking about it. We're thinking about it in terms of: alright, if our job is answering questions and delivering insights through scientific methods, how do we decompose that problem into the set of things that people need to get the job done, serving the individuals that have to work together. >> And when you think about, go back to the days where the data warehouse was king. Something we talked about in Boston last week: it used to be the data warehouse was king, now the process is much more important. But very few people had access to that data, you had the elapsed time of getting answers, and the inflexibility of the systems. Has that changed, and to what degree has it changed? >> I think if you were to go ask anybody in business whether or not they have all the data they need to do their job, they would say no. Why? We've invested in EDWs, we've invested in Hadoop. In part, sometimes the problem might be, I just don't have the data. Most of the time it is, I have the data, I just don't know where it is.
So there's a pretty significant issue around data discoverability. I might have data in my operational systems, I might have data inside my EDW, but I don't have everything inside my EDW; I've stood up one or more data lakes; and to solve my problem, like customer segmentation, I have data everywhere. How do I find it and bring it in? >> That seems like that should be a fundamental consideration, right? If you're going to gather this much more information, make it accessible to people. And if you don't, it's a big flaw, it's a big gap, is it not? >> So yes, and I think part of the reason why is because governance professionals, and I am one, I spent quite a bit of time trying to solve governance related problems, have been focusing pretty maniacally on the compliance, regulatory, and security related issues. Like, how do we keep people from going to jail, how do we ensure regulatory compliance with things like e-discovery and records, for instance. And it just so happens that the same disciplines you use there, even though in some cases in lighter weight implementations, are what you need in order to solve this data discovery problem. So the discourse around governance has historically been about compliance, about regulations, about cost takeout, not analytics. And so a lot of our time, certainly in R&D, is spent trying to solve that data discovery problem: how do I discover data using the semantics that I have, which as a regular user is not a physical understanding of my data, and once I find it, how am I assured that what I get is what I should get, so that I'm not subject to compliance related issues and not making the company more vulnerable to data breach. >> Well, so presumably part of that involves automating classification at the point of creation or use, which actually was a technical challenge for a number of years. Has that challenge been solved, in your view?
>> I think machine learning is getting us there, and in fact later on today I will be doing some demonstrations of technology which will show how we're making the application of machine learning easy. Inside of everything we do, we're applying machine learning techniques, including to classification problems that help us solve this. So it could be that we're automatically harvesting technical metadata. Are there business terms that could be automatically extracted that don't require some data steward to have to know and assert, right? Or can we automatically suggest, and still have the steward for a case where I need a canonical data model; so I don't want the machine to tell me everything, but I want the machine to assist the data curation process. We are not just exploring the application of machine learning to solve that data classification problem, which historically was a manual one; we're embedding it into most of the stuff that we're doing. Often you won't even know that we're doing it behind the scenes. >> So that means that oftentimes the machine, ideally, is making the decisions as to who gets access to what, and is helping at least automate that governance, but there's a natural friction that occurs. And I wonder if you can talk about the balance sheet, if you will, between information as an asset and information as a liability. The more restrictions you put on that information, the more it constricts a business user's ability. So how do you see that shaping up? >> I think it's often a people and process problem, not necessarily a technology problem. I don't think as an industry we've figured it out; certainly a lot of our clients haven't figured out that balance. I mean, there are plenty of conversations I'll go into where I'll talk to a data science team in the same line of business as a governance team, and what the data science team will tell us is: I'm building my own data catalog, because the stuff that the governance guys are doing doesn't help me.
And the reason why it doesn't help me is because they're going through this top down data curation methodology, and I've got a question: I need to go find the data that's relevant, and I might not know what that is straight away. So the CDO function in a lot of organizations is helping bridge that. You'll see governance responsibilities line up with the CDO, with analytics, and I think that's gone a long way to bridge that gap. But that conversation I was just mentioning is not unique to one or two customers. Still a lot of customers are having it; often customers that either haven't started a CDO practice or are early days on it. >> So about that: this is a fairly new concept in the workplace, the CDO, as opposed to a CIO or CTO. How do you talk to your clients about broadening their perspective on that, and I guess emphasizing the need for them to consider putting somebody in sole responsibility, or primary responsibility, for their data, instead of just lumping it in somewhere else? >> So we happen to have one of the best CDOs inside of our group, which is like a handy tool for me. So if I go into a client and it's purporting to be a data science problem, and it turns out they have a data management issue around data discovery, and they haven't yet figured out how to install the process and people design to solve that particular issue, one of the key things I'll do is bring in our CDO and his delegates to have a conversation with them on what we're doing inside of IBM and what we're seeing in other customers, to help institute that practice inside of their own organization. We have forums like the CDO event in Boston last week, which are designed, you know, not to say here's what IBM can do in technology, but to say here's how the discipline impacts your business and here are some best practices you should apply.
So if ultimately I enter into those conversations and I find that there's a need, I typically say alright, tools are part of the problem but not the only issue; let me bring someone in that can describe the people and process related issues, which you've got to get right in order for, in some cases, the tools that I deliver to matter. >> We had Seth Dobrin on last week in Boston, and Inderpal Bhandari as well, and he put forth this enterprise data blueprint, if you will. CDOs are sort of-- >> Daniel: We're using that in IBM, by the way. >> Well, this is the thing: it's a really well thought out structure that seems to be trickling down to the divisions. And so it's interesting to hear how you're applying Seth's expertise. I want to ask you about the Hortonworks relationship. You guys made a big deal about that this summer. To me it was a no brainer. Really, what was the point of IBM having a Hadoop distro? And Hortonworks gets this awesome distribution channel. IBM has always had an affinity for open source, so that made sense there. What's behind that relationship, and how's it going? >> It's going awesome. Perhaps what we didn't say, and probably should have focused on, is why customers care. There are three main use cases that customers are implementing where they were ready even before the relationship; they were asking IBM and Hortonworks to work together. And so we were coming to the table working together as partners before the deeper collaboration we started in June. The first one was bringing data science to Hadoop: running data science models, doing data exploration where the data is. And if you were to rewind the clock on the IBM side and consider what we did with Hortonworks in light of what we did prior, we brought the data science experience and machine learning to Z in February. The highest value transactional data was there.
The next step was bringing data science to what is often, for a lot of clients, the second most valuable set of data, which is Hadoop. So that was kind of part one. And then we've continued that by bringing the data science experience to the private cloud. So that's one use case: I've got a lot of data, I need to do data science, I want to do it in residence, I want to take advantage of the compute grid I've already laid down, and I want to take advantage of the performance benefits and the integrated security and governance benefits of having these things co-located. That's kind of play one. So we're bringing the data science experience and HDP and HDF, which are the Hortonworks distributions, way closer together and optimized for each other. Another component of that is that not all data is going to be in Hadoop, as we were describing. Some of it's in an EDW, and that data science job is going to require data outside of Hadoop, and so we brought Big SQL. It was already supporting Hortonworks; we just optimized the stack, and so the combination of the data science experience and Big SQL allows you to do data science against a broader surface area of data. That's kind of play one. Play two is: I've got an EDW, and either for cost or agility reasons I want to augment it, or in some cases I might want to offload some data from it to Hadoop. The combination of Hortonworks plus Big SQL and our data integration technologies is a perfect combination there, and we have plenty of clients using that for analytics offloading from the EDW. And then the third piece, which we're doing quite a bit of engineering and go-to-market work around, is governed data lakes. So I want to enable self service analytics throughout my enterprise. I want to give self service analytics tools to everyone that should have access to them. I want to make data available to them, but I want that data to be governed, so that they can discover what's in the lake, and whatever I give them is what they should have access to.
So those are kind of the three tracks that we're working with Hortonworks on, and all of them are producing stunning results inside of clients. >> And so that involves actually some serious engineering as well-- >> Big time. >> It's not just sort of a Barney deal or just a pure go-to-market-- >> It's certainly more than go-to-market; the architecture just works. >> Big picture down the road, then: what challenges do you see on your side of the business for the next 12 months? What are you going to tackle, what's that monster out there that you think, okay, this is our next hurdle to get by? >> I forget if Rob said this before, but you'll hear him say often, and it's statistically proven, that the majority of the data that's available is not available to be Googled; it's behind a firewall. And so we started last year with the Watson data platform, creating an integrated data analytics system. What if customers have data that's on-prem that they want to take advantage of? What if they're not ready for the public cloud? How do we deliver public cloud benefits to them when they want to run that workload behind a firewall? So we're doing a significant amount of engineering, really starting with the work that we did on the data science experience: bringing it behind the firewall, but still delivering benefits similar to what you would expect if you were delivering it in the public cloud. A major advancement that IBM made is IBM Cloud Private; I don't know if you guys are familiar with that announcement, which we made, I think, already two weeks ago. So it's a (mumbles) foundation on top of which we have microservices, on top of which our stack is going to be made available. So when I think of where the future is, our customers ultimately, we believe, want to run data and analytic workloads in the public cloud. How do we get them there, considering they're not there now, in a stepwise fashion that is sensible economically, project-management-wise, culturally.
Without having them have to wait. That's the big picture, kind of a big problem space we're spending considerable time thinking through. >> What we've been talking a lot about on theCUBE in the last several months, or even years, is that people realize they can't just re-form their business and stuff it into the cloud. They have to bring the cloud model to their data, wherever that data exists. If it's in the cloud, great. And the key there is you've got to have a capability and a solution that substantially mimics that public cloud experience. That's kind of what you guys are focused on. >> What I tell clients is: if you're ready for certain workloads, especially greenfield workloads, and the capability exists in a public cloud, you should go there now, because you're going to want to go there eventually anyway. And if not, then a vendor like IBM helps you take advantage of that behind a firewall, often in form factors that are ready to go. The integrated analytics system, I don't know if you're familiar with that, includes our super advanced data warehouse, the data science experience, and our query federation technology powered by Big SQL, all in a form factor that's ready to go. You get started there for data and data science workloads, and that's a major step in the direction of the public cloud. >> Alright, well Daniel, thank you for the time, we appreciate that. We didn't get to touch at all on baseball, but next time, right? >> Daniel: Go Cubbies. (laughing) >> Sore spot with me, but it's alright, go Cubbies. Alright, Daniel Hernandez from IBM. Back with more here from Data Science For All, IBM's event here in Manhattan. Back with more on theCUBE in just a bit. (electronic music)

Published Date : Nov 1 2017
