Caryn Woodruff, IBM & Ritesh Arora, HCL Technologies | IBM CDO Summit Spring 2018
>> Announcer: Live from downtown San Francisco, it's the Cube, covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by IBM. >> Welcome back to San Francisco everybody. We're at the Parc 55 in Union Square and this is the Cube, the leader in live tech coverage, and we're bringing you exclusive coverage of the IBM CDO strategy summit. IBM has these things, they book in on both coasts, one in San Francisco, one in Boston, spring and fall. Great event, intimate event. 130, 150 chief data officers, learning, transferring knowledge, sharing ideas. Caryn Woodruff is here as the principal data scientist at IBM and she's joined by Ritesh Arora, who is the director of digital analytics at HCL Technologies. Folks, welcome to the Cube, thanks for coming on. >> Thank you. >> Thanks for having us. >> You're welcome. So we're going to talk about data management, data engineering, we're going to talk about digital, as I said, Ritesh, because digital is in your title. It's a hot topic today. But Caryn, let's start off with you. Principal Data Scientist, so you're the one that is in short supply. So a lot of demand, you're getting pulled in a lot of different directions. But talk about your role and how you manage all those demands on your time. >> Well, you know, a lot of our work is driven by business needs, so it's really understanding what is critical to the business, what's going to support our business's strategy and, you know, picking the projects that we work on based on those items.
So you really do have to cultivate the things that you spend your time on and make sure you're spending your time on the things that matter and, as Ritesh and I were talking about earlier, you know, a lot of that means building good relationships with the people who manage the systems and the people who manage the data so that you can get access to what you need to get the critical insights that the business needs. >> So Ritesh, data management, I mean, this means a lot of things to a lot of people. It's evolved over the years. Help us frame what data management is in this day and age. >> Sure, so there are two aspects of data in my opinion. One is the data management, another is the data engineering, right? And over the period, the data has grown significantly, whether it's unstructured data, whether it's structured data, or the transactional data. We need to have some kind of governance and policies to secure data, to make data an asset for a company so the business can rely on the data you are delivering to them. Now, the other part is the data engineering. Data engineering is more of an IT function, which is data acquisition, data preparation and delivering the data to the end-user, right? It can be business, it can be third-party, but it all comes under the governance, under the policies, which are designed to secure the data and define how the data should be accessed by different parts of the company or by external parties. >> And how do those two worlds come together? The business piece and the IT piece, is that where you come in? >> That is where data science definitely comes into the picture. So if you go online, you can find Venn diagrams that describe data science as a combination of computer science, math and statistics, and business acumen. And so where they overlap in the middle is data science. So it's really being able to put those things together.
But, you know, what's so critical is, you know, Inderpal actually shared at the beginning here, and I think a few years ago here, talked about the five pillars to building a data strategy. And, you know, one of those things is use cases, like getting out, picking a need, solving it and then going from there, and along the way you realize what systems are critical, what data you need, who the business users are. You know, what would it take to scale that? So these, like, proof-point projects that, you know, eventually turn into these bigger things, and for them to turn into bigger things you've got to have that partnership. You've got to know where your trusted data is, you've got to know how it got there, who can touch it, how frequently it is updated. Just being able to really understand that and work with partners that manage the infrastructure so that you can leverage it and make it available to other people and transparent. >> I remember when I first interviewed Hilary Mason way back when and I was asking her about that Venn diagram and she threw in another one, which was data hacking. >> Caryn: Uh-huh, yeah. >> Well, talk about that. You've got to be curious about data. You need to, you know, take a bath in data. >> (laughs) Yes, yes. I mean yeah, you really... Sometimes you have to be a detective and you have to really want to know more. And, I mean, understanding the data is like the majority of the battle. >> So Ritesh, we were talking off-camera about how titles change, things evolve: data, digital. They're kind of interchangeable these days. I mean, we always say the difference between a business and a digital business is how they have used data. And so digital being part of your role, everybody's trying to get digital transformation, right? As an SI, you guys are at the heart of it. Certainly, IBM as well. What kinds of questions are your clients asking you about digital?
>> So ultimately I see data, whatever we derive from data, it is used by the business side. So we are always trying to solve a business problem, which is to optimize the issues the company is facing, or try to generate more revenues, right? Now, the digital as well as the data have been married together, right? Earlier, you can say, we were trying to analyze the data to get more insights into what is happening in that company. And then we came up with predictive modeling: based on the data that we statically collect, how can we predict different scenarios, right? Now digital, over the period of the last 10, 20 years, as the data has grown, different sources of data have come into the picture, we are talking about social media and so on, right? And nobody is looking for just reports out of Excel, right? It is more about how you are presenting the data to the senior management, to the entire world, and how easily they can understand it. That's where the digital, from the data digitization as well as the application digitization, comes into the picture. So the tools have been developed over the period to have a better visualization, better understanding. How can we integrate annotation within the data? So these are all different aspects of digitization on the data and we try to integrate the digital concepts within our data and analytics, right? So I used to be more, I mean, I grew up as a data engineer, analytics engineer, but now I'm looking beyond just the data or the data preparation. It's more about presenting the data to the end-user and the business, and how easy it is for them to understand it. >> Okay, I got to ask you, so you guys are data wonks. I am too, kind of, but I'm not as skilled as you are, and I say that with all due respect. I mean, you love data. >> Caryn: Yes. >> As data science becomes a more critical skill within organizations, we always talk about the amount of data, data growth, the stats are mind-boggling.
But as a data scientist, do you feel like you have access to the right data and how much of a challenge is that with clients? >> So we do have access to the data but the challenge is, the company has so many systems, right? It's not just one or two applications. There are companies where we have 50 or 60 or even hundreds of applications built over the last 20 years. And there are some applications which are basically duplicates, which replicate the data. Now, the challenge is to integrate the data from different systems because they maintain different metadata. The quality of data is a concern. And sometimes with the international companies, the rules, for example, might be in US or India or China, the data acquisitions are different, right? And as you become more global, you try to integrate the data beyond boundaries, which sometimes becomes more of a compliance issue, beyond the technical issues of data integration. >> Any thoughts on that? >> Yeah, I think, you know, one of the other issues too, as you've heard of shadow IT, where people have, like, servers squirreled away under their desks. There's your shadow data, where people have spreadsheets and databases that, you know, they're storing on, like, a small server or that they share within their department. And so, you know, we were talking earlier about the different systems. And you might have a name in one system that's one way and a name in another system that's slightly different, and then a third system, where it's different and there's extra granularity to it or some extra twist. And so you really have to work with all of the people that own these processes and figure out what's the trusted source? What can we all agree on? So there's a lot of... It's funny, a lot of the data problems are people problems.
So it's getting people to talk and getting people to agree on, well, this is why I need it this way, and this is why I need it this way, and figuring out how you come to a common solution so you can even create those single trusted sources that then everybody can go to and everybody knows that they're working with the right thing and the same thing that they all agree on. >> The politics of it and, I mean, politics is kind of a pejorative word but let's say dissonance, where you have maybe a back-end system, financial system, and the CFO, he or she is looking at the data saying oh, this is what the data says and then... I remember I was talking recently to a chef in a restaurant who said that the CFO saw this but I know that's not the case, I don't have the data to prove it. So I'm going to go get the data. And so then as they collect that data they bring it together. So I guess in some ways you guys are mediators. >> [Caryn And Ritesh] Yes, yes. Absolutely. >> 'Cause the data doesn't lie, you just got to understand it. >> You have to ask the right question. Yes. And yeah. >> And sometimes you don't even know what questions you want to ask until you see the data. Is that a challenge for your clients? >> Caryn: Yes, all the time. Yeah. >> So okay, what else do we want to talk about? The state of collaboration, let's say, between the data scientists, the data engineer, the quality engineer, maybe even the application developers. Somebody, John Furrier often says, my co-host and business partner, data is the new development kit. Give me the data and I'll, you know, write some code and create an application. So how about collaboration amongst those roles, is that something... I know IBM's gone on about some products there but to your point, Caryn, a lot of times it's the people. >> It is. >> And the culture. What are you seeing in terms of evolution and maturity of that challenge?
>> You know, I have a very good friend who likes to say that data science is a team sport and so, you know, these should not be, like, solo projects where just one person is wading up to their elbows in data. This should be something where you've got engineers and scientists and business people coming together to really work through it as a team because everybody brings really different strengths to the table and it takes a lot of smart brains to figure out some of these really complicated things. >> I completely agree. Because we see the challenges, we always are trying to solve a business problem. It's important to marry IT as well as the business side. We have the technical experts but we don't have domain experts, subject matter experts who know the business in IT, right? So it's very, very important to collaborate closely with the business, right? And the data scientist is an intermediate layer between IT and the business, I will say, right? Because data scientists, over the years, as they try to analyze the information, they understand the business better, right? And they need to collaborate with IT to improve the quality, right? Those are the kinds of challenges they are facing, and the data engineer has to work very hard to make sure the data delivered to the data scientist or the business is as accurate as possible because wrong data will lead to wrong predictions, right? And ultimately we need to make sure that we integrate the data in the right way. >> That's a different cultural dynamic than, say, ten years ago, where you'd go to a statistician, she'd fire up the SPSS... >> Caryn: We still use that. >> I'm sure you still do, but run some chi-squares, give me some, you know, probabilities and, you know, maybe run some Monte Carlo simulation. But one person kind of doing all that, to your point, Caryn. >> Well, you know, it's interesting.
There are some students I mentor at a local university and, you know, we've been talking about the projects that they get and that, you know, more often than not they get a nice clean dataset to go practice learning their modeling on, you know? And they don't have to get in there and clean it all up and normalize the fields and look for some crazy skew or null values or, you know, where you've just got so much noise that needs to be reduced into something more manageable. And so it's, you know, you made the point earlier about understanding the data. It really is important to be very curious and ask those tough questions and understand what you're dealing with before you really start jumping in and building a bunch of models. >> Let me add another point: the way we have changed over the last ten years, especially from the technical point of view. Ten years back nobody talked about real-time data analysis. There was no streaming application as such. Now nobody talks about batch analysis, right? Everybody wants data on a real-time basis. If not real-time, then on a near real-time basis. That has become a challenge. And it's not just the predictions which are happening in their ERP environment or on the cloud, they want real-time integration with social media for marketing and sales and how they can immediately do the campaign, right? So, for example, if I go to Google and I search for any product, right, for example, a pressure cooker, right? And I go to Facebook, immediately I see the ad within two minutes. >> Yeah, they're retargeting. >> So that's real-time analytics happening across different applications, including the third-party data, which is coming from social media. So that has become a good source of data but it has become a challenge for the data analyst and the data scientist: how quickly we can turn around the data analysis.
>> Because it used to be you would get ads for a pressure cooker for months, even after you bought the pressure cooker, and now it's only a few days, right? >> Ritesh: It's a minute. You close this application, you log into Facebook... >> Oh, no doubt. >> Ritesh: An ad is there. >> Caryn: There it is. >> Ritesh: Because everything is linked, either your phone number or email ID, you're done. >> It's interesting. We talk about disruption a lot. I wonder if that whole model is going to get disrupted in a new way because everybody started using the same ad. >> So that's a big change over the last 10 years. >> Do you think... oh, go ahead. >> Oh no, I was just going to say, you know, another thing is just there's so much that is available to everybody now, you know. There's not this small little set of tools that's restricted to people that are in these very specific jobs. But with open source and with so many software-as-a-service products that are out there, anybody can go out and get an account and just start, you know, practicing or playing or joining a Kaggle competition or, you know, start getting their hands on... There's data sets that are out there that you can just download to practice and learn on and use. So, you know, it's much more open, I think, than it used to be. >> Yeah, community editions of software, open data. The number of open data sources just keeps growing. Do you think that machine intelligence can, or how can machine intelligence help with this data quality challenge? >> I think that it's always going to require people, you know? There's always going to be a need for people to train the machines on how to interpret the data. How to classify it, how to tag it. There's actually a really good article in Popular Science this month about a woman who was training a machine on fake news and, you know, it did a really nice job of finding some of the same claims that she did. But she found a few more.
So, you know, I think, on one hand we have machines that we can augment with data and they can help us make better decisions or sift through large volumes of data, but then when we're teaching the machines to classify the data or to help us with metadata classification, for example, or, you know, to help us clean it, I think that it's going to be a while before we get to the point where that's the inverse. >> Right, so in that example you gave, the human actually did a better job than the machine. Now, it's amazing to me how... what machines couldn't do that humans could, you know, last year, and all of a sudden, you know, they can. It wasn't long ago that robots couldn't climb stairs. >> And now they can. >> And now they can. >> It's really creepy. >> I think the difference now is, earlier, you know, you knew that there was an issue in the data. But you didn't know how much data was corrupt or wrong, right? Now, there are tools available and they're very sophisticated tools. They can pinpoint and provide you the percentage of accuracy, right? On different categories of data that you come across, right? Even forget about the structured data. Even when you talk about unstructured data, the data which comes from social media or the comments and the remarks that you log or are logged by the customer service representative, there are very sophisticated text analytics tools available, which can talk very accurately about the data as well as the personality of the person who is giving that information. >> Tough problems but it seems like we're making progress. All you got to do is look at fraud detection as an example. Folks, thanks very much... >> Thank you. >> Thank you very much. >> ...for sharing your insight. You're very welcome. Alright, keep it right there everybody. We're live from the IBM CDO conference in San Francisco. Be right back, you're watching the Cube. (electronic music)
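As a rough illustration of the cleanup Caryn describes in the interview (normalizing fields, then looking for null values and skew before any modeling), here is a minimal Python sketch. The records, field names, and helper functions are hypothetical and are not taken from the conversation or from any IBM or HCL tool.

```python
import statistics

# Hypothetical raw records: inconsistent field names, a null, and one extreme value.
raw_records = [
    {"Customer Name": "Acme Corp", "Revenue ": "100"},
    {"customer_name": "Globex", "revenue": "120"},
    {"Customer Name": "Initech", "Revenue ": None},
    {"customer_name": "Umbrella", "revenue": "110"},
    {"customer_name": "Hooli", "revenue": "5000"},
]

def normalize_record(record):
    """Trim and lower-case field names and replace spaces with underscores."""
    return {key.strip().lower().replace(" ", "_"): value for key, value in record.items()}

def profile_field(records, field):
    """Count null values and estimate skewness for a numeric field."""
    values = [r.get(field) for r in records]
    present = [float(v) for v in values if v is not None]
    null_count = len(values) - len(present)
    mean = statistics.fmean(present)
    stdev = statistics.pstdev(present)
    # Population skewness: mean of cubed z-scores (0.0 when there is no spread).
    skew = 0.0 if stdev == 0 else sum(((v - mean) / stdev) ** 3 for v in present) / len(present)
    return {"null_count": null_count, "mean": mean, "skewness": skew}

clean_records = [normalize_record(r) for r in raw_records]
profile = profile_field(clean_records, "revenue")
print(profile)
```

A large positive skewness or a nonzero null count in a profile like this is the kind of signal that would prompt the "tough questions" about the data before building models on it.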
ENTITIES
Entity | Category | Confidence |
---|---|---|
Ritesh Ororo | PERSON | 0.99+ |
Caryn | PERSON | 0.99+ |
John Fourier | PERSON | 0.99+ |
Ritesh | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
US | LOCATION | 0.99+ |
50 | QUANTITY | 0.99+ |
Cayn Woodruff | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
San Francisco | LOCATION | 0.99+ |
China | LOCATION | 0.99+ |
India | LOCATION | 0.99+ |
last year | DATE | 0.99+ |
Excel | TITLE | 0.99+ |
one | QUANTITY | 0.99+ |
Caryn Woodruff | PERSON | 0.99+ |
Ritesh Arora | PERSON | 0.99+ |
Hilary Mason | PERSON | 0.99+ |
60 | QUANTITY | 0.99+ |
130 | QUANTITY | 0.99+ |
One | QUANTITY | 0.99+ |
Monte Carlo | TITLE | 0.99+ |
HCL Technologies | ORGANIZATION | 0.99+ |
ORGANIZATION | 0.99+ | |
third system | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
Interpol | ORGANIZATION | 0.98+ |
ten years ago | DATE | 0.98+ |
two applications | QUANTITY | 0.98+ |
first | QUANTITY | 0.98+ |
Parc 55 | LOCATION | 0.98+ |
five pillars | QUANTITY | 0.98+ |
one system | QUANTITY | 0.98+ |
ORGANIZATION | 0.97+ | |
two aspects | QUANTITY | 0.97+ |
both coasts | QUANTITY | 0.97+ |
one person | QUANTITY | 0.96+ |
Ten years back | DATE | 0.96+ |
two minutes | QUANTITY | 0.95+ |
this month | DATE | 0.95+ |
Union Square | LOCATION | 0.95+ |
two worlds | QUANTITY | 0.94+ |
Spring 2018 | DATE | 0.94+ |
Popular Science | TITLE | 0.9+ |
CTO | EVENT | 0.88+ |
days | QUANTITY | 0.88+ |
one way | QUANTITY | 0.87+ |
SPSS | TITLE | 0.86+ |
single trusted sources | QUANTITY | 0.85+ |
Venn | ORGANIZATION | 0.84+ |
few years ago | DATE | 0.84+ |
150 chief data officers | QUANTITY | 0.83+ |
last 10 20 years | DATE | 0.83+ |
Officer Strategy Summit 2018 | EVENT | 0.82+ |
hundreds of application | QUANTITY | 0.8+ |
last 10 years | DATE | 0.8+ |
Cube | COMMERCIAL_ITEM | 0.79+ |
IBM Chief | EVENT | 0.79+ |
IBM CDO strategy summit | EVENT | 0.72+ |
last ten years | DATE | 0.7+ |
IBM CDO Summit | EVENT | 0.7+ |
fall | DATE | 0.68+ |
Cube | TITLE | 0.66+ |
spring | DATE | 0.65+ |
last 20 years | DATE | 0.63+ |
minute | QUANTITY | 0.49+ |