John Thomas, IBM | IBM CDO Summit Spring 2018
>> Narrator: Live from downtown San Francisco, it's theCUBE, covering IBM Chief Data Officer Strategy Summit 2018, brought to you by IBM. >> We're back in San Francisco, we're here at the Parc 55 at the IBM Chief Data Officer Strategy Summit. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante and IBM's Chief Data Officer Strategy Summit, they hold them on both coasts, one in Boston and one in San Francisco. A couple times each year, about 150 chief data officers coming in to learn how to apply their craft, learn what IBM is doing, share ideas. Great peer networking, really senior audience. John Thomas is here, he's a distinguished engineer and director at IBM, good to see you again John. >> Same to you. >> Thanks for coming back in theCUBE. So let's start with your role, distinguished engineer, we've had this conversation before but it just doesn't happen overnight, you've got to be accomplished, so congratulations on achieving that milestone, but what is your role? >> The road to distinguished engineer is long but today, these days I spend a lot of my time working on data science and in fact am part of what is called a data science elite team. We work with clients on data science engagements, so this is not consulting, this is not services, this is where a team of data scientists work collaboratively with a client on a specific use case and we build it out together. We bring data science expertise, machine learning, deep learning expertise. We work with the business and build out a set of tangible assets that are relevant to that particular client. >> So this is not a for-pay service, this is hey you're a great customer, a great client of ours, we're going to bring together some resources, you'll learn, we'll learn, we'll grow together, right? >> This is an investment IBM is making. It's a major investment for our top clients working with them on their use cases. >> This is a global initiative? >> This is global, yes. >> We're talking about, what, hundreds of clients, thousands of clients? >> Well eventually thousands but we're starting small. We are trying to scale now so obviously once you get into these engagements, you find out that it's not just about building some models. There are a lot of challenges that you've got to deal with in an enterprise setting. >> Dave: What are some of the challenges? >> Well in any data science engagement the first thing is to have clarity on the use case that you're engaging in. You don't want to build models for models' sake. Just because Tensorflow or scikit-learn is great and build models, that doesn't serve a purpose. That's the first thing, do you have clarity of the business use case itself? Then comes data, now I cannot stress this enough, Dave, there is no data science without data, and you might think this is the most obvious thing, of course there has to be data, but when I say data I'm talking about access to the right data. Do we have governance over the data? Do we know who touched the data? Do we have lineage on that data? Because garbage in, garbage out, you know this. Do we have access to the right data in the right control setting for my machine learning models we built. These are challenges and then there's another challenge around, okay, I built my models but how do I operationalize them? How do I weave those models into the fabric of my business? So these are all challenges that we have to deal with. >> That's interesting what you're saying about the data, it does sound obvious but having the right data model as well. I think about when I interact with Netflix, I don't talk to their customer service department or their marketing department or their sales department or their billing department, it's one experience. >> You just have an experience, exactly. >> This notion of incumbent disruptors, is that a logical starting point for these guys to get to that point where they have a data model that is a single data model? >> Single data model. (laughs) >> Dave: What does that mean, right? At least from an experienced standpoint. >> Once we know this is the kind of experience we want to target, what are the relevant data sets and data pieces that are necessary to make their experience happen or come together. Sometimes there's core enterprise data that you have in many cases, it has been augmented with external data. Do you have a strategy around handling your internal, external data, your structured transactional data, your semi-structured data, your newsfeeds. All of these need to come together in a consistent fashion for that experience to be true. It is not just about I've got my credit card transaction data but what else is augmenting that data? You need a model, you need a strategy around that. >> I talk to a lot of organizations and they say we have a good back-end reporting system, we have Cognos we can build cubes and all kinds of financial data that we have, but then it doesn't get down to the front line. We have an instrument at the front line, we talk about IOT and that portends change there but there's a lot of data that either isn't persisted or not stored or doesn't even exist, so is that one of the challenges that you see enterprises dealing with? >> It is a challenge. Do I have access to the right data, whether that is data at rest or in motion? Am I persisting it the way I can consume it later? Or am I just moving big volumes of data around because analytics is there, or machine learning is there and I have to move data out of my core systems into that area. That is just a waste of time, complexity, cost, hidden costs often, 'cause people don't usually think about the hidden costs of moving large volumes of data around. But instead of that can I bring analytics and machine learning and data science itself to where my data is. Not necessarily to move it around all the time. Whether you're dealing with streaming data or large volumes of data in your Hadoop environment or mainframes or whatever. Can I do ML in place and have the most value out of the data that is there? >> What's happening with all that Hadoop? Nobody talks about Hadoop anymore. Hadoop largely became a way to store data for less, but there's all this data now and a data lake. How are customers dealing with that? >> This is such an interesting thing. People used to talk about the big data, you're right. We jumped from there to the cognitive It's not like that right? No, without the data then there is no cognition there is no AI, there is no ML. In terms of existing investments in Hadoop for example, you have to absolutely be able to tap in and leverage those investments. For example, many large clients have investments in large Cloudera or Hortonworks environment, or Hadoop environments so if you're doing data science, how do you push down, how do you leverage that for scale, for example? How do you access the data using the same access control mechanisms that are already in place? Maybe you have Carbros as your mechanism how do you work with that? How do you avoid moving data off of that environment? How do you push down data prep into the spar cluster? How do you do model training in that spar cluster? All of these become important in terms of leveraging your existing investments. It is not just about accessing data where it is, it's also about leveraging the scale that the company has already invested in. You have hundred, 500 node Hadoop clusters well make the most of them in terms of scaling your data science operations. So push down and access data as much as possible in those environments. >> So Beth talked today, Beth Smith, about Watson's law, and she made a little joke about that, but to me its poignant because we are entering a new era. For decades this industry marched to the cadence of Moore's law, then of course Metcalfe's law in the internet era. I want to make an observation and see if it resonates. It seems like innovation is no longer going to come from doubling microprocessor speed and the network is there, it's built out, the internet is built. It seems like innovation comes from applying AI to data together to get insights and then being able to scale, so it's cloud economics. Marginal costs go to zero and massive network effects, and scale, ability to track innovation. That seems to be the innovation equation, but how do you operationalize that? >> To your point, Dave, when we say cloud scale, we want the flexibility to do that in an off RAM public cloud or in a private cloud or in between, in a hybrid cloud environment. When you talk about operationalizing, there's a couple different things. People think that, say I've got a super Python programmer and he's great with Tensorflow or scikit-learn or whatever and he builds these models, great, but what happens next, how do you actually operationalize those models? You need to be able to deploy those models easily. You need to be able to consume those models easily. For example you have a chatbot, a chatbot is dumb until it actually calls these machine learning models, real time to make decisions on which way the conversation should go. So how do you make that chatbot intelligent? It's when it consumes the ML models that have been built. So deploying models, consuming models, you create a model, you deploy it, you've got to push it through the development test staging production phases. Just the same rigor that you would have for any applications that are deployed. Then another thing is, a model is great on day one. Let's say I built a fraud detection model, it works great on day one. A week later, a month later it's useless because the data that it trained on is not what the fraudsters are using now. So patterns have changed, the model needs to be retrained How do I understand the performance of the model stays good over time? How do I do monitoring? How do I retrain the models? How do I do the life cycle management of the models and then scale? Which is okay I deployed this model out and its great, every application is calling it, maybe I have partners calling these models. How do I automatically scale? Whether what you are using behind the scenes or if you are going to use external clusters for scale? Technology is like spectrum connector from our HPC background are very interesting counterparts to this. How do I scale? How do I burst? How do I go from an on-frame to an off-frame environment? How do I build something behind the firewall but deploy it into the cloud? We have a chatbot or some other cloud-native application, all of these things become interesting in the operationalizing. >> So how do all these conversations that you're having with these global elite clients and the challenges that you're unpacking, how do they get back into innovation for IBM, what's that process like? >> It's an interesting place to be in because I am hearing and experiencing first hand real enterprise challenges and there we see our product doesn't handle this particular thing now? That is an immediate circling back with offering management and development. Hey guys we need this particular function because I'm seeing this happening again and again in customer engagements. So that helps us shape our products, shape our data science offerings, and sort of running with the flow of what everyone is doing, we'll look at that. What do our clients want? Where are they headed? And shape the products that way. >> Excellent, well John thanks very much for coming back in theCUBE and it's a pleasure to see you again. I appreciate your time. >> Thank you Dave. >> All right good to see you. Keep it right there everybody we'll be back with our next guest. We're live from the IBM CDO strategy summit in San Francisco, you're watching theCUBE.
SUMMARY :
brought to you by IBM. to see you again John. but what is your role? that are relevant to This is an investment IBM is making. into these engagements, you find out the first thing is to have but having the right data model as well. Single data model. Dave: What does that mean, right? for that experience to be true. so is that one of the challenges and I have to move data out but there's all this that the company has already invested in. and scale, ability to track innovation. How do I do the life cycle management to be in because I am hearing pleasure to see you again. All right good to see you.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Dave Vellante | PERSON | 0.99+ |
Dave | PERSON | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
John | PERSON | 0.99+ |
John Thomas | PERSON | 0.99+ |
Boston | LOCATION | 0.99+ |
Beth Smith | PERSON | 0.99+ |
San Francisco | LOCATION | 0.99+ |
Beth | PERSON | 0.99+ |
Netflix | ORGANIZATION | 0.99+ |
one | QUANTITY | 0.99+ |
A week later | DATE | 0.99+ |
a month later | DATE | 0.99+ |
thousands | QUANTITY | 0.99+ |
Hadoop | TITLE | 0.99+ |
Watson | PERSON | 0.99+ |
one experience | QUANTITY | 0.99+ |
Moore | PERSON | 0.98+ |
today | DATE | 0.98+ |
Python | TITLE | 0.98+ |
Metcalfe | PERSON | 0.98+ |
Parc 55 | LOCATION | 0.97+ |
both coasts | QUANTITY | 0.97+ |
zero | QUANTITY | 0.96+ |
Single | QUANTITY | 0.96+ |
about 150 chief data officers | QUANTITY | 0.96+ |
day one | QUANTITY | 0.94+ |
Cognos | ORGANIZATION | 0.94+ |
each year | QUANTITY | 0.93+ |
hundreds of clients | QUANTITY | 0.92+ |
Hortonworks | ORGANIZATION | 0.91+ |
first thing | QUANTITY | 0.9+ |
Tensorflow | TITLE | 0.9+ |
IBM CDO Summit | EVENT | 0.87+ |
Strategy Summit | EVENT | 0.86+ |
hundred, 500 node Hadoop clusters | QUANTITY | 0.85+ |
thousands of clients | QUANTITY | 0.84+ |
single data model | QUANTITY | 0.81+ |
Strategy Summit 2018 | EVENT | 0.81+ |
Chief Data Officer | EVENT | 0.79+ |
IBM CDO strategy summit | EVENT | 0.79+ |
Chief Data Officer Strategy Summit | EVENT | 0.79+ |
couple times | QUANTITY | 0.77+ |
Cloudera | ORGANIZATION | 0.75+ |
decades | QUANTITY | 0.74+ |
Spring 2018 | DATE | 0.72+ |
Data Officer | EVENT | 0.67+ |
Carbros | ORGANIZATION | 0.63+ |
Tensorflow | ORGANIZATION | 0.61+ |
scikit | ORGANIZATION | 0.58+ |
theCUBE | ORGANIZATION | 0.58+ |