John Thomas, IBM Data and AI | IBM Data and AI Forum


 

(upbeat music) >> Announcer: Live from Miami, Florida. It's theCUBE. Covering IBM's Data and AI Forum. Brought to you by IBM. >> We're back in Miami everybody. You're watching theCUBE, the leader in live tech coverage. We go out to the events and extract the signal from the noise. We're here covering the IBM Data and AI Forum, and John Thomas is here, a many-time CUBE guest. He's not only a distinguished engineer but he's also the chief data scientist for IBM Data and AI. John, great to see you again. >> Great to see you again, Dave. >> I'm always excited to talk to you because you're hard-core data science. You're working with the customers and you're kind of where the action is. The watchword today is the end-to-end data science life cycle. What's behind that? I mean, there's been a lot of experimentation, a lot of tactical things going on. You're talking about an end-to-end life cycle, explain. >> So Dave, what we are seeing in our client engagements is, actually working with the data, building the models, that part is relatively easy. The tougher part is to make the business understand what the true value of this is. So it's not a science project, right? It is not an academic exercise. So how do you do that? In order for that to happen these models need to go into production. Well, okay, how do you do that? There is this business of, I've got something in my development environment that needs to move up through QA and staging, and then to production. Well, a lot of different things need to happen as you go through that process. How do you do this? See, this is not a new paradigm. It is a paradigm that exists in the world of application development. You've got to go through a DevOps life cycle. You've got to have a continuous integration and continuous delivery mindset. You've got to have the same rigor in data science. Then at the front end of this is, what business problem are you actually solving? Do you have business KPIs for that? And when the model is actually in production, can you track, can you monitor the performance of the model against the business KPIs that the business cares about? And how do you do this in an end-to-end fashion? And then in there is retraining the model when performance degrades, et cetera, et cetera. But this notion of following a DevOps mindset in the world of data science is absolutely essential. >> Dave: So when you think about DevOps, you think of agile. So help me square this circle: when you think end-to-end data life cycle, you think big, chewy, waterfall, but I'm inferring you're not prescribing a waterfall. >> John: No, no, no. >> So how are organizations dealing with that holistic end-to-end view but still doing it in an agile manner? >> Yeah, exactly. So, I always say do not boil the ocean, especially if you're approaching AI use cases. Start with something that is contained, that you can define, and break it into sprints. So take an agile approach to this. Two, three sprints; if you're not seeing value in those two, three sprints, go back to the drawing board and see what it is that you're doing wrong. So for each of your sprints, what are the specific success criteria that you care about and the business cares about? Now, as you go through this process, you need a mechanism to look at, okay, well I've got something in development, how do I move the assets? Not just the model, but, what is the set of features that you're working with? What is the data prep pipeline? What are the scripts being used to evaluate the model?
All of these things are logical assets surrounding the model. How do you move them from development to staging? How do you do QA against this set of assets? Then how do you do third-party approval and oversight? How do you do code review? How do you make sure that when you move these assets all of the surrounding mechanisms are being adhered to, compliance requirements, regulatory requirements? And then finally get them to production. So there's a technology aspect to it, obviously. You have a lot of discussion around Kubeflow, MLflow, et cetera, et cetera, as technology options. But there is also a mindset that needs to be followed here. >> So once you find a winner, business people want to scale, 'cause they can make more money the more times they can replicate that value. And I want to understand this trust and transparency, 'cause when you scale, if you're scaling things that aren't compliant, you're in trouble. But before we get there, I wonder if we can take an example, pick an industry, or some kind of use case where you've seen this end-to-end life cycle be successful. >> Yeah, across industries. I mean, it's not specific to one industry. But I'll give you an example. This morning Wunderman Thompson was talking about how they are applying machine learning to a very difficult problem, which is how to improve how they create a first-time buyer list for their clients. But think of the problem here. It's not just about a one-time building of a model. Okay, you've got data; understand what data you're working with, what the lineage of that data is. Once I have an understanding of the data, then I get into feature selection, feature engineering, all the steps that you need in your machine learning cycle. Once I am done with selecting my features, doing my feature engineering, I go into model building. Now, it's a pipeline that is being built. It is not a one-time activity. Once that model, that pipeline, has been vetted, you've got to move it from development to your QA environment, from there to your production environment, and so on. And here is where it links to your question, the trust and transparency discussion. Once the model is in production, how do I make sure the model is being fair? How do I make sure that I can explain what is going on? How do I make sure that the model is not unfairly biased? So all of these are important discussions in trust and transparency because, you know, people are going to question the outcome of the model. Why did it make a decision? If a campaign was run for an individual, why did you choose him and not somebody else? If it's a credit card fraud detection scenario, why was somebody tagged as fraudulent and not the other person? If a loan application was rejected, why was he rejected and not someone else? You've got to explain this. So, explainability, the term gets overloaded at times, but the idea here is you should be able to retrace your steps back to an individual scoring activity and explain an individual transaction. You should be able to play back an individual transaction and say version 15 of my model used these features, these hundred features, for its scoring. This was the incoming payload, this was the outcome, and, if I had changed five of my incoming payload variables out of the 500 I use, or the hundred I use, the outcome would have been different. Now you can say, you know what, ethnicity, age, education, gender.
These parameters did play a role in the decision but they were within the fairness bracket. And the fairness bracket is something that you have to define. >> So, if I could play that back. Take fraud detection. So you might have the machine tell you with 90% confidence or greater that this is fraud, but it throws back a false positive. When you dig in, you might see, well, there's some bias included in there. Then what? You would kind of refactor the model? >> A couple of different things. Sometimes the bias is in the data itself, and it may be valid bias. And you may not want to change that. Well, that's what the system allows you to do. It tells you, this is the kind of bias that exists in the data already. And you can make a business decision as to whether it is good to retain that bias or to correct it in the data itself. Now, if the bias is in how the algorithm is processing the data, again, it's a business decision. Should I correct it or not? Sometimes, bias is not a bad thing. (laughs) It's not a bad thing. No, because you are actually looking at what signal exists in the data. But what you want to make sure is that it's fair. Now, what is fair? That is up to the regulatory body, or your business, to define. You know what, the age range between 26 and 45, I want to treat them a certain way. If this is a conscious decision that you, as a business, or your industry, is making, that's fair game. But if it is, this is what I wanted that model to do for this age range, but the model is behaving a different way, I want to catch that. And I want to either fix the bias in the data or in how the algorithm is behaving with the model itself. >> So, you can inject the edicts of the company into the model, but then appropriately and fairly apply that, as long as it doesn't break the law. >> Exactly. (laughs) >> Which is another part of the compliance. >> So, this is not just about compliance. Compliance is a big, big part here. But this is also just answering what your end customer is going to ask. I put in an application for a loan and I was rejected. And I want an explanation as to why it was rejected, right? >> So you've got to be transparent, is your point there. >> Exactly, exactly. And if the business can say, you know what, these are the criteria we used, you fell in this range, and this, in our mind, is a fair range, that is okay. It may not be okay for the end customer, but at least you have a valid explanation for why the decision was made by the model. So, it's not just some black box making some... >> So the bank might say, well, the decision was made because we don't like the location of the property, we think it's overvalued. It had nothing to do with your credit. >> John: Exactly. >> We just don't want to invest in this, and by the way, maybe we advise you don't invest in that either. >> Right, right, right. >> So that feedback loop is there. >> This is being able to find, for each individual transaction, each individual model scoring, what weighed into the decision that was made by the model. This is important. >> So you've got to have atomic access to that data? >> John: At the transaction level. >> And then make it transparent. Are organizations, banks, are they actually making it transparent to their consumers? 'Cause I know in situations that I'm involved in, it's either okay, go, or no, but we're not going to tell you why. >> Everyone is beginning to look into this space. >> Healthcare is another one, right, where we would love more transparency in healthcare. >> Exactly.
So this is happening. This is happening where people are looking at, oh, we can't just do black-box decision making, we have to get serious about this. >> And I wonder, John, if a lot of that black-box decision making is just that it's easier to not share information. Healthcare, you're worried about HIPAA. Financial services is just so highly regulated that people are afraid to actually be transparent. >> John: Yup. >> But machine intelligence potentially solves that problem? >> So, internally, at least internal to the company, when the decision is made, you need to have a good idea why the decision was made, right? >> Yeah, right. >> As to what you use to explain to the end client or to a regulatory body, that is up to you. At least internally you need to have clarity on how the decision was arrived at. >> When you were talking about feature selection and feature engineering and model building, how much of that is being done by AI or things like auto AI? >> John: Yup. >> You know, versus humans? >> So, it depends. If it's a relatively straightforward use case, you're dealing with 50, maybe a hundred features. Not a big deal. I mean, a good data scientist can sit down and do that. But, again, going back to the Wunderman Thompson example from this morning's keynote, they're dealing with 20,000 features. You just can't do this economically at scale with a bunch of data scientists, even if they're super data scientists, doing this in a programmatic way. So this is where something like auto AI comes into play and says, you know what, out of this 20,000-plus feature set, I can select this percentage, maybe a thousand or 2,000 features, that are actually relevant. Two, now here come the interesting things. Not just that it has selected 2,000 features out of 20,000, but it says, what if I were to take three of these features and two of these features and combine them? Combine them, maybe do a transpose. Maybe do an inverse of one and multiply it with something else, or whatever, right? Take a logarithm of one and then combine it with something else, XOR, whatever, right? Some combination of operations on these features generates a new feature which boosts the signal in your data. Here is the magic, right. So suddenly you've gone from this huge array of features to a small subset, and in there you are saying, okay, if I were to combine these features I can now get much better prediction power for my model. And that is very good, and auto AI is very heavily used in the Wunderman example, in scenarios like that where you have very large scale feature selection and feature engineering. >> You guys use this concept of the data ladder: collect, organize, analyze, and infuse. Correct me if I'm wrong, but a lot of data scientists' time is spent collecting and organizing. They want to do more analysis so that ultimately they can infuse. Talk about that analyze portion and how to get there. What kind of progress is the industry generally, and IBM, making to help data scientists? >> So, analyze... Typically, you don't jump into building machine learning models. The first part is to just do exploratory analysis. You know, age-old exploration of your data to understand what is there. I mean, people want to jump into the model building first, and it's normal, but if you don't understand what your data is telling you, it is foolish to expect magic to happen from your data. So, exploratory analysis, your traditional approaches. You start there.
Then you say, in that context, I think I can do model building to solve a particular business problem, and then comes the discussion, okay, am I using neural nets or am I using classical mechanisms, which framework am I using, XGBoost or TensorFlow? All of that is secondary. Once you get through exploratory analysis, looking at framing the business problem as a set of models that can be built, then you say, what technique do I use now? And auto AI, for example, will help you select the algorithms once you have framed the problem. It says, should I use LightGBM? Should I use something else? Should I use logistic regression? Whatever, right. So, the algorithm selection is something that can be helped by auto AI. >> John, we're up against the clock. Great to have you. A wonderful discussion. Thanks so much, really appreciate it. >> Absolutely, absolutely. >> Good to see you again. >> Yup, same here. >> All right. Thanks for watching everybody. We'll be right back right after this short break. You're watching theCUBE from the IBM Data and AI Forum in Miami. We'll be right back. (upbeat music)
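
The interview above walks through several mechanics that are easier to see in code. The sketches that follow are rough, hypothetical Python illustrations of those ideas; none of them is IBM's or auto AI's actual implementation, and every name, threshold, and helper in them is invented for illustration. First, John's point about promoting a model together with its surrounding assets (feature list, data prep pipeline, evaluation scripts) through dev, QA, staging, and production, gated by CI/CD-style checks:

    # Hypothetical sketch: treat the model and its surrounding assets as one
    # versioned bundle, and only promote that bundle to the next stage when
    # every gate (tests, KPI checks, approvals, compliance) has passed.
    from dataclasses import dataclass, field

    @dataclass
    class ModelBundle:
        name: str
        version: str
        model_path: str                  # serialized model artifact
        feature_list: list               # exact features the model expects
        data_prep_pipeline: str          # data prep / transformation code
        evaluation_scripts: list         # scripts used to evaluate the model
        approvals: list = field(default_factory=list)   # third-party sign-offs

    STAGES = ["dev", "qa", "staging", "prod"]

    def promote(bundle: ModelBundle, current_stage: str, gates: dict) -> str:
        """Move the whole bundle, not just the model, one stage forward,
        but only if every gate for that promotion passes."""
        next_stage = STAGES[STAGES.index(current_stage) + 1]
        failed = [name for name, passed in gates.items() if not passed]
        if failed:
            raise RuntimeError(f"cannot promote {bundle.name} v{bundle.version} "
                               f"to {next_stage}; failed gates: {failed}")
        return next_stage

    # Example gates for a qa -> staging promotion (all illustrative):
    gates = {
        "unit_tests": True,
        "model_metric_meets_business_kpi": True,
        "bias_and_fairness_report_reviewed": True,
        "code_review_and_third_party_approval": True,
        "regulatory_compliance_checklist": True,
    }
    # next_stage = promote(bundle, "qa", gates)

The point of the bundle is exactly what John stresses: the feature set, the data prep pipeline, and the evaluation scripts move together with the model, under the same review and approval gates.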
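
Next, the "play back an individual transaction" idea: log which model version scored which payload, with which features, and what the outcome was, so a single decision can later be replayed, including a what-if replay with a few payload fields changed. A minimal sketch assuming a scikit-learn-style model with a predict method; the log and registry here are just in-memory stand-ins:

    # Hypothetical sketch of transaction-level scoring playback.
    import copy
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    scoring_log = []   # in practice a durable, queryable audit store

    def score_and_log(model, model_version, payload, feature_names):
        features = [payload[f] for f in feature_names]
        outcome = int(model.predict([features])[0])
        scoring_log.append({
            "model_version": model_version,
            "features_used": feature_names,
            "payload": copy.deepcopy(payload),
            "outcome": outcome,
        })
        return outcome

    def replay(model_registry, record, overrides=None):
        """Re-run one logged transaction, optionally changing some payload fields."""
        model = model_registry[record["model_version"]]        # e.g. "v15"
        payload = {**record["payload"], **(overrides or {})}
        features = [payload[f] for f in record["features_used"]]
        return int(model.predict([features])[0])

    # Toy demo: a model "v15" scoring a loan application.
    X = np.array([[20, 1], [80, 0], [30, 1], [90, 0]])   # [risk_score, has_collateral]
    y = np.array([1, 0, 1, 0])                           # 1 = approve
    v15 = LogisticRegression().fit(X, y)
    registry = {"v15": v15}

    score_and_log(v15, "v15", {"risk_score": 85, "has_collateral": 0},
                  ["risk_score", "has_collateral"])
    what_if = replay(registry, scoring_log[-1], overrides={"has_collateral": 1})

Whether the what-if outcome differs from the logged one is exactly the kind of answer a rejected applicant, or a regulator, would ask for.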
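
John's "fairness bracket" can be approximated by comparing the model's favorable-outcome rate across groups of a monitored attribute (an age band, for example) against a reference group, and flagging ratios that fall outside a band the business or regulator has agreed on. The 0.8 to 1.25 bracket below is only an example threshold, not something stated in the interview:

    # Hypothetical sketch of a fairness-bracket check over logged decisions.
    from collections import defaultdict

    def fairness_check(records, group_attr, reference_group, bracket=(0.8, 1.25)):
        """records: dicts carrying the monitored attribute and a boolean
        'favorable' outcome. Returns each group's favorable rate, its ratio
        against the reference group, and whether that ratio is in the bracket."""
        counts = defaultdict(lambda: [0, 0])            # group -> [favorable, total]
        for r in records:
            counts[r[group_attr]][0] += int(r["favorable"])
            counts[r[group_attr]][1] += 1
        rates = {g: fav / tot for g, (fav, tot) in counts.items() if tot}
        ref_rate = rates[reference_group]
        report = {}
        for g, rate in rates.items():
            ratio = rate / ref_rate if ref_rate else float("inf")
            report[g] = {
                "favorable_rate": rate,
                "ratio_vs_reference": ratio,
                "within_bracket": bracket[0] <= ratio <= bracket[1],
            }
        return report

    # e.g. fairness_check(decisions, group_attr="age_band", reference_group="26-45")

As John notes, a ratio outside the bracket is not automatically wrong; it is a signal for the business to decide whether the bias in the data, or in how the algorithm handles it, should be kept or corrected.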
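
The auto AI feature engineering John describes, combining and transforming raw features to find new ones that boost signal, can be sketched generically: generate candidate transforms and pairwise combinations, score each against the target, keep the best few. This is a conceptual illustration of the idea, not how AutoAI actually searches its feature space:

    # Hypothetical sketch of automated feature engineering by candidate search.
    import numpy as np

    def candidate_features(X):
        """Yield (name, column) pairs: log transforms plus pairwise products and ratios."""
        n = X.shape[1]
        for i in range(n):
            yield f"log_f{i}", np.log1p(np.abs(X[:, i]))
        for i in range(n):
            for j in range(i + 1, n):
                yield f"f{i}*f{j}", X[:, i] * X[:, j]
                yield f"f{i}/f{j}", X[:, i] / (X[:, j] + 1e-9)

    def top_engineered_features(X, y, k=10):
        """Rank candidates by absolute correlation with the target; keep the best k."""
        scored = []
        for name, col in candidate_features(X):
            if np.std(col) == 0:
                continue                      # skip constant columns
            score = abs(np.corrcoef(col, y)[0, 1])
            scored.append((score, name, col))
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]

    # X, y = feature matrix and target; keep = top_engineered_features(X, y, k=20)

A real system would use stronger scoring (model-based importance, cross-validated lift) and a smarter search, but the shape of the problem, thousands of candidates evaluated programmatically rather than by hand, is the same one John attributes to the 20,000-feature Wunderman Thompson case.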
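
Finally, the algorithm selection step John mentions (LightGBM versus logistic regression versus something else) amounts to trying several candidate estimators under cross-validation and keeping the best. A small self-contained sketch with scikit-learn on synthetic data; gradient boosting stands in here for LightGBM or XGBoost:

    # Hypothetical sketch of automated algorithm selection by cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(),
        "random_forest": RandomForestClassifier(),
    }

    results = {name: cross_val_score(est, X, y, cv=5).mean()
               for name, est in candidates.items()}
    best = max(results, key=results.get)
    print(results, "-> selected:", best)

In practice a tool would also tune hyperparameters and weigh training cost, but selection by held-out performance is the core of letting the machine pick the algorithm once the business problem is framed.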

Published Date : Oct 22 2019

SUMMARY :

John Thomas, distinguished engineer and chief data scientist for IBM Data and AI, talks with Dave Vellante about the end-to-end data science life cycle: applying a DevOps, CI/CD mindset to moving models and their surrounding assets from development through QA and staging into production, tying models to business KPIs and retraining when performance degrades, ensuring trust, transparency, fairness, and explainability for individual decisions, and using auto AI for large-scale feature engineering and algorithm selection, as in the Wunderman Thompson example.


Keynote Analysis | IBM Data and AI Forum


 

>> Live from Miami, Florida. It's theCUBE, covering IBM's Data and AI Forum. Brought to you by IBM. >> Welcome everybody to the port of Miami. My name is Dave Vellante and you're watching theCUBE, the leader in live tech coverage. We go out to the events, we extract the signal from the noise, and we're here at the IBM Data and AI Forum. The hashtag is data AI forum. This is IBM's event formerly known as the IBM Analytics University. It's a combination of learning and peer networking, and really the focus is on AI and data. And there are about 1,700 people here, up from about half of that last year when it was the IBM Analytics University: about 600 customers, a few hundred partners. There's press here, there are analysts, and of course theCUBE is covering this event. We'll be here for one day; there are 128 sessions and 35 hands-on labs. As I say, a lot of learning, a lot of technical discussions, a lot of best practices. >> What's happening here? For decades, our industry has marched to the cadence of Moore's law, the idea that you could double processor performance every 18 months by doubling the number of transistors within the same footprint. That's no longer what's driving innovation in the IT and technology industry today. It's a combination of data, with machine intelligence applied to that data, and cloud. So, data: we've always talked about all this data that we've collected, and over the past 10 years, with the advent of lower-cost warehousing technologies and file stores like Hadoop, with activity going on at the edge, with new databases and lower-cost data stores that can handle unstructured data as well as structured data, we've amassed this huge amount of data that's growing at a nonlinear rate. The curve is steepening; it's exponential. >> So there's all this data, and then applying machine intelligence, or artificial intelligence with machine learning, to that data is the blending of a new cocktail. And then the third leg of that stool is the cloud. Why is the cloud important? Well, it's important for several reasons. One is that's where a lot of the data lives. Two, it's where agility lives. So cloud-native, DevOps, and being able to spin up infrastructure as code really started in the cloud, and it's sort of seeping to on-prem, slowly, and to hybrid and multi-cloud architectures. But cloud gives you not only that data access, not only the agility, but also scale, global scale. So you can test things out very cheaply. You can experiment very cheaply with cloud and data and AI. And then once your POC is set and you know it's going to give you business value and the business outcomes you want, you can then scale it globally. >> And that's really what cloud brings. So this forum here today, with the big keynotes: Rob Thomas kicked it off. Actually, take that back. A gentleman named Ray Zahab, an adventurer and ultramarathoner, kicked it off. This guy one time ran 4,500 miles in 111 days with two ultramarathoner colleagues. They had no days off. They traveled through six countries, they traversed Africa, the continent, and he took two showers in 111 days. And his whole message is really about the power of human beings and the will of humans to rise above any challenge, with no limits. So that was the theme, the tone, that was set for this conference.
Rob Thomas then came in and invoked the metaphor of superheroes and superpowers, with AI and data of course being two of those three superpowers I talked about, in addition to cloud. >> So Rob talked about eliminating the good to find the great. He talked about some of the experiences of Disney's Ward Kimball. Ward Kimball went to Walt Disney with this amazing animation, and Walt said, I love it, it was so funny, so beautiful, so amazing; you worked 283 days on this; I'm cutting it out. So Rob talked about cutting out the good to find the great. He also talked about how AI has penetrated only about four to 10% of organizations. Why is that? Why is it so low? He said there are three things that are blockers. One is data, and he is specifically referring to data quality. The second is trust, and the third is skill sets. He then, of course, dovetailed a bunch of IBM products and capabilities into those blockers, those challenges. >> He talked about two in particular. IBM Cloud Pak for Data, which is a way to sort of virtualize data across different clouds, on-prem, and hybrid, basically being able to pull different data stores in, virtualize them, combine and join data, and be able to act on it and apply machine learning and AI to it. And then auto AI, which is basically machine intelligence for artificial intelligence; in other words, AI for AI. What's an example? How do I choose the right algorithm that's the best fit for the use case I'm working on? Let machines do that. They've got experience and they can have models that are trained to actually get the best fit. There was also a customer panel with Miami-Dade County, Wunderman Thompson, and Standard Bank of South Africa, incumbents that are using machine intelligence and AI to try to supercharge their businesses. We heard a use case from Royal Bank of Scotland, basically applying AI to drive their net promoter score. So we'll talk some more about that. We're going to be here all day today, interviewing executives from IBM, talking about what customers are doing, and getting the feedback from the analysts. So this is what we do. Keep it right there, buddy. We're in Miami all day long. This is Dave Vellante. You're watching theCUBE. We'll be right back right after this short break.

Published Date : Oct 22 2019

SUMMARY :

Dave Vellante's keynote analysis from the IBM Data and AI Forum in Miami: roughly 1,700 attendees, up from about half that last year when the event was the IBM Analytics University. The themes are data, AI, and cloud as the new drivers of innovation; Rob Thomas's keynote on why AI penetration is still only about four to 10 percent, with data quality, trust, and skill sets as the blockers; and IBM Cloud Pak for Data and auto AI as responses, alongside customer stories from Miami-Dade County, Wunderman Thompson, Standard Bank of South Africa, and Royal Bank of Scotland.
