John Thomas, IBM Data and AI | IBM Data and AI Forum
(upbeat music) >> Announcer: Live from Miami, Florida. It's theCUBE. Covering IBM's Data and AI Forum. Brought to you by IBM. >> We're back in Miami everybody. You're watching theCUBE, the leader in live tech coverage. We go out to the events and extract the signal from the noise. Covering the IBM Data and AI Forum, John Thomas is here, many-time CUBE guest. He's not only a distinguished engineer but he's also the chief data scientist for IBM Data and AI. John, great to see you again. >> Great to see you again, Dave. >> I'm always excited to talk to you because you're hard-core data science. You're working with the customers and you're kind of where the action is. The watchword today is the end-to-end data science life cycle. What's behind that? I mean, there's been a lot of experimentation, a lot of tactical things going on. You're talking about an end-to-end life cycle, explain. >> So Dave, what we are seeing in our client engagements is, actually working with the data, building the models, that part is relatively easy. The tougher part is to make the business understand what is the true value of this. So it's not a science project, right? It is not an academic exercise. So how do you do that? In order for that to happen, these models need to go into production. Well, okay, how do you do that? There is this business of, I've got something in my development environment that needs to move up through QA and staging, and then to production. Well, a lot of different things need to happen as you go through that process. How do you do this? See, this is not a new paradigm. It is a paradigm that exists in the world of application development. You've got to go through a DevOps life cycle. You've got to go through a continuous integration and continuous delivery mindset. You've got to have the same rigor in data science. Then at the front end of this is, what business problem are you actually solving? Do you have business KPIs for that? And when the model is actually in production, can you track, can you monitor the performance of the model against the business KPIs that the business cares about? And how do you do this in an end-to-end fashion? And then in there is retraining the model when performance degrades, et cetera, et cetera. But this notion of following a DevOps mindset in the world of data science is absolutely essential. >> Dave: So when you think about DevOps, you think of agile. So help me square this circle: when you think end-to-end data life cycle, you think chewy, big, waterfall, but I'm inferring you're not prescribing a waterfall. >> John: No, no, no. >> So how are organizations dealing with that holistic end-to-end view but still doing it in an agile manner? >> Yeah, exactly. So, I always say do not boil the ocean, especially if you're approaching AI use cases. Start with something that is contained, that you can define, and break it into sprints. So take an agile approach to this. Two, three sprints, and if you're not seeing value in those two, three sprints, go back to the drawing board and see what it is that you're doing wrong. So for each of your sprints, what are the specific success criteria that you care about and the business cares about? Now, as you go through this process, you need a mechanism to look at, okay, I've got something in development, how do I move the assets? Not just the model, but what is the set of features that you're working with? What is the data prep pipeline? What are the scripts being used to evaluate the model?
All of these things are logical assets surrounding the model. How do you move them from development to staging? How do you do QA against this set of assets? Then how do you do third-party approval and oversight? How do you do code review? How do you make sure that when you move these assets all of the surrounding mechanisms are being adhered to, compliance requirements, regulatory requirements? And then finally get them to production. So there's a technology aspect of it, obviously. You have a lot of discussion around Kubeflow, MLflow, et cetera, et cetera as technology options. But there is also a mindset that needs to be followed here. >> So once you find a winner, business people want to scale, 'cause they can make more money the more and more times they can replicate that value. And I want to understand this trust and transparency piece, 'cause when you scale, if you're scaling things that aren't compliant, you're in trouble. But before we get there, I wonder if we can take an example of, pick an industry, or some kind of use case where you've seen this end-to-end life cycle be successful. >> Yeah, across industries. I mean it's not just specific industry related. But I'll give you an example. This morning Wunderman Thompson was talking about how they are applying machine learning to a very difficult problem, which is how to improve how they create a first-time buyer list for their clients. But think of the problem here. It's not just about a one-time building of a model. The model needs, okay, you've got data, understand what data sets you're working with, what is the lineage of that data. Once I have an understanding of the data, then I get into feature selection, feature engineering, all the steps that I need in your machine learning cycle. Once I am done with selecting my features, doing my feature engineering, I go into model building. Now, it's a pipeline that is being built. It is not a one-time activity. Once that model, the pipeline, has been vetted, you've got to move it from development to your QA environment, from there to your production environment, and so on. And here comes, and this is where it links to the question, the transparency discussion. Well, the model is in production, how do I make sure the model is being fair? How do I make sure that I can explain what is going on? How do I make sure that the model is not unfairly biased? So all of these are important discussions in trust and transparency because, you know, people are going to question the outcome of the model. Why did it make a decision? If a campaign was run for an individual, why did you choose him and not somebody else? If it's a credit card fraud detection scenario, why was somebody tagged as fraudulent and not the other person? If a loan application was rejected, why was he rejected and not someone else? You've got to explain this. So, this notion of explainability, the term is overloaded at times, but the idea here is you should be able to retrace your steps back to an individual scoring activity and explain an individual transaction. You should be able to play back an individual transaction and say version 15 of my model used these features, these hundred features, for its scoring. This was the incoming payload, this was the outcome, and, if I had changed five of my incoming payload variables out of the 500 I use, or hundred I use, the outcome would have been different. Now you can say, you know what, ethnicity, age, education, gender.
These parameters did play a role in the decision but they were within the fairness bracket. And the fairness bracket is something that you have to define. >> So, if I could play that back. Take fraud detection. So you might have the machine tell you with 90% confidence or greater that this is fraud but it throws back a false positive. When you dig in, you might see, well, there's some bias included in there. Then what? You would kind of refactor the model? >> A couple of different things. Sometimes the bias is in the data itself and it may be valid bias. And you may not want to change that. Well, that's what the system allows you to do. It tells you, this is the kind of bias that exists in the data already. And you can make a business decision as to whether it is good to retain that bias or to correct it in the data itself. Now, if the bias is in how the algorithm is processing the data, again, it's a business decision. Should I correct it or not? Sometimes bias is not a bad thing. (laughs) It's not a bad thing. No, because you are actually looking at what signal exists in the data. But what you want to make sure is that it's fair. Now, what is fair, that is up to the regulatory body, or your business defines it. You know what, for the age range between 26 and 45, I want to treat them a certain way. If this is a conscious decision that you, as a business, or your industry is making, that's fair game. But if it is, this is what I wanted that model to do for this age range but the model is behaving a different way, I want to catch that. And I want to either fix the bias in the data or in how the algorithm is behaving with the model itself. >> So, you can inject the ethics of the company into the model, but then, and then appropriately and fairly apply that, as long as it doesn't break the law. >> Exactly. (laughs) >> Which is another part of the compliance. >> So, this is not just about compliance. Compliance is a big, big part here. But this is also just answering what your end customer is going to ask. I put in an application for a loan and I was rejected. And I want an explanation as to why it was rejected, right? >> So you've got to be transparent, is your point there. >> Exactly, exactly. And if the business can say, you know what, this is the criteria we used, you fell in this range, and this, in our mind, is a fair range, that is okay. It may not be okay for the end customer but at least you have a valid explanation for why the decision was made by the model. So, it's not some black box making some... >> So the bank might say, well, the decision was made because we don't like the location of the property, we think they're overvalued. It had nothing to do with your credit. >> John: Exactly. >> We just don't want to invest in this, and by the way, maybe we advise you don't invest in that either. >> Right, right, right. >> So that feedback loop is there. >> This is being able to find, for each individual transaction, each individual model scoring, what weighed in on the decision that was made by the model. This is important. >> So you've got to have atomic access to that data? >> John: At the transaction level. >> And then make it transparent. Are organizations, banks, are they actually making it transparent to their consumers? 'Cause I know in situations that I'm involved in, it's either okay, go, or no, but we're not going to tell you why. >> Everyone is beginning to look into this space. >> Healthcare is another one, right, where we would love more transparency in healthcare. >> Exactly.
So this is happening. This is happening where people are looking at, oh, we can't just do black box decision making, we have to get serious about this. >> And I wonder, John, if a lot of that black box decision making is just easy to not share information. Healthcare, you're worried about HIPAA. Financial services is just so highly regulated, so people are afraid to actually be transparent. >> John: Yup. >> But machine intelligence potentially solves that problem? >> So, internally, at least internal to the company, when the decision is made, you need to have a good idea why the decision was made, right. >> Yeah, right. >> As to what you use to explain to the end client or to the regulatory body, that is up to you. At least internally you need to have clarity on how the decision was arrived at. >> When you were talking about feature selection and feature engineering and model building, how much of that is being done by AI or things like AutoAI? >> John: Yup. >> You know, versus humans? >> So, it depends. If it's a relatively straightforward use case, you're dealing with 50, maybe a hundred features. Not a big deal. I mean, a good data scientist can sit down and do that. But, again, I'm going back to the Wunderman Thompson example from this morning's keynote, they're dealing with 20,000 features. You just can't do this economically at scale with a bunch of data scientists, even if they're super data scientists, doing this in a programmatic way. So this is where something like AutoAI comes into play and says, you know what, out of this 20,000-plus feature set, I can select, you know, this percentage, maybe a thousand or 2,000 features, that are actually relevant. Two, now here come the interesting things. Not just that it has selected 2,000 features out of 20,000, but it says, what if I were to take three of these features and two of these features and combine them? Combine them, maybe do a transform. Maybe do an inverse of one and multiply it with something else or whatever, right. Take a logarithm of one and then combine it with something else, XOR, whatever, right. Some combination of operations on these features generates a new feature which boosts the signal in your data. Here is the magic, right. So suddenly you've gone from this huge array of features to a small subset, and in there you are saying, okay, if I were to combine these features I can now get much better predictive power for my model. And that is very good, and AutoAI was very heavily used in the Wunderman Thompson example, in scenarios like that where you have very large-scale feature selection, feature engineering. >> You guys use this concept of the data ladder: collect, organize, analyze, and infuse. Correct me if I'm wrong, but a lot of data scientists' time is spent collecting, organizing. They want to do more analysis so that ultimately they can infuse. Talk about that analyze portion and how to get there. What kind of progress is the industry generally, and IBM specifically, making to help data scientists? >> So in analyze, typically... you don't jump into building machine learning models. The first part is to just do exploratory analysis. You know, age-old exploration of your data to understand what is there. I mean, people jump into the model building first, and it's normal, but if you don't understand what your data is telling you, it is foolish to expect magic to happen from your data. So, exploratory analysis, your traditional approaches. You start there.
Then you say, in that context, I think I can do model building to solve a particular business problem, and then comes the discussion, okay, am I using neural nets or am I using classical methods, am I using this framework, XGBoost or TensorFlow? All of that is secondary. Once you get through exploratory analysis, looking at framing the business problem as a set of models that can be built, then you say, what technique do I use now? And AutoAI, for example, will help you select the algorithms once you have framed the problem. It says, should I use LightGBM? Should I use something else? Should I use logistic regression? Whatever, right. So algorithm selection is something that can be helped by AutoAI. >> John, we're up against the clock. Great to have you. A wonderful discussion. Thanks so much, really appreciate it. >> Absolutely, absolutely. >> Good to see you again. >> Yup, same here. >> All right. Thanks for watching everybody. We'll be right back right after this short break. You're watching theCUBE from the IBM Data and AI Forum in Miami. We'll be right back. (upbeat music)
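As a concrete illustration of the promotion discipline described in this interview, here is a minimal sketch using MLflow, one of the tools John Thomas names. Every name in it (the experiment, the metrics, the "fraud_detector" registry entry) is an assumption for illustration, not something from the interview, and it presumes a local SQLite-backed MLflow setup so the registry calls work.

```python
# Hedged sketch: log a model plus its surrounding assets, then promote it
# as an explicit, reviewable step. All names here are illustrative.
import mlflow
from mlflow.tracking import MlflowClient
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Local SQLite backend so the model registry works without a server.
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("fraud-detection")        # hypothetical project name

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

with mlflow.start_run():
    mlflow.log_param("algorithm", "logistic_regression")
    mlflow.log_metric("val_auc", 0.91)                 # data science metric
    mlflow.log_metric("est_fraud_savings_usd", 1.2e6)  # business KPI proxy
    # The exact feature set travels with the model, so QA reviews one bundle.
    mlflow.log_dict({"features": [f"f{i}" for i in range(20)]},
                    "feature_list.json")
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="fraud_detector")

# Promotion mirrors dev -> QA/staging -> production as a deliberate action.
client = MlflowClient()
client.transition_model_version_stage(name="fraud_detector",
                                      version="1", stage="Staging")
```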
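The per-transaction "playback" he describes (version 15 of the model, the exact incoming payload, and what changing five of the variables would have done) amounts to recording each scoring event and replaying variants through the same model version. A toy, framework-free sketch; the data and the perturbation are invented:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
model = LogisticRegression(max_iter=1000).fit(X, y)

payload = X[42]                              # one incoming transaction
record = {
    "model_version": "v15",                  # echoes the interview's example
    "payload": payload.tolist(),
    "score": float(model.predict_proba(payload.reshape(1, -1))[0, 1]),
    "decision": int(model.predict(payload.reshape(1, -1))[0]),
}

# What-if: perturb five of the incoming variables, replay the same version.
whatif = payload.copy()
whatif[:5] += 1.0                            # hypothetical change
new_decision = int(model.predict(whatif.reshape(1, -1))[0])
print("original:", record["decision"],
      "| outcome would have been different:", new_decision != record["decision"])
```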
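The "fairness bracket" idea can likewise be checked mechanically: the business or regulator defines an acceptable range, and decisions get flagged when a group-comparison statistic drifts outside it. A minimal sketch using disparate impact; the 0.8 to 1.25 bracket below is only an assumed placeholder for whatever range the business defines:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age_band": rng.choice(["26-45", "other"], size=10_000),
    "approved": rng.random(10_000) < 0.5,    # stand-in for model decisions
})

rates = df.groupby("age_band")["approved"].mean()
disparate_impact = rates["26-45"] / rates["other"]

# The bracket itself is a business or regulatory decision; these bounds
# are a common rule of thumb used here purely as a placeholder.
FAIRNESS_BRACKET = (0.80, 1.25)
within = FAIRNESS_BRACKET[0] <= disparate_impact <= FAIRNESS_BRACKET[1]
print(f"disparate impact = {disparate_impact:.2f}, within bracket: {within}")
```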
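And the AutoAI-style feature engineering described for the 20,000-feature case (try combinations of features, keep the ones that boost the signal) looks roughly like this sketch. Mutual information stands in for AutoAI's internal scoring, which the interview does not specify, and the sizes are shrunk so it runs quickly:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=2000, n_features=15, n_informative=5,
                           random_state=1)
base = mutual_info_classif(X, y, random_state=1)  # signal in the raw features

candidates = []
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        combo = (X[:, i] * X[:, j]).reshape(-1, 1)   # one simple transform
        gain = mutual_info_classif(combo, y, random_state=1)[0]
        if gain > max(base[i], base[j]):             # combination beats its parents
            candidates.append((i, j, round(float(gain), 3)))

candidates.sort(key=lambda t: -t[2])
print("engineered features worth keeping:", candidates[:5])
```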
John Thomas & Steven Eliuk, IBM | IBM CDO Summit 2019
>> Live from San Francisco, California, it's theCUBE, covering the IBM Chief Data Officer Summit. Brought to you by IBM. >> We're back in San Francisco. We're here at Fisherman's Wharf covering the IBM Chief Data Officer event, #IBMCDO. This is the tenth year of this event. They tend to bookend them both in San Francisco and in Boston, and you're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante. John Thomas is here, Cube alum and distinguished engineer, Director of Analytics at IBM, and somebody who provides technical direction to the Data Science Elite Team. John, good to see you again. Steven Eliuk is back. He is the Vice President of Deep Learning in the Global Chief Data Office, thanks for comin' on again. >> No problem. >> Let's get into it. So John, you and I have talked over the years at this event. What's new these days, what are you working on? >> So Dave, still working with clients on implementing data science and AI use cases, mostly enterprise clients, and seeing a variety of different things developing in that space. Things have moved into broader discussions around AI and how to actually get value out of that. >> Okay, so I know one of the things that you've talked about is operationalizing machine intelligence and AI and cognitive, and that's always a challenge, right. Sounds good, we see this potential, but unless you change the operating model, you're not going to get the type of business value, so how do you operationalize AI? >> Yeah, this is a good question, Dave. So, enterprises, many of them, are beginning to realize that it is not enough to focus on just the coding and development of the models, right. So they can hire super-talented Python, TensorFlow programmers and get the model building done, but there's no value in it until these models actually are operationalized in the context of the business. So one aspect of this is, actually, you know, we are thinking of this in a very systematic way and talking about this in a prescriptive way. So, you've got to scope your use cases out. You've got to understand what is involved in implementing the use case. Then the steps are build, run, manage, and each of these have technical aspects and business aspects around them, right. So most people jump right into the build aspect, which is writing the code. Yeah, that's great, but once you build the models by writing code, how do you actually deploy these models? Whether that is for online invocation or batch scoring or whatever, how do you manage the performance of these models over time, how do you retrain these models, and most importantly, when these models are in production, how do I actually understand the business metrics around them? 'Cause this goes back to that first step of scoping. What are the business KPIs that the line of business cares about? The data scientist talks about data science metrics, precision and recall and area under the ROC curve and accuracy and so on. But how do these relate to business KPIs? >> All right, so we're going to get into each of those steps in a moment, but Steve I want to ask you, so part of your charter, Inderpal, Global Chief Data Officer, you guys have to do this for IBM, right, drink your own champagne, dogfooding, whatever you call it. But there's real business reasons for you to do that. So how is IBM operationalizing AI? What kind of learnings can you share? >> Well, the beauty is I've got a wide portfolio of products that I can pull from, so that's nice.
Like things like OpenScale and Watson, some of the hardware components, all that stuff's kind of being baked in. But part of the reason that John and I wanted to do this interview together is because what he's producing, what his thoughts are, resonates very well with our own practices internally. We've got so many enterprise use cases, how are we deciding, you know, which ones to work on, which ones have the data, potentially which ones have the biggest business impact, all those KPIs, etcetera. Also, in addition to that, for the practitioners, once we decide on a specific enterprise use case to work on, when have they reached the level where the enterprise is having a return on investment? They don't need to keep refining and refining and refining, or maybe they do, but these practitioners don't know. So we have to clearly justify it and scope it accordingly, or these practitioners are left in this kind of limbo, where they're producing things, but not able to iterate effectively for the business, right? So that process is a big problem I'm facing internally. We've got hundreds of internal use cases, and we're trying to iterate through them. There's an immense amount of scoping, understanding, etcetera, but at the same time, we're building more and more technical debt as the process evolves. To be able to move from project to project, my team is ballooning; we can't do this, we can't keep growing, they're not going to give me another hundred head count, another hundred head count, so we definitely need to manage it more appropriately. And that's where this mentality comes in, there's-- >> All right, so I got a lot of questions. I want to start unpacking this stuff. So the scope piece, that's where we're setting goals, identifying the metrics, success metrics, KPIs, and the like, okay, reasonable starting point. But then you go into this, I think you call it, the explore or understanding phase. What's that all about, is that where governance comes in? >> That's exactly where governance comes in. Right, because, you know, we all know the expression, garbage in, garbage out: if you don't know what data you're working with for your machine learning and deep learning enterprise projects, you will not have the results that you want. And you might think this is obvious, but in an enterprise setting, understanding where the data comes from, who owns the data, who worked on the data, the lineage of that data, who is allowed access to the data, policies and rules around that, it's all important. Because without all of these things in place, the models will be questioned later on, and the value of the models will not be realized, right? So that part of exploration or understanding, whatever you want to call it, is about understanding the data that has to be used by the ML process, but then at a point in time, the models themselves need to be cataloged, need to be published, because the business as a whole needs to understand what models have been produced out of this data. So who built these models? Just as you have lineage of data, you need lineage of models. You need to understand what APIs are associated with the models that are being produced. What are the business KPIs that are linked to model metrics? So all of that is part of this understand and explore path. >> Okay, and then you go to build. I think people understand that, everybody wants to start there, just start with the dessert, and then you get into the sort of run and manage piece.
Run, you want time to value, and then when you get to the management phase, you really want to be efficient, cost-effective, and then iterative. Okay, so here's the hard question: what you just described, some of the folks, particularly the builders, are going to say, "Aw, such a waterfall approach. Just start coding." Remember 15 years ago, it was like, "Okay, how do we write better software? Just start building! Forget about the requirements, just start writing code." Okay, but then what happens is you have to bolt on governance and security and everything else, so talk about how you are able to maintain agility in this model. >> Yeah, I was going to use the word agile, right? So even in each of these phases, it is an agile approach. So the mindset is about agile sprints, and they're two-week-long sprints, with very specific metrics at the end of each sprint that are validated against the line of business requirements. So although it might sound waterfall, you're actually taking an agile approach to each of these steps. And as you go through this, you also have the option to course correct along the way, because think of this: the first step was scoping. The line of business gave you a bunch of business metrics or business KPIs they care about, but somewhere in the build phase, past sprint one or sprint two, you realize, oh well, you know what, that business KPI is not directly achievable, or it needs to be refined or tweaked. And there is that circle back with the line of business and a course correction, as it were. So it's a very agile approach that you have to take. >> That's I think right on, because again, if you go and bolt on compliance and governance and security after the fact, we know from years of experience that it really doesn't work well. You build up technical debt faster. But are these quasi-parallel? I mean, there's some things that you can do in build as the scoping is going on. Is there collaboration, can you describe that a little bit? >> Absolutely, so for example, if I know the domain of the problem, I can actually get started with templates that help me accelerate the build process. So I think in your group, for example, IBM internally, there are many, many templates these guys are using. Want to talk a little bit about that? >> Well, we can't just start building from scratch every single time. You know, that's, again, I'm going to use this word and really lean on it: it's not extensible. Each project, we have to get to the point of using templates, so we had to look at those initiatives and invest in them, 'cause initially it's harder. But at least once we have some of those cookie-cutter templates, and some of them might have to have abstractions around certain parts of them, that's the only way we're ever able to kind of tackle so many problems. So no, without a doubt, it's an important consideration, but at the same time, you have to appreciate there's a lot of projects that are fundamentally different. And that's when you have to have very senior people kind of looking at how to abstract those templates to make them reusable and consumable by others. >> But the team structure, it's not a single amoeba going through all these steps, right? These are smaller teams, and then there's some threading between each step? >> This is important. >> Yeah, that's tough. We were just talking about that concept.
>> Just talking about skills and... >> The divide between those groups is something that we're trying to figure out how to break down. 'Cause that's something he recognizes, I recognize internally: understanding those people's tasks, they're never going to be able to iterate through different enterprise problems unless they break down those borders and really invest in the communication and building those tools. >> Exactly, you talk about full stack teams. So, it is not enough to have coding skills, obviously, right. What is the skill needed to get this into a run environment? What is the skill needed to take metrics, not just metrics, but explainability, fairness in the models, and map that to business metrics? That's a very different skill from Python coding skills. So full stack teams are important, and at the beginning of this process, where someone, line of business, throws 100 different ideas at you, and you have to go through the scoping exercise, that is a very specific skill that is needed, working together with your coders and runtime administrators. Because how do you define the business KPIs, and how do you refine them later on in the life cycle? And how do you translate between line of business lingo and what the coders are going to call it? So it's a full stack team concept. It may not necessarily all be in one group, it may be, but they have to work together across these different silos to make it successful. >> All right guys, we've got to leave it there, the trains are backing up here at the IBM CDO conference. Thanks so much for sharing the perspectives on this. All right, keep it right there everybody. You're watchin' theCUBE from San Francisco, we're here at Fisherman's Wharf. The IBM Chief Data Officer event. Right back. (bubbly electronic music)
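A rough sketch of the "lineage of models" idea from this interview: a catalog record tying a model to its data lineage, owner, serving API, and the mapping from model metrics to the business KPIs they support. The schema is hypothetical, assembled only from the fields mentioned in the conversation, not an actual IBM catalog format:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModelCatalogEntry:
    name: str
    version: str
    owner: str
    data_lineage: List[str]            # upstream datasets, each with its own lineage
    serving_api: str                   # the API associated with the model
    science_metrics: Dict[str, float]  # precision, recall, AUC, ...
    kpi_links: Dict[str, str] = field(default_factory=dict)  # metric -> business KPI
    approvals: List[str] = field(default_factory=list)       # governance sign-offs

# Illustrative entry; every value here is invented.
entry = ModelCatalogEntry(
    name="churn_predictor", version="3", owner="cdo-data-science",
    data_lineage=["warehouse.customers.v12", "crm.interactions.v4"],
    serving_api="/models/churn/v3/score",
    science_metrics={"auc": 0.88},
    kpi_links={"auc": "retention-campaign conversion rate"},
    approvals=["model-governance", "line-of-business"],
)
print(entry.name, "approved by:", ", ".join(entry.approvals))
```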
Karthik Lakshminarayanan, Google & Kim Perrin, Doctor on Demand | Google Cloud Next 2019
>> Announcer: Live from San Francisco, it's theCUBE, covering Google Cloud Next 2019. Brought to you by Google Cloud and its ecosystem partners. >> Hey, welcome back. This is theCUBE's live coverage here in San Francisco for Google Cloud Next 2019. I'm John Furrier with Dave Vellante, here on the ground floor, day two of three days of wall-to-wall coverage. Two great guests: we've got Karthik Lakshminarayanan, product management director of Cloud Identity for Google, and Kim Perrin, chief security officer for Doctor on Demand. Guys, welcome to theCUBE. Appreciate you coming on. >> Great to be here. >> Thank you. >> So honestly, we've been covering Google Cloud and Google for many, many years. And one of the things that jumps out at me, besides all the transformation for the enterprise, is Google's always had great technology, and last year I did an interview, and we learned a lot about what's going on at the chip level with the devices you've got, the Chrome browser, all those extensions. All these security features built into a lot of the edge devices that Google has, so there's definitely a security DNA in there at Google. But now, when you start getting into cloud access and permissions, yesterday in the keynote Thomas Kurian and Jennifer Lin said, hey, let's focus on agility, not all this access stuff. This is kind of really where identity matters. Karthik, talk about what's going on with Cloud Identity. Where are we? What's the big news? >> Yeah, thank you. So Cloud Identity is our solution to manage identity, devices, and the whole access management for the cloud. And you must have heard of BeyondCorp and the whole zero trust model and access. One thing we know about the cloud: if you don't make the access simple and easy, and at the same time you don't provide security, you can't get it right. So you need security and you need that consumer-level simplicity. >> Take a minute to explain BeyondCorp, this is important. Just take a minute to refresh for the folks that might not know some of the innovations there. >> Awesome, yeah. So in the traditional on-premises world, the security model was your corporate network; your trust model was the corporate network. You invested a lot to keep the bad people out and get the right people in, and that made sense with applications on premises. Your data was on premises. Now, with the Internet being the new network, you work from anywhere. Work is no longer a place; you work from anywhere, and work gets done, right? So what does the new access model look like? That's what people have been struggling with. What Google came up with in 2011 is this model called BeyondCorp. It's a security access model that relies on three things: who you are as a user, the authentication; the device identity and security posture; and last but not least, the context of what you are trying to access and where you are trying to access it from. So these things together form your security and access model, and this is all about identity. And this is BeyondCorp. >> And anyone who has a mobile device knows what two-factor authentication is. That's when you get a text message. That's just two-factor; MFA, multi-factor authentication, really is where the action is, and you mentioned three of them. There's also other dimensions. This is where you guys are really taking it to the next level. Where are we with MFA and some of the advances around multi-factor? >> So I think the key thing to highlight is that we're always about customer choice. We meet customers where they are.
So customers today have invested in things like one-time passwords and things like that, so we support all of that here in Cloud Identity. But a technology that we are super excited about is security keys. It's built on the FIDO standard, and you insert it into your USB slot, if that makes sense. And we just announced here at Next that you can now use your Android phone as a security key. So this basically means you don't have to enter any codes, because all those codes you enter can be phished. We have this thing at Google, and we talked about it last time: since we rolled out security keys, no Google account has been phished. It's-- >> Harder for the hackers. Really good job. Kim, let's get to the reality. You run a business. You've been involved in a lot of startups. You've been cloud-native with your company. Now talk about your environment, because at the end of the day, the chief security officer, the buck stops with you. You've got to figure this out. How are you dealing with all these threats, at the same time trying to be innovative with your company? >> So for clarity, I've been there six years, since the very beginning of the company. And we started the company with zero hardware, all cloud, and before there was even BeyondCorp there was what was called de-perimeterization. And that's effectively the posture we took from the very beginning, so our users could go anywhere. And I always say our corporate network is like your local coffee shop WiFi, you know, that's the way we view it. We want to be just as secure there at the coffee shop; we don't care. Like, we always have people assessing us, and they're looking for a corporate network, saying, you know, where are your switches, where is your hardware, we want to come in and look at it all, and we don't have anything like that. >> There's no fortress to scan. >> It's like we could all just go to the Starbucks and it would be the same thing. So that's part of it. And now, you know, when we started, we wanted to wrap a lot of our services into Google, but we had the problem of HIPAA compliance. Six years ago, in our early days, Google didn't have a lot of HIPAA-compliant services. Now they do. Now we're trying to move almost everything we do into Google. That's not because we just love everything about Google; it's that, for me, I have assessed Google's security, our team has assessed their security, we have contracts with them, and in health care it's very hard to take on new vendors and say, hey, is their security okay, are their contracts okay? It's a months-long process, and then even at the end of the day, you still have another vendor out there that you're sharing your data with, and it's precarious for me. It just doubles my threat landscape when I go from Google to one more; it's like, if I put my data there... >> So you're saying multi-vendor, the old way, is actually a problematic situation for you. Both technically and, what, operationally? Or both? >> Both are super problematic for me in terms of where we spread our data to. It just means that every hack against that company is brutal for us. And, you know, the other side of the equation is Google has really good pricing, comparatively. Yes, today we're talking about BigQuery, for example, and we wanted to compare BigQuery on GCP to some other systems.
And we looked at the other systems and we couldn't find the pricing online, and, like, Google's pricing was right there, completely transparent, easy to understand. >> The security's been vetted. >> Exactly. >> Kim, can you explain, when you said the multi-vendor approach creates problems for you, why is this? Is it not so much that one vendor is better than the other, is it that it's different, different processes, or are there discernible differences in the quality of the security? >> There are definitely discernible differences in quality, for sure. Yeah. >> And then add to that different processes, skill sets. Is that right? >> Yes. >> Double click on that. >> With almost every vendor, there's always something, I mean, almost every vendor, there's always something that you're not perfectly okay with. On the part of the security, there's something you don't totally like about it. And the more vendors you add, you have, okay, this person, they're not too good on their physical security at their data center, or they're not too good on their policies, they're not too good on their disaster recovery. Like, you always give a little bit somewhere. I hate to say it, but it's true. It's like nobody's super-- >> Perfect. So it's a multiplication effect on the trade-offs that you have to make. It's not necessarily bad, but it's just not the way you want to do it, right? Okay. >> And the time. So you've got to get an SLA, you have meetings, you've got to do some vetting. There are learning curves, like at the airport, taking your shoes off. Yeah. And then there's the other part: beyond the security is also downtime. Like, if they suffer downtime, how much is that going to impact our company? >> Karthik, you talked about this new access model, this three-layer: who you are, the authentication, is the device trusted, and the context. I don't understand how you balance the ratio between sort of false positives versus blocking. I think for authentication and devices it's pretty clear: I can authenticate you, or I don't trust this device, you're not getting in. But the context is interesting. Is that like a tap on the shoulder while looking at mail, hey, be careful? Or how are you balancing that, the context realm? >> Yeah, I think it's all about customer choice again. Customers look at their application footprint, and they're making clear decisions on, hey, this payroll application is super sensitive, as an example; maybe a web-based meeting application, probably not as sensitive. So they're making decisions about, hey, you have to have a managed device, I will need a managed device in order for you to access the payroll application. But if you bring your own device, I'm perfectly fine if you launch a meeting from that. So those are the levels that people are making decisions on today, and it's super easy to segment and classify your applications. >> Talk about the people that are out there watching, who might say, you know what, I've been really struggling with identity. I've had, you know, LDAP servers, all this stuff out there, you name it. They've got all kinds of access models over the years, the perimeter is now gone, so I've got to deal with the coffee-shop kind of working experience and multiple devices. All these things are reality. I've got to put a plan together. So for the folks that are trying to figure this out, what's the approach to take, or a certain framework? You guys can both weigh in.
How does someone take the first few steps toward good cloud identity? >> Sure, I'll go first. So I think in many ways that's what we've tried to simplify with one solution that we call Cloud Identity, because what people want is: I want that model, but it seems like a huge mountain in front of me, like, how do I figure these things out? I'm getting a lot of these terminologies. So I think the key is to just get started, and we've given them lots of ways. You can take the whole Cloud Identity solution; back to Kim's point, it can be one license from us, that's it, and you're done. It's one unified UI, things like that. You can also, if you just want to run, say, three applications on GCP, we have something called Identity-Aware Proxy. It's very fast: just load your apps, turn it on, and you can experience this BeyondCorp
>> I mean, it's beyond car vision that we're continuing to roll out. We've just ruled out this bit of a sweet access, for example, but all these conditions come in. Do you want to take that to G et? You're gonna look. We're looking at extending that context framework with all the third party applications that we have even answers Thing called beyond our devices FBI and beyond Corp Alliance, because we know it's not just Google security posture. Customers are made investments and other security companies and you want to make sure all of that interoperate really nicely. So you see a lot more of that coming out >> immigration with other security platform. Certainly, enterprises require that I buy everything on the planet these days to protect themselves >> Like there's another company. Let's say that you're using for securing your devices. That sends a signal thing. I trust this device. It security, passing my checks. You want to make sure that that comes through and >> now we're gonna go. But what's your boss's title? Kim Theo, you report to the CEO. Yeah, Awesome guys. >> Creation. Thank you >> way. We've seen a lot of shifts in where security is usually now pretty much right. Strategic is core for the operations with their own practices. So, guys, thanks for coming on. Thanks for the thing you think of the show so far. What's the What's The takeaway came I'll go to you first. What's your What's the vibe of the >> show? It's a little tough for me because I have one of my senior security engineers here, and he's been going to a lot of the events and he comes to me and just >> look at all >> this stuff that they have like, way were just going over before this. I was like, Oh my God, we want to go back to our r R R office and take it all in right today. You know, if we could So yeah, it's a little tough because >> in the candy store way >> love it because again, it's like it's already paying for it. It's like they're just adding on services that we wanted, that we're gonna pay for it now. It's >> and carted quickly. Just get the last word I know was commenting on our opening this morning around how Google's got all five been falling Google since really the beginning of the company and I know for a fact is a tana big day that secures all spread for the company matter. Just kind of getting it. Yeah, share some inside quickly about what's inside Google. From a security asset standpoint, I p software. >> Absolutely. I mean, security's built from the ground up. We've been seeing that and going back to the candy store analogy. It feels like you've always had this amazing candy, but now there's like a stampede to get it, and it's just built in from the ground up. I love the solution. Focus that you found the keynotes and all the sessions that's happening. >> That's handsome connective tissue like Antos. Maybe the kind of people together. >> Yeah. I don't like >> guys. Thanks for coming on. We appreciate Kartik, Kim. Thanks for coming on. It's accused. Live coverage here on the ground floor were on the floor here. Day two of Google Cloud next here in San Francisco on Jeffrey David Lantz Stevens for more coverage after this short break.
John Thomas, IBM & Elenita Elinon, JP Morgan Chase | IBM Think 2019
>> Live from San Francisco, it's theCUBE covering IBM Think 2019, brought to you by IBM. >> Welcome back everyone, live here in Moscone North in San Francisco, it's theCUBE's exclusive coverage of IBM Think 2019. I'm John Furrier with Dave Vellante. We're breaking down all the action, four days of live coverage. We've got two great guests here, Elenita Elinon, Executive Director of Quantitative Research at JP Morgan Chase, and John Thomas, Distinguished Engineer and Director of the Data Science Elite Team... great team, elite data science team at IBM, and of course, JP Morgan Chase, great innovator. Welcome to theCUBE. >> Welcome. >> Thank you very much. >> Thank you, thank you, guys. >> So I'd like to dig in, great use case here, real customer on the cutting edge, JP Morgan Chase, known for being on the bleeding edge sometimes, but financial, money, speed... time is money, insights is money. >> Absolutely. Yes. >> Tell us what you do at the Quantitative Group. >> Well, first of all, thank you very much for having me here, I'm quite honored. I hope you get something valuable out of what I say here. At the moment, I have two hats on. I am co-head of Quantitative Research Analytics. It's a very small SWAT, very well-selected group of technologists who are also physicists and mathematicians, statisticians, high-performance compute experts, machine learning experts, and we help the larger organization of Quantitative Research, which is about 700-plus strong, as well as some other technology organizations in the firm, to use the latest, greatest technologies. And how we do this is we actually go in there, we're very hands-on, we're working with the systems, we're working with the tools, and we're applying it to real use cases and real business problems that we see in Quantitative Research, and we prove out the technology. We make sure that we're going to save millions of dollars using this thing, or we're going to be able to execute a lot on this particular business that was difficult to execute on before, because we didn't have the right compute behind it. So we go in there, we try out these various technologies, we have lots of partnerships with the different vendors, and IBM's been, obviously, one of a few very major vendors that we work with, and we find the ones that work. We have an influencing role as well in the organization, so we go out and tell people, "Hey, look, this particular tool, perfect for this type of problem. You should try it out." We help them set it up. They can't figure out the technology? We help them out. We're kind of, like what I said, a SWAT team, very small compared to the rest of the organization, but we add a lot of value. >> You guys are the brain trust too. You've got the math skills, you've got the quantitative modeling going on, and it's a competitive advantage for your business. This is like a key thing, a lot of new things are emerging. One of the things we're seeing here in the industry, certainly at this show, is that it's not yesterday's machine learning. There's certainly math involved, you've got cognition and math kind of coming together, deterministic, non-deterministic elements; you guys are seeing the front edge of these problems and opportunities. How do you see that world evolving? Because you've got the classic school of math machine learning, and then the school of learning machines, coming together. What kind of problems do you see these things, this kind of new model, attacking?
>> So we're making a very, very large investment in machine learning and data science as a whole in the organization. You probably heard in the press that we've brought in the Head of Machine Learning from CMU, Manuela Veloso. She's now heading up the AI research organization at JP Morgan, and she's making herself very available to the rest of the firm, setting strategies, trying different things out, partnering with the businesses, and making sure that she understands the use cases where machine learning will be a success. We've also put a lot of investment in tooling and hiring the right kinds of people from the right kinds of universities. My organization, we're changing the focus in our recruiting efforts to bring in more data science and machine learning. But I think the most important thing, in addition to all that investment, is that we, first and foremost, understand our own problems. We work with researchers, we work with IBM, we work with the vendors, and say, "Okay, these are the types of problems, what is the best thing to throw at them?" And then we PoC, we prove it out, we look for the small wins, we try to strategize, and then we come up with the recommendations for a full-out, scalable architecture. >> John, talk about the IBM Elite Program. You guys roll your sleeves up. It's a service that you guys provide with your top clients. You bring in the best and you just jump in, co-create opportunities together, solving problems. >> That is exactly right. >> How does this work? What's your relationship with JP Morgan Chase? What specific use case are you going after? What are the opportunities? >> Yeah, so the Data Science Elite Team was set up to really help our top clients in their AI journey, in terms of bringing skills, tools, and expertise to work collaboratively with clients like JP Morgan Chase. It's been a great partnership working with Elenita and her team. We've had some very interesting use cases related to her model risk management platform, and some interesting challenges in that space about how you apply machine learning and deep learning to solve those problems. >> So what exactly is model risk management? How does that all work? >> Good question. (laughing) That's why we're building a very large platform around it. So model risk is one of several types of risk that we worry about and that keep us awake at night. There's a long history of risk management in the banks. Of course, there's credit risk, there's market risk, these are all very well-known, very quantified risks. Model risk isn't a number, right? You can't say, "This model, which is some stochastic model, is going to cost us X million dollars today," right? It's somewhat new, and at the moment, it's more prescriptive, things like, you can't do that, or you can use that model in this context, or you can't use it for this type of trade. It's very difficult to automate that type of model risk in the banks, so I'm attempting to put together a platform that captures all of the prescriptions, and the conditions, and the restrictions around what to do, and what to use models for, in the bank. Making sure that we actually know this in real time, or at least when the trade is being booked, we have an awareness of where these models are getting somewhat abused, right? We look out for those types of situations, and we make sure that we alert the correct stakeholders, and they do something about it.
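To make that concrete, here is a minimal sketch of the kind of check being described: encode each model's governance conditions, then flag any trade booked against a model outside its approved scope. Every name here (ModelRestriction, Trade, check_trade) is a hypothetical illustration for this transcript, not Morpheus code.

```python
# Hypothetical sketch of a model-usage check: governance restrictions per
# model, applied to each trade as it is booked, producing alerts on violation.
from dataclasses import dataclass
from datetime import date

@dataclass
class ModelRestriction:
    model_id: str
    allowed_trade_types: set     # e.g. {"swap", "option"}
    max_maturity_years: float    # model not validated beyond this horizon

@dataclass
class Trade:
    trade_id: str
    model_id: str
    trade_type: str
    maturity: date

def check_trade(trade: Trade, restrictions: dict) -> list:
    """Return a list of violations to route to model-risk stakeholders."""
    rule = restrictions.get(trade.model_id)
    if rule is None:
        return [f"{trade.trade_id}: model {trade.model_id} has no governance record"]
    violations = []
    if trade.trade_type not in rule.allowed_trade_types:
        violations.append(f"{trade.trade_id}: model not approved for {trade.trade_type}")
    horizon = (trade.maturity - date.today()).days / 365.0
    if horizon > rule.max_maturity_years:
        violations.append(
            f"{trade.trade_id}: maturity {horizon:.1f}y exceeds {rule.max_maturity_years}y limit")
    return violations

restrictions = {"sabr-fx-1": ModelRestriction("sabr-fx-1", {"option"}, 5.0)}
trade = Trade("T-1001", "sabr-fx-1", "swap", date(2040, 1, 1))
print(check_trade(trade, restrictions))  # flags both the trade type and the maturity
```

In production the restrictions would come out of the governance reviews Elinon mentions, and the check would run inline with trade booking rather than over an in-memory dict.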
>> So in essence, you're governing the application of the model, and then learning as you go on, in terms of-- >> That's the second phase. So we do want to learn. At the moment, what's in production today is Morpheus, and it's running against all of the trading systems in the firm, inside the investment bank. We want to make sure that as these trades are getting booked from day to day, we understand which ones are risky, and we flag those. There's no learning yet in that, but what we've worked with John on are the potential uses of machine learning to help us manage all those risks, because it's difficult. There's a lot of data out there. I was just saying, "I don't want our Quants to do stupid things," 'cause there's too much stupidity happening right now. We're looking at emails, we're looking at data that doesn't make sense, so Morpheus is an attempt to make all of that understandable, and make the whole workflow efficient. >> So it's financial programming in a way, that's come with a whole scale of computing, and a model gone astray could be very dangerous? >> Absolutely. >> This is what you're getting at, right? >> It would cost real money to the firm. This is all the use-- >> So a model to watch the model? So policing the models, kind of watching-- >> Yes, another model. >> Then you have to isolate the contribution of the model, not like you were saying before, "Are there market risks or other types of risks--" >> Correct. >> You isolate it to the narrow component. >> And there's a lot of work. We work with the Model Governance Organization, another several-hundred-person organization, and that's all they do. They figure out, they review the models, they understand what the risks of the models are. Now, it's the job of my team to take what they say, which could be very easy to interpret or very hard, and there's a little bit of NLP that I think is potentially useful there, to convert what they say about a model, and what the controls around the model are, into something that we can systematize and run every day, and possibly even in real time. >> This is really about getting it right and not letting it get out of control, but also this is where the scale comes in, so when you get the model right, you can deploy it and manage it in a way that helps the business, versus if someone throws the wrong number in there, or the classic "we've got a model for that." >> Right, exactly. (laughing) There are two things here, right? There's the ability to monitor a model such that we don't pay fines and we don't go out of compliance, and there's the ability to use the model exactly to the extreme where we're still within compliance, and make money, right? 'Cause we want to use these models and make our business stronger. >> There are consequences too, I mean, if it's an opportunity, there's upside; if it's a problem, there's downside. Do you guys look at the quantification of those kinds of consequences, where the risk management comes in? >> Yeah, absolutely. And there's real money that's at stake here, right? If the regulators decide that a model's too risky, you have to set aside a certain amount of capital so that you're basically protecting your investors and your business, and the stakeholders. If that's done incorrectly, we end up putting a lot more capital in reserve than we should be, and that's a bad thing. So quantifying the risks correctly and accurately is a very important part of what we do. >> So a lot of skillsets obviously, and I always say, "In the money business, you want the best nerds."
Don't hate me for saying that... the smartest people. What are some of the challenges that are unique to model risk management that you might not see in sort of other risk management approaches? >> There are some technical challenges, right? The volume of data that you're dealing with is very large. At the very simplistic level, you have classification problems that you're addressing with data that might not actually be all there, so that is one. When you get into time series analysis for exposure prediction and so on, these are complex problems to handle. The training time for these models, especially deep learning models, if you are doing time series analysis, can be pretty challenging. Data volume, training time for models, how do you turn this around quickly? We use a combination of technologies for some of these use cases. Watson Studio running on Power hardware with GPUs. So the idea here is you can cut down your model training time dramatically, and we saw that as part of the-- >> Talk about how that works, because this is something that we're seeing, people moving from manual to automated machine learning and deep learning; it gives you augmented assistance to get this to market. How does it actually work? >> So there is a training part of this, and then there is the operationalizing part of this, right? At the training part itself, you have a challenge, which is you're dealing with very large data volumes, you're dealing with training times that need to be shrunk down. And having a platform that allows you to do that, so you build models quickly, so your data science folks can iterate through model creation very quickly, is essential. But then, once the models have been built, how do you operationalize those models? How do you actually invoke the models at scale? How do you do workflow management of those models? How do you make sure that a certain exposure model is not thrashing some other models that are also essential to the business? How do you do policies and workflow management? >> And on top of that, we need to be very transparent, right? If the model is used to make certain decisions that have obvious impact financially on the bottom line, and an auditor comes back and says, "Okay, you made this trade such and such, why? What was happening at that time?" So we need to be able to capture and snapshot and understand what the model was doing at that particular instant in time, and go back and understand the inputs that went into that model and made it operate the way it did. >> It can't be a black box. >> It cannot be, yeah. >> Holistically, you've got to look at the time series in real time, when things were happening, and then holistically tie that together. Is that kind of the impact analysis? >> We have to make our regulators happy. (laughing) That's number one, and we have to make our traders happy. We, as quantitative researchers, we're the ones that give them the hard math and the models, and then they use them. They use their own skillsets too to apply them, but-- >> What are the biggest needs of your stakeholders on the trading side, and what are the needs on the compliance side? The traders want more, they want to move quickly? >> They're coming from different sides of it. Traders want to make more money, right? And they want to make decisions quickly. They want all the tools to tell them what to do, and for them to exercise whatever they normally exercise-- >> They want a competitive advantage.
They want that competitive advantage, and they're also... we've got algo trades as well, and we want to have the best algo behind our trading. >> And on the regulator side, we just want to make sure laws aren't broken, that there's auditing-- >> We use the phrase "model explainability," right? Can you explain how the model came to a conclusion, right? Can you make sure that there is no bias in the model? How can you ensure the models are fair? And if you can detect there is a drift, what do you do to correct that? So that is very important. >> Do you have means of detecting sort of misuse of the model? Is that part of the governance process? >> That is exactly what Morpheus is doing. The unique thing about Morpheus is that we're tied into the risk management systems in the investment bank. We're actually running the same exact code that's pricing these trades, and what that brings is the ability to really understand pretty much the full stack trace of what's going into the price of a trade. We have also captured the restrictions and the conditions. It's in the Python script, it's essentially Python. And we can marry the two, and we can do all the checks that the governance person indicated we should be doing, and so we know, okay, if this trade is operating beyond a certain maturity, or beyond a certain expiry, we'll know that, and then we'll tag that information. >> And just for clarification, Morpheus is the name of the platform that does the-- >> Morpheus is the name of the model risk platform that I'm building out, yes. >> A final question for you, what's the biggest challenge that you guys have seen from a complexity standpoint that you're solving? What's the big complex... You don't want to just be rubber-stamping models. You want to solve big problems. What are the big problems that you guys are going after? >> I have many big problems. (laughing) >> Opportunities. >> The one that is facing me right now is the problem of metadata and data ingestion, getting disparate data from different sources. One source calls it a delta, this other source calls it something else. We've got a strategic data warehouse that's supposed to take all of these exposures and make sense out of them. I'm in the middle, because they're on, probably, a ten-year roadmap, who knows? And I have a one-month roadmap; I have something that was due last week, and I need to come up with these regulatory reports today. So what I end up doing is a mix of tactical and strategic data ingestion, and I have to make sense of the data that I'm getting. So I need tools out there that will help support that type of data ingestion problem, and that will also lead the way towards the more strategic one, where we're better integrated with this-- >> John, talk about how you solve these problems. What are some of the things that you guys do? Give the plug for IBM real quick, 'cause I know you guys got the Studio. Explain how you guys are helping and working with JP Morgan Chase. >> Yeah, I touched upon this briefly earlier, which is, from the model training perspective, Watson Studio running on Power hardware is very powerful in terms of cutting down training time, right? But you've got to go beyond model building to how do you operationalize these models? How do I deploy these models at scale? How do I define workload management policies for these models, and connect into their backbone?
So that is part of this, and model explainability, we touched upon that. And then there's the problem of how do I ingest data from different sources without having to manually oversee all of that. Can we apply auto-classification at the time of ingestion? Can I capture metadata around the model and reconcile data from different data sources as the data is being brought in? And can I apply ML to solve that problem, right? There are multiple applications of ML along this workflow. >> Talk about, real quick, comment before we break, I want to get this in: machine learning has been around for a while, and now, with compute and scale, it really is a renaissance in AI; great things are happening. But what feeds machine learning is data. The cleaner the data, the better the AI, the better the machine learning, so data cleanliness now has to be more real-time, it's less of a separate cleaning step, right? It used to be clean the data, bring it in, wrangle it; now you've got to be much more agile, use the speed of compute to make sure that you're qualifying data before it comes into these machine learning systems. How do you guys see that rolling out, is that impacting you now? Are you thinking about it? How should people think about data quality as an input to machine learning? >> Well, I think the whole problem of setting up an application properly for data science and machine learning is really making sure that from the beginning, you're designing, and you're thinking about, all of these problems of data quality, whether it's the speed of ingestion, the speed of publication, all of that stuff. You need to think about it from the beginning, set yourself up to have the right elements, and it may not all be built out, and that's been a big strategy I've had with Morpheus. I've had a very small team working on it, but we think ahead and we put the right components in place, so data quality is just one of those things, and we're always trying to find the right toolsets that will enable us to do that better, faster, quicker. One of the things I'd like to do is to upscale and uplift the skillsets on my team, so that we are building the right things in the system from the beginning. >> A lot of that's math too, right? I mean, you talk about classification, getting that right upfront. Mathematics is-- >> And we'll continue to partner with Elenita and her team on this, and this helps us shape the direction in which our data science offerings go, because we need to address complex enterprise challenges. >> I think you guys are really onto something big. I love the elite program, but I think having the small team, thinking about the model, thinking about the business model, the team model, before you build the technology build-out is super important. That seems to be the new model, versus the old days: build some great technology and then we'll put a team around it. So you see the world kind of being a little bit more... it's easier to build out and acquire technology than to get it right, that seems to be the trend here. Congratulations. >> Thank you. >> Thanks for coming on. I appreciate it. theCUBE here, CUBE Conversations here. We're live in San Francisco, IBM Think. I'm John Furrier, Dave Vellante, stay with us for more day two coverage. Four days we'll be here in the hallway and lobby of Moscone North, stay with us.
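The "one source calls it a delta" problem Elinon raises is, at bottom, schema reconciliation at ingestion time. Below is a hedged sketch of the tactical version: a static synonym table maps source-specific column names onto canonical ones and flags anything unrecognized for review. The field names and the CANONICAL table are invented for illustration; a production system would replace the static table with the learned auto-classification Thomas describes.

```python
# Minimal sketch: reconcile disparate source schemas into canonical names at
# ingestion time. The mapping here is hand-written; an ML classifier could
# propose these mappings instead, with a human approving them.
import pandas as pd

# Hypothetical synonym table: source column name -> canonical column name.
CANONICAL = {
    "delta": "delta",
    "fx_delta_equiv": "delta",      # what "this other source calls it"
    "expiry_dt": "expiry",
    "expiration_date": "expiry",
}

def ingest(frame: pd.DataFrame) -> pd.DataFrame:
    """Rename recognized columns; flag the rest for manual or ML classification."""
    renames = {c: CANONICAL[c.lower()] for c in frame.columns if c.lower() in CANONICAL}
    unknown = [c for c in frame.columns if c.lower() not in CANONICAL]
    if unknown:
        print(f"Needs classification: {unknown}")
    return frame.rename(columns=renames)

source_a = pd.DataFrame({"delta": [0.42], "expiry_dt": ["2020-06-30"]})
source_b = pd.DataFrame({"fx_delta_equiv": [0.40], "expiration_date": ["2020-06-30"],
                         "desk": ["FX"]})
combined = pd.concat([ingest(source_a), ingest(source_b)], ignore_index=True)
print(combined.columns.tolist())   # ['delta', 'expiry', 'desk']
```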
John Thomas, IBM | IBM CDO Fall Summit 2018
>> Live from Boston, it's theCUBE, covering IBM Chief Data Officer Summit, brought to you by IBM. >> Welcome back everyone to theCUBE's live coverage of the IBM CDO Summit here in Boston, Massachusetts. I'm your host Rebecca Knight, and I'm joined by cohost, Paul Gillan. We have a guest today, John Thomas. He is a Distinguished Engineer and Director at IBM. Thank you so much for coming, returning to theCUBE. You're a CUBE veteran, CUBE alum. >> Oh thank you Rebecca, thank you for having me on this. >> So tell our viewers a little bit about, you're a distinguished engineer. There are only 672 in all of IBM. What do you do? What is your role? >> Well that's a good question. Distinguished Engineer is kind of a technical executive role, which is a combination of applying the technology skills, as well as helping shape IBM strategy in a technical way, working with clients, et cetera. So it is a bit of a jack of all trades, but also deep skills in some specific areas, and I love what I do (laughs lightly). So, I get to work with some very talented people, brilliant people, in terms of shaping IBM technology and strategy. Product strategy, that is part of it. We also work very closely with clients, in terms of how to apply that technology in the context of the client's use cases. >> We've heard a lot today about soft skills, the importance of organizational people skills to being a successful Chief Data Officer, but there's still a technical component. How important is the technical side? What are the technical skills that the CDOs need? >> Well, this is a very good question Paul. So, absolutely, navigating the organizational structure is important. It's a soft skill. You are absolutely right. And being able to understand the business strategy for the company, and then aligning your data strategy to the business strategy, is important, right? But the underlying technical pieces need to be solid. So for example, how do you deal with large volumes of different types of data spread across a company? How do you manage that data? How do you understand the data? How do you govern that data? How do you then master leveraging the value of that data in the context of your business, right? So an understanding, a deep understanding, of the technology of collecting, organizing, and analyzing that data is needed for you to be a successful CDO. >> So in terms of those skillsets that you're looking for, one of the things that Inderpal said earlier in his keynote is that it's a rare individual who truly understands how to collect, store, analyze, curate, and monetize the data, and then also has the soft skills of being able to navigate the organization, being able to be a change agent who is inspiring the rank and file. How do you recruit and retain talent? I mean, this seems to be a major challenge. >> Expertise, yes, and getting the right expertise in place. Inderpal talked about it in his keynote: the very first thing he did was bring in talent. Sometimes it is from outside of your company. Maybe you have the kind of talent that has grown up in your company; maybe you have to go outside. But you've got to bring the right skills together, form the team that understands the technology and the business side of things, and build this team, and that is essential for you to be a successful CDO. And to some extent, that's what Inderpal has done. That's what the analytics CDO's office has done.
Seth Dobrin, my boss, is the analytics CDO, and he and the analytics CDO team actually hired people with different skills: data engineering skills, data science skills, visualization skills. They then put this team together, which understands how to collect, govern, curate, and analyze the data, and then apply it in specific situations. >> There's been a lot of talk about AI at this conference, which seems to be finally happening. What do you see in the field, or perhaps projects that you've worked on, of examples of AI that are really having a meaningful business impact? >> Yeah Paul, that is a very good question, because, you know, the term AI is overused a lot, as you can imagine, a lot of hype around it. But I think we are past that hype cycle, and people are looking at, how do I implement successful use cases? And I stress the word use case, right? In my experience, this "how I'm going to transform my business in one big boil-the-ocean exercise" does not work. But if you have a very specific, bounded use case that you can identify, the business tells you this is relevant, the business tells you what the metrics for success are, and then you focus your attention, your efforts, on that specific use case with the skills needed for that use case, then it's successful. So, you know, examples of use cases from across the industries, right? I mean, everything that you can think of. Customer-facing examples, like, how do I read the customer's mind? So, if I'm a business and I interact with my customers, can I anticipate what the customer is looking for, maybe for a cross-sell opportunity, or maybe to reduce the call handling time when a customer calls into my call center? Or trying to segment my customers so I can do a proper promotion or a campaign for that customer. All of these are specific customer-facing examples. There are also examples of applying this internally to improve processes: capacity planning for your infrastructure, can I predict when a system is likely to have an outage, or can I predict the traffic coming into my systems, into my infrastructure, and provision capacity for that on demand? So all of these are interesting applications of AI in the enterprise. >> So, one of the things we keep hearing is that we need data to tell a story. The data needs to be compelling enough so that the data scientists get it, but then also the other kinds of business decision makers get it too. >> Yep. >> So, what are sort of the best practices that have emerged from your experience, in terms of being able to get your data to tell the story that you want it to tell? >> Yeah, well, I mean, if the pattern doesn't exist in the data, then no amount of fancy algorithms can help, you know? And sometimes it's like searching for a needle in a haystack. But I guess the first step is, like I said, what is the use case? Once you have a clear understanding of your use case and success metrics for your use case, do you have the data to support that use case? So for example, if it's fraud detection, do you actually have the historical data to support the fraud use case? Sometimes you may have transactional data from your core enterprise systems, but that may not be enough. You may need to augment it with external data, third-party data, maybe unstructured data that goes along with your transaction data.
So the question is, can you identify the data that is needed to support the use case, and if so, is that data clean? Do you understand the lineage of the data, who has touched and modified the data, who owns the data? So then I can start building predictive models, machine learning and deep learning models, with that data. So: use case, do you have the data to support the use case, do you understand how that data reached you? Then comes the process of applying machine learning algorithms and deep learning algorithms against that data. >> What are the risks of machine learning, and particularly deep learning? I think because it becomes kind of a black box, people can fall into the trap of just believing what comes back, regardless of whether the algorithms are really sound or the data is. What is the responsibility of data scientists to sort of show their work? >> Yeah, Paul, this is fascinating, and not a completely solved area, right? So, bias detection, can I explain how my model behaved, can I ensure that the models are fair in their predictions? There is a lot of research, a lot of innovation, happening in this space, and IBM is investing a lot in this space. We call it trust and transparency. Being able to explain a model has multiple levels to it. You need some level of AI governance itself; just like we talked about data governance, there is the notion of AI governance. Which is: what version of the model was used to make a prediction? What were the inputs that went into that model? What were the features that were used to make a certain prediction? What was the prediction? And how did that match up with ground truth? You need to be able to capture all that information. But beyond that, we have got actual mechanisms in place that IBM Research is developing to look at bias detection. So pre-processing, during execution, and post-processing: can I look for bias in how my models behave, and do I have mechanisms to mitigate that? One example is the open source Python library called AIF360, which comes from IBM Research and has been contributed to the open source community. There are mechanisms there to look at bias and provide some level of bias mitigation as part of your model building exercises. >> And the bias mitigation, does it have to do with, and I'm going to use an IBM term of art here, the human in the loop? How much are you actually looking at the humans that are part of this process? >> Yeah, at least at this point in time, humans are very much in the loop. This notion of pure AI, where humans are completely outside the loop... we're not there yet. It is very much something where the system can provide a set of recommendations, can provide a set of explanations, and someone who understands the business can look at it and take corrective actions.
For example, my team worked with a prominent health care provider on, you know, models for predicting patient death in the case of sepsis onset. This is, we are talking literally life-and-death decisions being made, and this is not something you can just automate, throw into a magic black box, and have a decision be made. So this is absolutely a place where people with deep domain knowledge are supported, are augmented, with AI to make better decisions. That's where I think we are today. As to what will happen five years from now, I can't predict that yet. >> Well, I actually want to bring this up to both of you. So you are helping doctors make these decisions, not just "this is what the computer program says about this patient's symptoms," but really helping the doctor make better decisions. What about the doctor's gut, his or her intuition too? I mean, what is the role of that in the future? >> I think it goes away. I mean, I think the intuition really will be trumped by data in the long term, because you can't argue with the facts, much as some people do these days. (laughter) >> We'll have to take a break there for the laughter. Interested in your perspective on that: will there, should there, always be a human on the front line who is being supported by the back end, or would you see a scenario where an AI is making decisions, customer-facing decisions, that really are life-and-death decisions? >> So I think in the consumer space, I can definitely see AI making decisions on its own. So, you know, let's say a recommender system would say, you know, John Thomas bought these last five things online, he's likely to buy this other thing, let's make an offer to him. You know, I don't need another human in the loop for that. >> No harm, right? >> Right. It's pretty straightforward, it's already happening in a big way. But when it comes to some of these-- >> Pre-approving a mortgage, how about that one? >> Yeah. >> Where bias creeps in a lot. >> But that's one big decision. >> Even that I think can be automated, can be automated if the threshold is set to be what the business is comfortable with, where it says, okay, above this probability level, I don't really need a human to look at this; but if it is below this level, I do want someone to look at this. That, you know, that is relatively straightforward, right? But if it is a decision about, you know, a life-or-death situation, or something that affects the very fabric of the business that you are in, then you probably want a domain expert to look at it. In most enterprises, enterprise use cases will lean toward that category. >> These are big questions. These are hard questions. >> These are hard questions, yes. >> Well John, thank you so much for coming on theCUBE. >> Oh absolutely, thank you. >> We really had a great time with you. >> No, thank you for having me. >> I'm Rebecca Knight for Paul Gillan, we will have more from theCUBE's live coverage of IBM CDO, here in Boston, just after this. (Upbeat Music)
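Two of the mechanics John Thomas describes above lend themselves to short sketches. First, the AI-governance record: capture which model version made a prediction, the inputs that went into it, the prediction itself, and, later, the ground truth. The schema and names below are hypothetical illustrations, not an IBM product API.

```python
# Minimal sketch of a prediction audit record for AI governance. Each entry
# ties a prediction to a model version and its inputs, so an auditor can ask
# "why did the model decide this?" and later compare against ground truth.
import uuid
from datetime import datetime, timezone

def log_prediction(store, model_id, model_version, features, prediction):
    record_id = str(uuid.uuid4())
    store[record_id] = {
        "model_id": model_id,
        "model_version": model_version,
        "features": features,          # the inputs that went into the model
        "prediction": prediction,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "ground_truth": None,          # filled in once the outcome is known
    }
    return record_id

audit_log = {}
rid = log_prediction(audit_log, "churn-model", "1.4.2",
                     {"tenure_months": 7, "monthly_spend": 42.0}, 1)
audit_log[rid]["ground_truth"] = 0     # prediction missed; feeds drift review
```

Second, since AIF360 is called out by name, here is a minimal sketch of the pre-processing bias check and mitigation it supports. The toy DataFrame, the protected attribute, and the group encodings are assumptions for illustration only, not a real client dataset.

```python
# Measure disparate impact on a toy dataset, then mitigate with reweighing.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

df = pd.DataFrame({
    "sex":   [0, 0, 1, 1, 1, 0, 1, 0],   # 0 = unprivileged, 1 = privileged
    "score": [620, 700, 710, 680, 730, 640, 690, 660],
    "label": [0, 1, 1, 1, 1, 0, 1, 0],   # 1 = favorable outcome
})
dataset = BinaryLabelDataset(df=df, label_names=["label"],
                             protected_attribute_names=["sex"])

groups = dict(unprivileged_groups=[{"sex": 0}], privileged_groups=[{"sex": 1}])
metric = BinaryLabelDatasetMetric(dataset, **groups)
print("Disparate impact:", metric.disparate_impact())   # 1.0 would be parity

mitigated = Reweighing(**groups).fit_transform(dataset)
print("After reweighing:",
      BinaryLabelDatasetMetric(mitigated, **groups).disparate_impact())
```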
Rob Thomas, IBM | Change the Game: Winning With AI 2018
>> [Announcer] Live from Times Square in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to theCUBE's special presentation. We're covering IBM's announcements today around AI. IBM, as theCUBE does, runs sessions and programs in conjunction with Strata, which is down at the Javits Center, and we're here with Rob Thomas, who's the General Manager of IBM Analytics. Long-time CUBE alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the Game: Winning with AI, at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the rundown. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the West Side Highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time. But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going.
>> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today? Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes to take a couple steps back, and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward, that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying that, got to maybe take a step back and get the infrastructure right with, let's say a catalog, to give some basic things that they have to do, some x's and o's, you've got the Vince Lombardi played out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, imagining this is where the cheap data officer comes in and it's an executive level, what are you seeing clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in. Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, I see somebody that says, we're going to spend six months evaluating and thinking about this, I was like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about, every enterprise is modernizing their data and application of states, I don't think there's any debate about that. 
We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services that are part of Cloud Private for Data, which are basically microservices for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices and containers, and we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey. >> So, just to remind people, you remember ODPI, folks? There was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore open source, and IBM's always been a big supporter of open source. You three got together and you're proving out the productivity of this relationship for customers. You guys don't talk about this, but Hortonworks said, on its public call, that the relationship with IBM drove many, many seven-figure deals, which obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month; it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you asked, is there a skills gap in enterprises? There absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> The last one's a product announcement. This is one of the most interesting product announcements we've had in quite a while. Imagine this: you write a SQL query, and the traditional approach is, I've got a server, I point it at that server, I get the data; it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world. I think of it as wide-area SQL. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device; we think of it as SQL over the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, and that makes it impossible to do it real time. We're enabling real time because we can write a query once and find data anywhere. This is technology we've had in preview for the last year. We've been working with a lot of clients to prove out use cases for it, and we're integrating it as a capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud Private for Data, it's there.
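To make the wide-area SQL idea concrete, here is a minimal sketch of what a federated query could look like from Python. The `federation_db` module name, the endpoint, and the table are stand-ins invented for illustration, not a real IBM API; the driver is assumed to follow the standard Python DB-API (PEP 249), and the point is simply that one query runs wherever the data lives:

```python
# Sketch of "wide-area SQL": one query, resolved wherever the data lives.
# `federation_db` is a hypothetical stand-in for a DB-API (PEP 249) driver
# exposed by a federation engine; table and column names are illustrative.
import federation_db  # hypothetical module, not a real package

conn = federation_db.connect(endpoint="coordinator.example.com")
cur = conn.cursor()

# The engine would push the filter and aggregation down to wherever the
# readings physically sit (an IoT gateway, a vehicle, a warehouse) instead
# of hauling raw rows back to a central store first.
cur.execute("""
    SELECT device_id, AVG(engine_temp) AS avg_temp
    FROM telematics.readings
    WHERE reading_ts > CURRENT_TIMESTAMP - INTERVAL '1' HOUR
    GROUP BY device_id
    HAVING AVG(engine_temp) > 110
""")

for device_id, avg_temp in cur.fetchall():
    print(f"device {device_id} running hot: {avg_temp:.1f}")
```

The real-time benefit Rob describes falls out of the push-down: only the small aggregated answer travels over the network, not the raw edge data.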
>> Interesting, so when you've been around as long as I have, long enough to see some of the pendulum swings, it's clearly a pendulum swing back toward decentralization and the edge, but the key, from what you just described, is that you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so then I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show. And then of course, Cloud, and what's interesting, I think many of the Hadoop distro vendors kind of missed Cloud early on, and now are sort of saying, oh wow, it's a hybrid world and we've got a part. You guys obviously made some moves, a couple billion-dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that's what customers want to do. Your comments? >> It's multi-Cloud, and it's not just the IBM Cloud. I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but the biggest inhibitor, to this AI point, comes back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on theCUBE, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken. Set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want; we can bring Cloud-native architecture to their data, and we can also help move their data to a Cloud-native architecture if that's what they prefer. >> Well, it makes sense, because you've got physics, latency, you've got economics; moving all the data into a public Cloud is expensive and just doesn't make economic sense. And then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say you've got to keep the data in whatever it is, in Germany, or whatever country. So those sorts of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there.
>> I get a lot of questions, people trying to peel back the onion of what exactly it is, so I want to make that super clear here. Watson is a few things, starting at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want; that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, and we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, and Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features: vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. So, it is really a stack of capabilities, and where we're driving the greatest productivity, and this is in a lot of the examples you'll see tonight from clients, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have explainability, do I have trust and transparency, and that's another thing that we're working on. >> Okay, so there's an API-oriented architecture, exposing all these services, making it very easy for people to consume. Okay, so we've been talking all week at Cube NYC about Big Data and AI: is this old wine in a new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how I get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching theCUBE.
John Thomas, IBM | Change the Game: Winning With AI
(upbeat music) >> Live from Times Square in New York City, it's theCUBE, covering IBM's Change the Game: Winning with AI. Brought to you by IBM. >> Hi everybody, welcome back to The Big Apple. My name is Dave Vellante. We're here in the Theater District at The Westin Hotel, covering a special CUBE event. IBM's got a big event today and tonight, if we can pan here to this pop-up: Change the Game: Winning with AI. So IBM has got an event here at The Westin, The Tide at Terminal 5, which is right up the Westside Highway. Go to IBM.com/winwithAI. Register, you can watch it online, or if you're in the city come down and see us, we'll be there. We have a bunch of customers who will be there. We had Rob Thomas on earlier, he's kind of the host of the event. IBM does these events periodically throughout the year. They gather customers, they put forth some thought leadership, talk about some hard news. So, we're very excited to have John Thomas here, he's a distinguished engineer and Director of IBM Analytics, long time Cube alum. Great to see you again, John. >> Same here. >> Thanks for coming on. Great to have you. So we just heard a great case study with Niagara Bottling around the Data Science Elite Team, that's something that you've been involved in, and we're going to get into that. But give us the update since we last talked, what have you been up to? >> Sure, sure. So we're living and breathing data science these days. So the Data Science Elite Team, we are a team of practitioners. We actually work collaboratively with clients. And I stress the word collaboratively, because we're not there to just go do some work for a client. We actually sit down, expect the client to put their team to work with our team, and we build AI solutions together. We scope use cases, and, you know, expose them to expertise, tools, techniques, and do this together, right. And we've been very busy, (laughs) I can tell you that. You know, it has been a lot of travel around the world. A lot of interest in the program. And engagements that bring us very interesting use cases. You know, use cases that you would expect to see, and use cases that make you go, hmmm, I had not thought of a use case like that. You know, it's been an interesting journey in the last six, eight months now. >> And these are pretty small, agile teams. Sometimes people >> Yes. use tiger teams, and they're like two-pizza teams, right? >> Yeah. >> And my understanding is you bring some number of resources, call it two or three data scientists, >> Yes. and the customer matches that resource, right? >> Exactly. >> That's the prerequisite. >> That is the prerequisite, because we're not there to just do the work for the client. We want to do this in a collaborative fashion, right. So, the customer's Data Science Team is learning from us, and we are working with them hand in hand to build a solution out. >> And that's got to resonate well with customers. >> Absolutely. I mean, so often with the services business, customers will say, well, I don't want to keep going back to a company to get these services. >> Right, right. I want, teach me how to fish, and that's exactly >> That's exactly! >> I was going to use that phrase. >> That's exactly what we do. So at the end of the two or three month period, when IBM leaves, my team leaves, you know, the client, the customer, knows what the tools are, what the techniques are, what to watch out for, what the success criteria are; they have a good handle on that.
>> So we heard about the Niagara Bottling use case, which was a pretty narrow one. >> Mm-hmm. >> How can we optimize the use of the plastic wrapping, save some money there, but at the same time maintain stability? >> Ya. >> Quite narrow, in this case. >> Yes, yes. >> What are some of the other use cases? >> Yeah, that was, like you said, a narrow one. But there are some use cases that span industries, that cut across different domains. I think I may have mentioned this on one of our previous discussions, Dave. You know, customer interactions, trying to improve customer interactions, is something that cuts across industry, right. Now that can be across different channels. One of the most prominent channels is a call center; I think we have talked about this previously. You know, I hate calling into a call center (laughter) because I don't know >> Yeah, yeah. what kind of support I'm going to get. But what if you could equip the call center agents to provide consistent service to the caller, and handle the calls in the most appropriate way? Reducing costs on the business side, because call handling is expensive. And eventually leading up to, can I even avoid the call, through insights on why the call is coming in in the first place? So this use case cuts across industry; any enterprise that has got a call center is doing this. So we are looking at, can we apply machine-learning techniques to understand the dominant topics in the conversation? Once we understand the dominant topics with unsupervised techniques, can we drill into that and understand what the intents are, and does the intent change as the conversation progresses? So you know, I'm calling someone; it starts off with pleasantries, it then goes into the weather, how are the kids doing, you know, complaints about life in general. But then you get to something of substance, why the person was calling in the first place. And then you may think that is the intent of the conversation, but you find that as the conversation progresses, the intent might actually change. And can you understand that in real time? Can you understand the reasons behind the call, so that you could take proactive steps to maybe avoid the call coming in in the first place? This use case, Dave, we are seeing so much interest in this use case, because call centers are a big cost to most enterprises. >> Let's double down on that, because I want to understand this. So every time you call a call center, this call may be recorded, >> (laughter) Yeah. for quality of service. >> Yeah. >> So you're recording the calls, maybe using NLP to transcribe those calls. >> NLP is just the first step. >> Right. >> So you're absolutely right: when a call comes in, there are already call recording systems in place. We're not getting into that space, right. So the call recording systems record the voice calls. Then, often in offline batch mode, you can take these millions of calls and pass them through a speech-to-text mechanism, which produces a text equivalent of the voice recordings. Then what we do is apply unsupervised machine learning, and clustering, and topic-modeling techniques against that, to understand the dominant topics in these conversations. >> You do kind of an entity extraction of those topics. >> Exactly, exactly, exactly. Then we find which topics are the most relevant, what the relevancy of topics in a particular conversation is. That's not enough; that is just step two, if you will.
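Before John's next step, intents, it may help to see what this step-two topic discovery can look like in miniature. Here is a minimal sketch with scikit-learn on invented transcript snippets; it shows the generic technique John names (topic modeling over speech-to-text output), not IBM's actual pipeline:

```python
# Minimal topic-modeling sketch over call-center transcripts (scikit-learn).
# `transcripts` stands in for text produced by the speech-to-text pass.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

transcripts = [
    "hi I need to contest a charge on my last statement",
    "my payment will be late this month because of a job change",
    "can you walk me through setting up automatic payments",
    "I was charged twice for the same purchase last week",
]

# Bag-of-words features, dropping common words that carry no topic signal.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(transcripts)

# Factor the corpus into a small number of latent topics. The topic count
# is something you tune; LDA is a common alternative to NMF here.
nmf = NMF(n_components=2, random_state=0)
doc_topics = nmf.fit_transform(X)

# Print the top words per topic so a human can label them ("payments",
# "disputed charges", ...), which is where domain expertise comes in.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(nmf.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:6]]
    print(f"topic {k}: {', '.join(top)}")
```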
Then you have to, well, we build what is called an intent hierarchy. So at the topmost level it might be, let's say, payments; the call is about payments. But what about payments, right? Is the intent to make a late payment? Or is the intent to avoid the payment, or contest a payment? Or is the intent to structure a different payment mechanism? So can you get down to that level of detail? Then comes a further level of detail, which is the reason that is tied to this intent. What is the reason for a late payment? Is it a job loss or job change? Is it because they are just not happy with the charges they have coming? What is the reason? And the reason can be pretty complex, right? It may not be in the immediate vicinity of the snippet of conversation itself. So you've got to go find out what the reason is and see if you can match it to this particular intent. So there are multiple steps to this journey, and eventually what we want to do is, well, we do all of this in offline batch mode, and we are building a series of classifiers, sets of classifiers. But eventually we want to get this to real-time action. So think of this: if you have machine learning models, supervised models that can predict the intent, the reasons, et cetera, you can deploy them, operationalize them, so that when a call comes in real time, you can screen it in real time, do the speech-to-text, pass it to the supervised models that have been deployed, and the model fires and comes back and says, this is the intent; take some action, or guide the agent to take some action, in real time. >> Based on some automated discussion, so tell me what you're calling about, that kind of thing, >> Right. is that right? >> So it's probably even gone past "tell me what you're calling about." So it could be that the conversation has begun to get into, you know, I'm going through a tough time, my spouse had a job change. You know, that is itself an indicator of some other reasons, and can that be used to prompt the CSR to take some action >> appropriate to the conversation, yes. >> So I'm not talking to a machine at first, I'm talking to a human. >> No, no, still talking to a human. >> And then real-time feedback to that human >> Exactly, exactly. is a good example of >> Exactly. human augmentation. >> Exactly, exactly. >> I wanted to go back and process a little bit, in terms of the model building. Are there humans involved in calibrating the model? >> There has to be. Yeah, there has to be. So you know, for all the hype in the industry, (laughter) you still need a human. You know what it is, is you need expertise to look at what these models produce, right. Because if you think about it, machine learning algorithms don't by themselves have an understanding of the domain. They are, you know, statistical or similar in nature, so somebody has to marry the statistical observations with the domain expertise. So humans are definitely involved in the building of these models and the training of these models. >> Okay. >> (inaudible) >> So there you've got math, you've got stats, you've got some coding involved, and you've >> Absolutely. got humans as the last mile >> Absolutely. to really bring that >> Absolutely. expertise. And then, in terms of operationalizing it, how does that actually get done? What's the tech behind that? >> Ah, yeah. It's a very good question, Dave. You build models, and what good are they if they stay inside your laptop? You know, they don't go anywhere.
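The supervised models John just described, which take a snippet of live conversation and fire back an intent, can be sketched as a plain text-classification pipeline. The labels and training snippets below are invented for illustration; a real system would train on thousands of labeled call segments:

```python
# Sketch of a supervised intent classifier: conversation snippets in,
# a node of the intent hierarchy out. Training data here is invented.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

snippets = [
    "I lost my job and can't make this month's payment on time",
    "this charge is wrong and I want to dispute it",
    "can I switch to paying quarterly instead of monthly",
    "I'll be about a week late with the payment, is there a fee",
]
intents = ["payment/late", "payment/dispute", "payment/restructure", "payment/late"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(snippets, intents)

# At call time, the live speech-to-text snippet is scored, and the predicted
# intent (plus its confidence) can prompt the agent with a next action.
live = "my spouse changed jobs so the payment is going to be late"
print(model.predict([live])[0])            # e.g. "payment/late"
print(model.predict_proba([live]).max())   # confidence behind the suggestion
```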
What you need to do is, I use the phrase, weave these models into your business processes and your applications. So you need a way to deploy these models. The models should be consumable from your business processes. Now, that could be a REST API call to a model. In some cases a REST API call is not sufficient, the latency is too high. Maybe you've got to embed that model right into where your application is running. You know, you've got data on a mainframe. A credit card transaction comes in, and the authorization for the credit card is happening in a four-millisecond window on the mainframe, in, not all, but you know, CICS COBOL code. I don't have the time to make a REST API call outside. I've got to have the model execute in context with my CICS COBOL code, in that memory space. >> Yeah, right. >> You know, so the operationalizing is deploying and consuming these models, and then, beyond that, how do the models behave over time? Because you can have the best programmer, the best data scientist, build the absolute best model, which has got great accuracy, great performance today. Two weeks from now, performance is going to go down. >> Hmm. >> How do I monitor that? How do I trigger alerts if performance falls below a certain threshold? And can I have a system in place that retrains this model with new data as it comes in? >> So you've got to understand where the data lives. >> Absolutely. >> You've got to understand the physics, >> Yes. the latencies involved. >> Yes. You've got to understand the economics. >> Yes. And there are also, probably in many industries, legal implications. >> Oh yes, the explainability of models. You know, can I prove that there is no bias here? >> Right. >> Now, all of these are challenging, but you know, doable things. >> What makes a successful engagement? Obviously you guys are outcome driven, >> Yeah. but talk about how you guys measure success. >> So, for our team right now it is not about revenue, it's purely about adoption. Does the client, does the customer, see the value of what IBM brings to the table? This is not just tools and technology, by the way, it's also expertise, right? >> Hmm. >> So it's this notion of expertise as a service, coupled with tools and technology, that builds a successful engagement. The way we measure success is: have we built out the use case in a way that is useful for the business? Two, does the client see value in going further with that? So this is right now what we look at. And yes, of course, everybody cares about revenue, but that is not our key metric. Now, in order to get there, what we have found, with a little bit of hard work, yes, is that you need different constituents of the customer to come together. It's not just me sending a bunch of awesome Python programmers to the client. >> Yeah, right. >> Now, from the customer's side, we need involvement from their Data Science Team. We talk about collaborating with them. We need involvement from their line of business, because if the line of business doesn't care about the models we've produced, you know, what good are they? >> Hmm. >> And third, and people don't usually think about it, we need IT to be part of the discussion. Not just part of the discussion, part of being a stakeholder. >> Yes, so you've got, so IBM has the chops to actually bring these constituents together. >> Ya. >> IBM actually has a fair amount of experience in herding cats in large organizations. (laughter) And you know, the customer, they've got skin in the game with IBM.
This is to me a big differentiator between IBM and certainly some of the other technology suppliers who don't have that depth of services expertise and domain expertise. But on the flip side of that, it's a differentiation from many of the SIs, who have that level of global expertise but don't have the tech piece. >> Right. >> Now, they would argue, well, we do anybody's tech. >> Ya. >> But you know, if you've got tech... >> Ya. >> You've just got to (laughter) >> Ya. >> bring those two together. >> Exactly. >> And that really seems to me to be the big differentiator >> Yes, absolutely. for IBM. Well John, thanks so much for stopping by theCube and explaining sort of what you've been up to, the Data Science Elite Team, very exciting. Six to nine months in, >> Yes. are you declaring success yet? Still too early? >> Uh, well, we're declaring success, and we are growing. >> Ya. >> Growth is good. >> A lot of attention. >> Alright, great to see you again, John. >> Absolutely, thank you Dave. Thanks very much. >> Okay, keep it right there everybody. You're watching theCube. We're here at The Westin in midtown, and we'll be right back after this short break. I'm Dave Vellante. (tech music)
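One thread from this interview worth making concrete before moving on is the deployment story: John's point that a model is only useful once it is consumable, for example as a REST endpoint that a chatbot or a CSR's desktop calls in real time. Here is a minimal sketch using Flask; the pickled model file and the route are placeholders, and authentication, versioning, monitoring, and scaling are deliberately left out:

```python
# Minimal model-as-a-service sketch: a trained model behind a REST endpoint
# that a chatbot (or any application) can call in real time. Flask is used
# purely for illustration; auth, versioning, and autoscaling are omitted.
import pickle
from flask import Flask, jsonify, request

app = Flask(__name__)

# Assume a model trained elsewhere was serialized to disk at deploy time.
with open("intent_model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/score", methods=["POST"])
def score():
    text = request.get_json()["utterance"]
    return jsonify({
        "intent": model.predict([text])[0],
        "confidence": float(model.predict_proba([text]).max()),
    })

if __name__ == "__main__":
    app.run(port=8080)
```

A caller would POST the latest utterance to /score and branch the conversation on the returned intent, which is the "model fires and comes back" behavior described above.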
John Thomas, IBM | IBM CDO Summit Spring 2018
>> Narrator: Live from downtown San Francisco, it's theCUBE, covering IBM Chief Data Officer Strategy Summit 2018, brought to you by IBM. >> We're back in San Francisco, here at the Parc 55 at the IBM Chief Data Officer Strategy Summit. You're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante. IBM holds these Chief Data Officer Strategy Summits on both coasts, one in Boston and one in San Francisco, a couple of times each year, with about 150 chief data officers coming in to learn how to apply their craft, learn what IBM is doing, and share ideas. Great peer networking, really senior audience. John Thomas is here, he's a distinguished engineer and director at IBM. Good to see you again, John. >> Same to you. >> Thanks for coming back in theCUBE. So let's start with your role. Distinguished engineer: we've had this conversation before, but it just doesn't happen overnight, you've got to be accomplished, so congratulations on achieving that milestone. But what is your role? >> The road to distinguished engineer is long, but these days I spend a lot of my time working on data science, and in fact I am part of what is called a data science elite team. We work with clients on data science engagements. So this is not consulting, this is not services; this is where a team of data scientists works collaboratively with a client on a specific use case, and we build it out together. We bring data science expertise, machine learning, deep learning expertise. We work with the business and build out a set of tangible assets that are relevant to that particular client. >> So this is not a for-pay service. This is, hey, you're a great customer, a great client of ours, we're going to bring together some resources, you'll learn, we'll learn, we'll grow together, right? >> This is an investment IBM is making. It's a major investment for our top clients, working with them on their use cases. >> This is a global initiative? >> This is global, yes. >> We're talking about, what, hundreds of clients, thousands of clients? >> Well, eventually thousands, but we're starting small. We are trying to scale now, and obviously, once you get into these engagements, you find out that it's not just about building some models. There are a lot of challenges that you've got to deal with in an enterprise setting. >> Dave: What are some of the challenges? >> Well, in any data science engagement, the first thing is to have clarity on the use case that you're engaging in. You don't want to build models for models' sake. Just because Tensorflow or scikit-learn is great for building models, that doesn't serve a purpose. That's the first thing: do you have clarity on the business use case itself? Then comes data. Now, I cannot stress this enough, Dave: there is no data science without data. And you might think this is the most obvious thing, of course there has to be data, but when I say data, I'm talking about access to the right data. Do we have governance over the data? Do we know who touched the data? Do we have lineage on that data? Because garbage in, garbage out, you know this. Do we have access to the right data, in the right controlled setting, for the machine learning models we build? These are challenges, and then there's another challenge around, okay, I built my models, but how do I operationalize them? How do I weave those models into the fabric of my business? So these are all challenges that we have to deal with.
>> That's interesting, what you're saying about the data. It does sound obvious, but it's about having the right data model as well. I think about when I interact with Netflix: I don't talk to their customer service department or their marketing department or their sales department or their billing department, it's one experience. >> You just have an experience, exactly. >> This notion of incumbent disruptors, is that a logical starting point for these guys, to get to that point where they have a data model that is a single data model? >> Single data model. (laughs) >> Dave: What does that mean, right? At least from an experience standpoint. >> Once we know this is the kind of experience we want to target, what are the relevant data sets and data pieces that are necessary to make that experience happen, or come together? Sometimes there's core enterprise data that you have, and in many cases it has to be augmented with external data. Do you have a strategy around handling your internal and external data, your structured transactional data, your semi-structured data, your newsfeeds? All of these need to come together in a consistent fashion for that experience to be true. It is not just about, I've got my credit card transaction data, but what else is augmenting that data? You need a model, you need a strategy around that. >> I talk to a lot of organizations and they say, we have a good back-end reporting system, we have Cognos, we can build cubes, and all kinds of financial data that we have, but then it doesn't get down to the front line. We haven't instrumented the front line; we talk about IoT, and that portends change there, but there's a lot of data that either isn't persisted or stored, or doesn't even exist. So is that one of the challenges that you see enterprises dealing with? >> It is a challenge. Do I have access to the right data, whether that is data at rest or in motion? Am I persisting it in a way I can consume it later? Or am I just moving big volumes of data around because analytics is over there, or machine learning is over there, and I have to move data out of my core systems into that area? That is just a waste of time and complexity, and cost, often hidden cost, 'cause people don't usually think about the hidden costs of moving large volumes of data around. But instead of that, can I bring analytics and machine learning and data science itself to where my data is, not necessarily move it around all the time? Whether you're dealing with streaming data or large volumes of data in your Hadoop environment or mainframes or whatever, can I do ML in place and get the most value out of the data that is there? >> What's happening with all that Hadoop? Nobody talks about Hadoop anymore. Hadoop largely became a way to store data for less, but there's all this data now in a data lake. How are customers dealing with that? >> This is such an interesting thing. People used to talk about big data, you're right, and we jumped from there to the cognitive era; it's not like that, right? No, without the data there is no cognition, there is no AI, there is no ML. In terms of existing investments in Hadoop, for example, you have to absolutely be able to tap in and leverage those investments. For example, many large clients have investments in large Cloudera or Hortonworks environments, or Hadoop environments, so if you're doing data science, how do you push down, how do you leverage that for scale, for example? How do you access the data using the same access control mechanisms that are already in place?
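Those push-down questions have a concrete shape in code. As one illustration, here is a minimal PySpark sketch of training in place on the cluster; the table and column names are invented, and the security point is simply that the job runs under whatever access control (Kerberos, for example) the cluster already enforces:

```python
# Sketch of ML-in-place on a Hadoop/Spark cluster: prep and training execute
# where the data sits, with no bulk export. Table and columns are invented.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("train-in-place").getOrCreate()

# Read from the cluster's own storage (a Hive table, Parquet on HDFS, ...),
# under the same access controls the cluster already enforces.
df = spark.table("warehouse.transactions").dropna()

# Data prep pushed down as distributed transformations on the executors.
assembler = VectorAssembler(
    inputCols=["amount", "merchant_risk", "hour_of_day"],
    outputCol="features",
)
prepared = assembler.transform(df)

# Training also runs on the cluster's executors, at the cluster's scale.
model = LogisticRegression(labelCol="is_fraud").fit(prepared)
print(model.coefficients)
```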
Maybe you have Kerberos as your mechanism, how do you work with that? How do you avoid moving data off of that environment? How do you push down data prep into the Spark cluster? How do you do model training in that Spark cluster? All of these become important in terms of leveraging your existing investments. It is not just about accessing data where it is, it's also about leveraging the scale that the company has already invested in. You have 100-node, 500-node Hadoop clusters; well, make the most of them in terms of scaling your data science operations. So push down and access data as much as possible in those environments. >> So Beth Smith talked today about Watson's law, and she made a little joke about that, but to me it's poignant because we are entering a new era. For decades this industry marched to the cadence of Moore's law, then of course Metcalfe's law in the internet era. I want to make an observation and see if it resonates. It seems like innovation is no longer going to come from doubling microprocessor speed, and the network is there, it's built out, the internet is built. It seems like innovation comes from applying AI to data to get insights and then being able to scale, so it's cloud economics: marginal costs go to zero, massive network effects, and scale, the ability to turn that into innovation. That seems to be the innovation equation, but how do you operationalize that? >> To your point, Dave, when we say cloud scale, we want the flexibility to do that in an off-prem public cloud or in a private cloud or in between, in a hybrid cloud environment. When you talk about operationalizing, there's a couple of different things. People think that, say, I've got a super Python programmer and he's great with TensorFlow or scikit-learn or whatever and he builds these models, great, but what happens next? How do you actually operationalize those models? You need to be able to deploy those models easily. You need to be able to consume those models easily. For example, you have a chatbot; a chatbot is dumb until it actually calls these machine learning models in real time to make decisions on which way the conversation should go. So how do you make that chatbot intelligent? It's when it consumes the ML models that have been built. So deploying models, consuming models: you create a model, you deploy it, you've got to push it through the development, test, staging, production phases, just the same rigor that you would have for any application that is deployed. Then another thing is, a model is great on day one. Let's say I built a fraud detection model, it works great on day one. A week later, a month later it's useless, because the data that it trained on is not what the fraudsters are using now. The patterns have changed, so the model needs to be retrained. How do I ensure the performance of the model stays good over time? How do I do monitoring? How do I retrain the models? How do I do the life cycle management of the models, and then scale? Which is, okay, I deployed this model out and it's great, every application is calling it, maybe I have partners calling these models. How do I automatically scale? Whatever you are using behind the scenes, are you going to use external clusters for scale? Technologies like Spectrum Conductor, from our HPC background, are very interesting counterparts to this. How do I scale? How do I burst? How do I go from an on-prem to an off-prem environment? How do I build something behind the firewall but deploy it into the cloud?
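John finishes the thought on the other side of this sketch. His fraud example, great on day one and useless a month later, is the classic drift problem; here is one minimal way to watch for it, assuming scikit-learn and weekly batches of labeled outcomes. The baseline, tolerance, and window are illustrative policy choices, not anything the interview prescribes.

```python
from collections import deque
from sklearn.metrics import roc_auc_score

class DriftMonitor:
    """Track a deployed model's weekly AUC and flag when retraining is due."""

    def __init__(self, baseline_auc, tolerance=0.05, window=4):
        self.baseline = baseline_auc        # AUC measured at deployment time
        self.tolerance = tolerance          # how much decay we accept
        self.recent = deque(maxlen=window)  # the last few weekly scores

    def observe(self, y_true, y_scores):
        """Score one week's labeled traffic; return True if retraining is due."""
        self.recent.append(roc_auc_score(y_true, y_scores))
        mean_recent = sum(self.recent) / len(self.recent)
        return mean_recent < self.baseline - self.tolerance

# As true labels arrive for traffic the model scored, feed them in week by week.
monitor = DriftMonitor(baseline_auc=0.92)
if monitor.observe([0, 1, 1, 0, 1, 0], [0.2, 0.9, 0.4, 0.3, 0.5, 0.6]):
    print("Performance decayed, trigger the retraining pipeline.")
```

Comparing a rolling window against the deployment baseline, rather than any single week, keeps one noisy batch from triggering a needless retrain.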
We have a chatbot or some other cloud-native application; all of these things become interesting in the operationalizing. >> So how do all these conversations that you're having with these global elite clients, and the challenges that you're unpacking, how do they get back into innovation for IBM? What's that process like? >> It's an interesting place to be, because I am hearing and experiencing first hand real enterprise challenges, and where we see our product doesn't handle a particular thing, that is an immediate circling back with offering management and development: hey guys, we need this particular function, because I'm seeing this happening again and again in customer engagements. So that helps us shape our products, shape our data science offerings. Rather than just running with the flow of what everyone else is doing, we look at: what do our clients want? Where are they headed? And we shape the products that way. >> Excellent. Well John, thanks very much for coming back in theCUBE, it's a pleasure to see you again. I appreciate your time. >> Thank you Dave. >> All right, good to see you. Keep it right there everybody, we'll be back with our next guest. We're live from the IBM CDO Strategy Summit in San Francisco, you're watching theCUBE.
Data Science: Present and Future | IBM Data Science For All
>> Announcer: Live from New York City it's theCUBE, covering IBM Data Science For All. Brought to you by IBM. (light digital music) >> Welcome back to Data Science For All. It's a whole new game. And it is a whole new game. >> Dave Vellante, John Walls here. We've got quite a distinguished panel. So it is a new game-- >> Well we're in the game, I'm just happy to be-- (both laugh) Have a swing at the pitch. >> Well let's see what we have here. Five distinguished members of our panel. It'll take me a minute to get through the introductions, but believe me they're worth it. Jennifer Shin joins us. Jennifer's the founder of 8 Path Solutions, the director of data science at Comcast, and part of the faculty at UC Berkeley and NYU. Jennifer, nice to have you with us, we appreciate the time. Joe McKendrick, an analyst and contributor to Forbes and ZDNet. Joe, thank you for being here as well. Another ZDNetter next to him, Dion Hinchcliffe, who is a vice president and principal analyst at Constellation Research and also contributes to ZDNet. Good to see you, sir. To the back row, but that doesn't mean anything about the quality of the participation here: Bob Hayes, with a killer Batman shirt on by the way, which we'll get him to explain in just a little bit. He runs Business Over Broadway. And Joe Caserta, who is the founder of Caserta Concepts. Welcome to all of you. Thanks for taking the time to be with us. Jennifer, let me just begin with you. Obviously as a practitioner you're very involved in the industry, and you're on the academic side as well. We mentioned Berkeley, NYU, deep experience. So I want you to kind of put a foot in both worlds and tell me about data science. I mean, where do we stand now from those two perspectives? How have we evolved to where we are? And how would you describe, I guess, the state of data science? >> Yeah, so I think that's a really interesting question. There's a lot of change happening, in part because data science has now become much more established, both on the academic side as well as in industry. So now you see some of the bigger problems coming out. People have managed to get data pipelines set up, but now there are these questions about models and accuracy and data integration. So, the really cool stuff from the data science standpoint: we get to get really into the details of the data. And I think on the academic side you now see undergraduate programs, not just graduate programs, being involved. UC Berkeley just did a big initiative where they're going to offer data science to undergrads. So that's huge news for the university. So I think there's a lot of interest from the academic side to continue data science as a major, as a field. But I think in industry one of the difficulties you're now having is businesses asking that question of ROI, right? What do I actually get in return in the initial years? So I think there's a lot of work to be done and just a lot of opportunity. It's great because people now understand data science better, but I think data scientists have to really take that seriously and think about how am I actually getting a return, or adding value to the business? >> And there's a lot to be said, is there not, just in terms of increasing the workforce, the acumen, the training that's required now. It's still a relatively new discipline. So is there a shortage issue? Or is there just a great need? Is the opportunity there? I mean, how would you look at that?
>> Well I always think there's opportunity in being smart. If you can be smarter, you know, it's always better. It gives you advantages in the workplace, it gives you an advantage in academia. The question is, can you actually do the work? The work's really hard, right? You have to learn all these different disciplines, you have to be able to technically understand data, then you have to understand it conceptually. You have to be able to model with it, you have to be able to explain it. There are a lot of aspects that you're not going to pick up overnight. So I think part of it is endurance. Like, are people going to feel motivated enough, and dedicate enough time to it, to get very good at that skill set? And also of course, in terms of industry, will there be enough interest in the long term that there will be a financial motivation for people to stay in the field, right? So I think there's definitely a lot of opportunity, but that's always been there. Like I tell people, I think of myself as a scientist, and data science happens to be my day job. That's just the job title. But if you are a scientist and you work with data, you'll always want to work with data. I think that's just an inherent need. It's kind of a compulsion, you just kind of can't help yourself but dig a little bit deeper, ask the questions; you can't not think about it. So I think that will always exist. Whether or not it's an industry job in the way that we see it today, five years from now or 10 years from now, I think that's something that's up for debate. >> So all of you have watched the evolution of data and how it affects organizations for a number of years now. If you go back to the days when the data warehouse was king, we had a lot of promises about 360 degree views of the customer and how we were going to be more anticipatory and more responsive. In many ways the decision support systems and the data warehousing world didn't live up to those promises. They solved other problems for sure. And so everybody was looking for big data to solve those problems, and it's begun to attack many of them. We talked earlier in theCUBE today about fraud detection, it's gotten much, much better. Certainly retargeting of advertising has gotten better. But I wonder if you could comment, maybe starting with Joe, on the effect that data and data science have had on organizations in terms of fulfilling that vision of a 360 degree view of customers and anticipating customer needs. >> So. Data warehousing, I wouldn't say failed, but I think it was unfinished in order to achieve what we need done today. At the time I think it did a pretty good job. I think it was the only place where we were able to collect data from all these different systems and have it in a single place for analytics. The big difference, I think, between data warehousing and data science is that data warehouses were primarily made for consumption by human beings, to have people look through some tool and analyze data manually. That really doesn't work anymore, there's just too much data to do that. So that's why we need to build a science around it, so that we can have machines actually doing the analytics for us. And I think that's the biggest stride in the evolution over the past couple of years, that now we're actually able to do that, right?
It used to be, you know, you go back to when data warehouses started, you had to be a deep technologist in order to collect the data and write the programs to clean the data. Now your average casual IT person can do that. Right now I think we're back at that point in data science, where you have to be a fairly sophisticated programmer, analyst, scientist, statistician, engineer, in order to do what we need to do, in order to make machines actually understand the data. But I think in terms of the evolution, we're just at the forefront. We're going to see over the next, not even years, within the next year I think, a lot of new innovation where the average person within business, and definitely the average person within IT, will be able to as easily say, "What are my sales going to be next year?" as it is to say, "What were my sales last year?" Where now it's a big deal. Right now, in order to do that, you have to build some algorithms, you have to be a specialist in predictive analytics. And I think as the tools mature, as people's use of data matures, and as the technology ecosystem for data matures, it's going to be easier and more accessible. >> So it's still too hard. (laughs) That's something-- >> Joe C.: Today it is, yes. >> You've written about and talked about. >> Yeah, no question about it. We see this citizen data scientist. You know, we talked about the democratization of data science, but the way we talk about analytics and warehousing and all the tools we had before, they generated a lot of insights and views on the information, but they didn't really give us the science part. I think what's missing is the forming of the hypothesis, the closing of the loop. We now have use of this data, but are we changing, are we thinking about it strategically? Are we learning from it and then feeding that back into the process? I think that's the big difference between data science and the analytics side. But, you know, just like Google made search available to everyone, not just people who had highly specialized indexers or crawlers, now we can have tools that make these capabilities available to anyone. Going back to what Joe said, I think the key thing is we now have tools that can look at all the data and ask all the questions, 'cause we can't possibly do it all ourselves. Our organizations are increasingly awash in data, which is the life blood of our organizations, but we're not using it; you know, this whole concept of dark data. And so I think the concept, or the promise, of opening these tools up for everyone to be able to access those insights and activate them, I think that's where it's headed. >> This is kind of where the T shirt comes in, right? So Bob if you would, you've got this Batman shirt on. We talked a little bit about it earlier, but it plays right into what Dion's talking about, about tools. And I don't want to spoil it, but you go ahead (laughs) and tell me about it. >> Right, so. Batman is a super hero, but he doesn't have any supernatural powers, right? He can't fly on his own, he can't become invisible on his own. But the thing is, he has the utility belt and he has these tools he can use to help him solve problems. For example, he has the bat ring when he's confronted with a building that he wants to get over, right? So he pulls it out and uses that. So as data professionals we have all these tools now that these vendors are making. We have IBM SPSS, we have Data Science Experience,
IBM Watson, tools that these data pros can now use as part of their utility belt to solve problems that they're confronted with. So if you're ever confronted with, like, a churn problem, and you have somebody who has access to that data, they can put that into IBM Watson, ask a question, and it'll tell you the key driver of churn. So it's not that you have to be superhuman to be a data scientist, but these tools will help you solve certain problems and help your business go forward. >> Joe McKendrick, do you have a comment? >> Does that make the Batmobile the Watson? (everyone laughs) Analogy? >> I was just going to add that, you know, of all of the billionaires in the world today, none of them has decided to become Batman yet. It's very disappointing. >> Yeah. (Joe laughs) >> Go ahead Joe. >> And I just want to add some thoughts to our discussion about what happened with data warehousing. I think it's important to point out as well that data warehousing, as it existed, was fairly successful, but for larger companies. Data warehousing was, and remains, a very expensive proposition, something that's in the domain of the Fortune 500. But today's economy is based on a very entrepreneurial model. The Fortune 500 is out there of course, ever shifting, but you have a lot of smaller companies, a lot of people with start ups. You have people within divisions of larger companies that want to innovate and not be tied to the corporate balance sheet. They want to innovate and experiment without having to go through the finance department. So there are all these open source tools available, and cloud resources as well as open source tools, Hadoop of course being a prime example, where you can work with the data, experiment with the data, and practice data science at a very low cost. >> Dion mentioned the C word, citizen data scientist, last year at the panel. We had a conversation about that, and the data scientists on the panel generally were like, "Stop." Okay, we're not all of a sudden going to turn everybody into data scientists; however, what we want to do is get people thinking about data, more focused on data, becoming a data driven organization. I mean, as a data scientist, I wonder if you could comment on that. >> Well I think the other side of that is, you know, there are also many people who maybe didn't follow through with science, 'cause it's also expensive. A PhD takes a lot of time, and you know, if you don't get funding it's a lot of money, and for very little security, if you think about how hard it is to get a teaching job that's going to give you enough of a payoff to pay that back: the time that you took off, the investment that you made. So I think the other side of that is, by making data more accessible, you allow people who could have been great in science an opportunity to be great data scientists. And so I think for me the idea of citizen data scientist, that's where the opportunity is. In terms of democratizing data and making it available for everyone, I feel as though it's something similar to the way we didn't really know what KPIs were maybe 20 years ago. People didn't use them as readily, didn't teach them in schools. I think maybe 10, 20 years from now, with some of the things that we're building today from data science, hopefully more people will understand how to use these tools.
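Jennifer continues that thought below. In the meantime, Bob's utility-belt example from a moment ago, hand a churn table to a tool and ask for the key driver, reduces to something like this minimal sketch, assuming scikit-learn and pandas; the toy columns and values are invented for illustration, not taken from any panelist's data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy churn table; in practice this would be the client's customer data.
df = pd.DataFrame({
    "monthly_charges": [70, 20, 95, 30, 85, 25, 90, 15],
    "tenure_months":   [2, 48, 3, 60, 5, 36, 4, 72],
    "support_calls":   [5, 0, 4, 1, 6, 0, 3, 1],
    "churned":         [1, 0, 1, 0, 1, 0, 1, 0],
})

X, y = df.drop(columns="churned"), df["churned"]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# "What's the key driver of churn?" -- rank the features by importance.
for name, score in sorted(zip(X.columns, model.feature_importances_),
                          key=lambda p: -p[1]):
    print(f"{name}: {score:.2f}")
```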
They'll have a better understanding of working with data and what that means, and just data literacy, right? Just being able to use these tools and understand what data's saying, and actually what it's not saying, which is the thing that most people don't think about. You can also say that data doesn't say anything: there's a lot of noise in it, sometimes too much noise to be able to say that there is a result. So I think that's the other side of it. So yeah, for me, in terms of a serious data scientist, I think it's a great idea to have that, right? But at the same time, of course, as everyone kind of emphasized, you don't want everyone out there going, "I can be a data scientist without education, without statistics, without math," without an understanding of how to implement the process. I've seen a lot of companies implement the same sort of process from 10, 20 years ago just on Hadoop instead of SQL, right, and it's very inefficient. And the only difference is that you can build more tables wrong than they could before. (everyone laughs) Which is, I guess-- >> For less. >> An accomplishment, and for less; it's cheaper, yeah. >> It is cheaper. >> Otherwise we're like, I'm not a data scientist but I did stay at a Holiday Inn Express last night, right? >> Yeah. (panelists laugh) And there's like a little bit of pride that they used 2,000 computers to do it, a little bit of pride about that, but of course maybe not a great way to go. 20 years ago we couldn't do that, right? To have one computer as a resource was already an accomplishment. So I think you have to think about the fact that if you're doing it wrong, you're just going to make that mistake bigger, which is also the other side of working with data. >> Sure, Bob. >> Yeah, I have a comment about that. I've never liked the term citizen data scientist, or citizen scientist. I get the point of it, and I think employees within companies can help in the data analytics problem by maybe being a data collector or something. I mean, I would never have just somebody become a scientist based on a few classes he or she takes. It's like saying, "Oh I'm going to be a citizen lawyer," and so you come to me with your legal problems, or a citizen surgeon. You need training to be good at something. You can't just be good at something just 'cause you want to be. >> John: Joe, you wanted to say something too on that. >> Since we're in New York City I'd like to use the analogy of a real scientist versus a data scientist. A real scientist requires tools, right? And the tools are not new, like microscopes and a laboratory and a clean room. These tools have evolved over years and years, and since we're in New York we could walk within a 10 block radius and buy any of those tools. It doesn't make us scientists because we use those tools. I think with data, making the tools evolve and become easier to use, like Bob was saying, doesn't make you a better data scientist, it just makes the data more accessible. You know, we can go buy a microscope, we can go buy Hadoop, we can buy any kind of tool in a data ecosystem, but it doesn't really make you a scientist. I'm very involved in the NYU data science program and the Columbia data science program; these kids are brilliant. These kids are not someone who is just trying to run a day to day job in corporate America.
I think the people who are running the day to day job in corporate America are going to be the recipients of data science, just like people who take drugs, right, as a result of a smart scientist coming up with a formula that can help them. I think we're going to make it easier to distribute the data that can help people, with all the new tools. But the access to the data and tools available doesn't really make you a better data scientist without, like Bob was saying, better training and education. >> So how-- I'm sorry, how do you then, if it's not for everybody, but yet I'm the user at the end of the day at my company and I've got these reams of data before me, how do you make it make better sense to me? That's where machine learning comes in, or artificial intelligence and all this stuff. So how, at the end of the day, Dion? How do you make it relevant and usable, actionable, to somebody who might not be as practiced as you would like? >> I agree with Joe that many of us will be the recipients of data science. Just like you had to be a computer scientist at one point to develop programs for a computer, now we can get the programs. You don't need to be a computer scientist to get a lot of value out of our IT systems. The same thing's going to happen with data science. There's far more demand for data science than could ever be met by, you know, an ivory tower filled with data scientists. We need those guys too, don't get me wrong, but we need to productize it and make it available in packages such that it can be consumed. The outputs and even some of the inputs can be provided by mere mortals, whether that's machine learning or artificial intelligence, or bots that go off and run the hypotheses and select the algorithms, maybe with some human help. We have to productize it. This is the concept of data science as a service, which is becoming a thing now. It's, "I need this capability at scale. I need it fast and I need it cheap." The commoditization of data science is going to happen. >> That goes back to what I was saying about the recipient of data science also being machines, right? Because I think the other thing that's happening now in the evolution of data is that the data is so tightly coupled. Back when we were talking about data warehousing, you had all the business transactions, then you took the data out of those systems and put it in a warehouse for analysis, right? Maybe they'd make a decision to change that system at some point. Now the analytics platform and the business application are very tightly coupled. They become dependent upon one another. So people who are using the applications are now able to take advantage of the insights of data analytics and data science just through the app, which never really existed before. >> I have one comment on that. You were talking about how do you get the end user more involved; well, like we said earlier, data science is not easy, right? As an end user, I encourage you to take a stats course, just a basic stats course: understanding what a mean is, variability, regression analysis, just basic stuff. So you as an end user can glean more insight from the reports that you're given, right? If you go to France and don't know French, then people can speak really slowly to you in French; you're still not going to get it.
You need to understand the language of data to get value from the technology we have available to us. >> Incidentally, French is one of the languages that you have the option of learning if you're a mathematician. Math PhDs are required to learn a second language, and France being the country of algebra, that's one of the languages you could actually learn. Anyway, tangent. But going back to the point: statistics courses, definitely encourage it. I teach statistics, and one of the things that I'm finding as I go through the process of teaching it is that I'm bringing in my experience, and by bringing in my experience I'm kind of making the students think about the data differently. So the other thing people don't think about is the fact that statisticians were typically expected to do just basic sorts of tasks, in the sense that their knowledge is specialized, right? The day to day operation was: they ran a test on some data, looked at the results, and interpreted the results based on what they were taught in school. A lot of times they didn't develop the model, they just understood what the tests were saying, especially in the medical field. So when we think about things like population and census, which is when you have every single data point, versus a sample, which is a subset, it's a very different story now that we're collecting data faster than we used to. It used to be the idea that you could collect information from everyone: the census happens once every 10 years, and we built that in. But nowadays, you hear about Facebook, for instance; I think they claimed earlier this year that their data was more accurate than the census data. So now there are these claims being made about which data source is more accurate. And I think the other side of this is that statisticians are now expected to know data in a different way than they were before. So it's not just data science changing as a field; I think the sciences that are using data are also changing their fields as well. >> Dave: So is sampling dead? >> Well no, because-- >> Should it be? (laughs) >> Well if you're sampling wrong, yes. That's really the question. >> Okay. You know, it's been said that the data doesn't lie, people do. Organizations are very political. Oftentimes, you know, lies, damned lies and statistics, Benjamin Disraeli. Are you seeing a change in the way in which organizations are using data in the context of the politics? So, some strong P&L manager, say, gets data and crafts it in a way that he or she can advance their agenda. Or maybe they'll attack a data set that probably should drive them in a different direction, but is antithetical to their agenda. Are you seeing data, you know, we talked about democratizing data, are you seeing that reduce the politics inside of organizations? >> So you know, we've always used data to tell stories at the top level of an organization, that's what it's all about. And I still see very much that no matter how much data science, or access to the truth through looking at the numbers, that storytelling is still the political filter through which all that data passes, right? But with the advent of things like blockchain, more and more corporate records and corporate information are going to end up in these open and shared repositories, where there is no alternate truth. Still, it'll come back to whoever tells the best stories at the end of the day.
So I still see that organizations are very political. We are seeing more open data now, though. Open data initiatives are a big thing, both in government and in the private sector. It is having an effect, but it's slow and steady. So that's what I see. >> Um, um, go ahead. >> I was just going to say as well, ultimately I think data driven decision making is a great thing. And it's especially useful at the lower tiers of the organization, where you have the routine day to day decisions that could be automated through machine learning and deep learning, and the algorithms can be improved on a constant basis. On the upper levels, you know, that's why you pay executives the big bucks, to make the strategic decisions. And data can help them, but ultimately data, IT, technology alone will not create new markets, it will not drive new businesses; it's up to human beings to do that. The technology is the tool to help them make those decisions, but creating businesses, growing businesses, is very much a human activity. And that's something I don't see ever getting replaced. Technology might replace many other parts of the organization, but not that part. >> I tend to be a foolish optimist when it comes to this stuff. >> You do. (laughs) >> I do believe that data will make the world better. I do believe that data doesn't lie, people lie. I'm already seeing trends in all different industries where conventional wisdom is starting to get trumped by analytics. I think it's still up to the human being today to ignore the facts and go with what they think in their gut, and sometimes they win, sometimes they lose. But generally, if they lose, the data will tell them that they should have gone the other way. I think as we start relying more on data and trusting data through artificial intelligence, as we start making our lives a little bit easier, as we start using smart cars for safety before replacement of humans, as we start using data and analytics and data science as the bumpers instead of the vehicle, eventually we're going to start to trust it as the vehicle itself. And then it's going to make lying a little bit harder. >> Okay, so great, excellent. Optimism, I love it. (John laughs) So I'm going to play devil's advocate here a little bit. There's a couple of elephant in the room topics that I want to explore a little bit. >> Here it comes. >> There was an article today in Wired called "Why AI Is Still Waiting for Its Ethics Transplant," and I will just read a little segment from it. It says: new ethical frameworks for AI need to move beyond individual responsibility to hold powerful industrial, government and military interests accountable as they design and employ AI. When tech giants build AI products, too often user consent, privacy and transparency are overlooked in favor of frictionless functionality that supports profit driven business models based on aggregate data profiles. This is from Kate Crawford and Meredith Whittaker, who founded AI Now. And they're calling for, sort of, almost clinical trials on AI, if I could use that analogy: before you go to market you've got to test the human impact, the social impact. Thoughts? >> And also have the ability for a human to intervene at some point in the process. This goes way back. Is everybody familiar with the name Stanislav Petrov?
He's the Soviet officer who, back in 1983, was in the control room, I guess somewhere outside of Moscow, which detected a nuclear missile attack against the Soviet Union coming out of the United States. Ordinarily, I think if this had been an entirely AI driven process, we wouldn't be sitting here right now talking about it. But this gentleman looked at what was going on on the screen, and, I'm sure he was accountable to his authorities in the Soviet Union, he probably got in a lot of trouble for this, but he decided to ignore the signals, ignore the data coming from the Soviet satellites. And as it turned out, of course, he was right. The Soviet satellites were seeing glints of the sun and interpreting those glints as missile launches. And I think that's a great example of why, you know, every situation of course doesn't involve the end of the world, (laughs) it did in this case, but it's a great example of why there needs to be a human component, a human ability for intervention at some point in the process. >> So, other thoughts. I mean, organizations are driving AI hard for profit. The best minds of our generation are trying to figure out how to get people to click on ads; Jeff Hammerbacher is famous for saying that. >> You can use data for a lot of things. With data analytics you can cure cancer, or you can make customers click on more ads. It depends on what your goal is. But there are ethical considerations we need to think about. When we have data that has a racial bias against blacks, giving them higher prison sentences or worse credit scores and so forth, that has an impact on a broad group of people, and as a society we need to address that. And as scientists we need to consider: how are we going to fix that problem? Cathy O'Neil in her book Weapons of Math Destruction, excellent book, I highly recommend that your listeners read it, talks about these issues, about what happens if algorithms have a widespread impact and adversely impact a protected group. And I forget the last criterion, but we need to really think about these things as a people, as a country. >> So I always think the idea of ethics is interesting. This conversation comes up a lot of times when I talk to data scientists. As a concept, as an idea, yes, you want things to be ethical. The question I always pose to them is, "Well, in the business setting, how are you actually going to do this?" 'Cause I find the most difficult thing working as a data scientist is being able to make the day to day decision of, when someone says, "I don't like that number," how you actually get around that, whether that's the right data to be showing someone, or whether it's accurate. And say the business decides, "Well, we don't like that number." Many people feel pressured to then change the data, or change what the data shows. So I think it's about being able to educate people to find ways to say what the data is saying without going past some line where it's a lie, where it's unethical. 'Cause you can also say what the data doesn't say. You don't always have to say what the data does say. You can leave it as, "Here's what we do know, but here's what we don't know." There's a don't-know part that many people will omit when they talk about data. So I think, you know, especially when it comes to things like AI, it's tricky, right? Because, I always tell people, I don't know why everyone thinks AI's going to be so amazing.
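Jennifer picks her point back up below. Joe's Petrov story earlier in this exchange is, mechanically, a confidence-threshold pattern: let the system act only when it is sure, and route everything else to a person. A minimal sketch, with the threshold as an assumed policy choice rather than anything the panel specifies:

```python
def decide(score: float, act, escalate, threshold: float = 0.95):
    """Act automatically only on high-confidence scores; everything else
    goes to a human reviewer -- the Petrov role in the loop."""
    if score >= threshold:
        return act(score)
    return escalate(score)

# Example: an alert scored at 0.62 is not acted on; a person looks first.
decide(0.62,
       act=lambda s: print(f"auto-action at confidence {s:.2f}"),
       escalate=lambda s: print(f"held for human review at {s:.2f}"))
```

The interesting design question is where the threshold sits: too high and the automation does nothing, too low and the human never sees the cases that matter.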
I started in the industry by fixing problems with computers that people didn't realize computers had. For instance, when you have a system there are a lot of bugs; we all have bug reports that we've probably submitted. I mean, really, it's nowhere near the point where it's going to start dominating our lives and taking over all the jobs, because frankly it's not that advanced. It's still run by people, still fixed by people, still managed by people. I think with ethics, a lot of it has to do with the regulations, what the laws say. That's really going to be what's involved in terms of what people are willing to do. A lot of businesses want to make money, and if there are no rules that say they can't do certain things to make money, then there's no restriction. I think the other thing to think about is that we as consumers, in our everyday lives, shouldn't separate the idea of data as a business from our day to day consumer lives. Meaning, yes, I work with data. Incidentally, I also always opt out of my credit card's data collection; you know, when they send you that information, they make you actually mail them, like old school mail, snail mail, a document that says, okay, I don't want to be part of this data collection process. Which I always do. It's a little bit more work, but I go through that step of doing it. Now if more people did that, perhaps companies would feel more incentivized to pay you for your data, or give you more control of your data. Or at least, you know, if a company's going to collect information, I'd want there to be certain processes in place to ensure that it doesn't just get sold, right? For instance, if a start up gets acquired, what happens with the data they have on you? You agreed to give it to the start up, but I mean, what are the rules on that? So I think we have to really think about the ethics not just from the standpoint of someone who's going to implement something, but as consumers: what control do we have over our own data? 'Cause that's going to directly impact what businesses can do with our data. >> You know, you mentioned data collection, so slightly on that subject: all these great new capabilities we have coming. We talked about what's going to happen with media in the future, what 5G technology's going to do to mobile and these great bandwidth opportunities, the internet of things and the internet of everywhere, and all these great inputs, right? Do we have an arms race? Like, are we keeping up with the capabilities to make sense of all the new data that's going to be coming in? And how do those things square up? Because the potential is fantastic, right? But are we keeping up with the ability to make it make sense and to put it to use, Joe? >> So I think data ingestion and data integration is probably one of the biggest challenges, especially as the world is starting to become more dependent on data. You know, because we're dependent on numbers, we've come up with GAAP, the generally accepted accounting principles, which can be audited and proven true or false. I think in our lifetime we will see something similar for data: formal checks and balances on the data that we use, which can be audited. Getting back to what Dave was saying earlier, I personally would rather trust a machine that was programmed to do the right thing than trust a politician or some leader that may have their own agenda. And the other thing about machines is that they are auditable.
You know, you can look at the code and see exactly what it's doing and how it's doing it. Human beings, not so much. So I think getting to the truth, even if the truth isn't the answer that we want, is a positive thing. It's something that we can't do today, but once we start relying on machines to do it, we'll be able to get there. >> Yeah, I was just going to add that we live in exponential times, and the challenge is that the way we're structured traditionally as organizations is not allowing us to absorb advances exponentially; it's linear at best. Everyone talks about change management and how we're going to do digital transformation. Evidence shows that technology's forcing the leaders and the laggards apart. There are a few leading organizations that are eating the world, and they seem to be somehow rolling out new things. I don't know how Amazon rolls out all this stuff: all this artificial intelligence, the IoT devices, Alexa, natural language processing, and that's just a fraction, just the tip of what they're releasing. So it just shows that there are some organizations that have found the path. Most of the Fortune 500 from the year 2000 are gone already, right? The disruption is happening. And so we have to find some way to adopt these new capabilities and deploy them effectively, or the writing is on the wall. I've spent a lot of time exploring this topic, how we are going to get there, and the short answer is that all of us have a lot of hard work ahead. >> I read that there's going to be more data, or it was predicted, more data created in this year than in the past, I think it was 5,000 years. >> Forever. (laughs) >> And to mix in the statistic that we're currently analyzing less than 1% of the data. Taking those numbers and hearing what you're all saying, it's like we're not keeping up; it's not even linear. I mean, that gap is just going to grow and grow and grow. How do we close that? >> There's a guy out there named Chris Dancy, he's known as the human cyborg. He has 700 sensors all over his body, and his theory is that data's not new; having access to the data is new. You know, we've always had a blood pressure, we've always had a sugar level, but we were never able to actually capture it in real time before. So now that we can capture and harness it, we can be smarter about it. So I think being able to use this information is really incredible; this is something that over our lifetime we've never had, and now we can do it. Hence the big explosion in data. But I think how we use it, and how we have it governed, is the challenge right now. It's kind of cowboys and indians out there right now, and without proper governance and without rigorous regulation I think we are going to have some bumps in the road along the way. >> The data is the new oil; the question is how are we actually going to operationalize around it? >> Or find it. Go ahead. >> I will say the other side of it is, if you think about information, we always have the same amount of information, right? What we choose to record, however, is a different story. Now if you wanted to know things about the Olympics, but you decided to collect information every day for years instead of just the Olympic year, yes, you have a lot of data, but did you need all of that data? For that question about the Olympics, you don't need to collect data during years there are no Olympics, right? Unless of course you're comparing it relative to other years.
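Jennifer completes the thought next. As a minimal illustration of where it's headed, assuming scikit-learn and NumPy: padding a model with columns of pure noise, shoe-size data for a question about hair, does not buy you accuracy, however much "more data" it represents.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# A small signal-bearing dataset, then the same data padded with 50
# columns of pure noise -- ten times "more data", zero more information.
X, y = make_classification(n_samples=300, n_features=5, n_informative=5,
                           n_redundant=0, random_state=0)
X_padded = np.hstack([X, rng.normal(size=(300, 50))])

for name, data in [("5 real features", X), ("plus 50 noise features", X_padded)]:
    acc = cross_val_score(LogisticRegression(max_iter=1000), data, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy {acc:.2f}")
```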
But I think that's another thing to think about. Just 'cause you collect more data does not mean that data will produce more statistically significant results; it does not mean it'll improve your model. You can be collecting data about your shoe size while trying to get information about your hair. I mean, it really does depend on what you're trying to measure, what your goals are, and what the data's going to be used for. If you don't factor the real world context into it, then yeah, you can collect data, you know, an infinite amount of data, but you'll never process it, because you have no question to ask, you're not looking to model anything. There is no universal truth about everything; that just doesn't exist out there. >> I think she's spot on. It comes down to what kind of questions you are trying to ask of your data. You can have one given database that has 100 variables in it, right? And you can ask it five different questions, all valid questions, and that data may have the variables that'll tell you what's the best predictor of churn, or what's the best predictor of cancer treatment outcome. If you can ask the right question of the data you have, then that'll give you some insight. Just data for data's sake, that's just hype. We have a lot of data, but it may not lead to anything if we don't ask it the right questions. >> Joe. >> I agree, but I just want to add one thing. This is where the science in data science comes in. Scientists often will look at data that's already been in existence for years: weather forecasts, weather data, climate change data, for example, with charts and records going back centuries if that data is available. And they reformat it, they reconfigure it, they get new uses out of it. And the potential I see with the data we're collecting is that it may not be of use to us today, because we haven't thought of ways to use it, but maybe 10, 20, even 100 years from now someone's going to think of a way to leverage the data, to look at it in new ways and to come up with new ideas. That's just my thought on the science aspect. >> Knowing what you know about data science, why did Facebook miss Russia and the fake news trend? They came out and admitted it. You know, we missed it, why? Could they have, is it because they were focused elsewhere? Could they have solved that problem? (crosstalk) >> It's what you said, which is: are you asking the right questions? If you're not looking for that problem in exactly the way that it occurred, you might not be able to find it. >> I thought the ads were paid in rubles. Shouldn't that be your first clue (panelists laugh) that something's amiss? >> You know, red flag, so to speak. >> Yes. >> I mean, Bitcoin, maybe it could have hidden it. >> Bob: Right, exactly. >> I would think too that what happened last year was actually the end of an age of optimism. I'll bring up the Soviet Union again. (chuckles) It collapsed back in 1990, 1991, and Russia was reborn. And I think there was a general feeling of optimism in the '90s through the 2000s that Russia was now being well integrated into the world economy, as other nations all over the globe, on all continents, were being integrated into the global economy thanks to technology. Technology is lifting entire continents out of poverty and ensuring more connectedness for people. Across Africa, India, Asia, we're seeing economies that are very different than they were 20 years ago, and that extended into Russia as well. Russia is part of the global economy.
We're able to communicate as a global network. I think as a result we kind of overlooked the dark side that occurred. >> John: Joe? >> Again, the foolish optimist here. But I think it shouldn't be the question of how did we miss it; it's do we have the ability now to catch it? And I think without data science, without machine learning, without being able to train machines to look for patterns that involve corruption or result in corruption, we'd be out of luck. But now we have those tools. And now, hopefully, optimistically, by the next election we'll be able to detect these things before they become public. >> It's a loaded question, because my premise was that Facebook had the ability and the tools and the knowledge and the data science expertise if in fact they wanted to solve that problem, but they were focused on other problems, which is, how do I get people to click on ads? >> Right, they had the ability to train the machines, but they were giving the machines the wrong training. >> Looking under the wrong rock. >> (laughs) That's right. >> It is easy to play armchair quarterback. Another topic I wanted to ask the panel about is IBM Watson. You guys spend time in the Valley, I spend time in the Valley. People in the Valley poo-poo Watson: ah, Google, Facebook, Amazon, they've got the best AI. And some of that's fair criticism. Watson's a heavy lift, very services oriented, you've got to apply it in a very focused way. At the same time, Google's trying to get you to click on ads, as is Facebook, and Amazon's trying to get you to buy stuff, while IBM's trying to solve cancer. Your thoughts on that sort of juxtaposition of the different AI suppliers, and there may be others. Oh, nobody wants to touch this one, come on. I told you, elephant in the room questions. >> Well, I mean, you're looking at two very different types of organizations. One has really spent decades applying technology to business, and these other companies are ones that are primarily into the consumer, right? When we talk about things like IBM Watson, you're looking at a very different type of solution. You used to be able to buy IT, and once you installed it, you pretty much could get it to work and store your records or, you know, do whatever it is you needed it to do. But these types of tools, like Watson, actually try to learn your business, and they need to spend time doing that, watching the data and having their models tuned. And so you don't get the results right away. And I think that's been kind of the challenge that organizations like IBM have had. This is a different type of technology solution, one that has to actually learn first before it can provide value. And so you have organizations like IBM that are much better at applying technology to business, and then they have the further hurdle of having to apply these tools that work in very different ways. There's education needed, too, on the side of the buyer. >> I'd have to say that, you know, I think there are plenty of businesses out there also trying to solve very significant, meaningful problems. You know, with Microsoft AI and Google AI and IBM Watson, I think it's not really the tool that matters, like we were saying earlier. A fool with a tool is still a fool, regardless of who the manufacturer of that tool is. And I think having a thoughtful, intelligent, trained, educated data scientist using any of these tools can be equally effective.
>> So do you not see core AI competence, and I left out Microsoft, as a strategic advantage for these companies? Is it going to be so ubiquitous and available that virtually anybody can apply it? Or is all the investment in R&D and AI going to pay off for these guys? >> Yeah, so I think there are different levels of AI, right? So there's AI where you can actually improve the model. I remember being invited by IBM, when Watson was first out, to a private, sort of, presentation. And my question was, "Okay, so when do I get to access the corpus?" The corpus being sort of the foundation of NLP, which is natural language processing. It's what you use almost like a dictionary, like how you're actually going to measure things, or match things up. And they said, "Oh, you can't." "What do you mean I can't?" It's like, "We do that." "So you're telling me, as a data scientist, you're expecting me to rely on the fact that you did it better than me, and I should rely on that?" I think over the years after that, IBM started opening it up and offering different ways of being able to access the corpus and work with that data. But I remember at the first Watson hackathon there were only two corpora available: travel and medicine. There was no other foundational data available. So I think one of the difficulties was that IBM, being a little bit more at the forefront of it, kind of had that burden of having to develop these systems and learn, kind of the hard way, that if you don't have the right models, and you don't have the right data, and you don't have the right access, that's going to be a huge limiter. I think with things like medical information, that's an extremely difficult data set to start with, partly because anything that you do find or don't find, the impact is significant. If I'm looking at things like what people clicked on, the impact of using that data wrong is minimal: you might lose some money. If you do that with healthcare data, with medical data, people may die. This is a much more difficult data set to start with. So from a scientific standpoint it's great to have any information about a new technology, a new process. The nice thing is that IBM has obviously invested in it and collected information. I think the difficulty there, though, is that just 'cause you have it, you can't solve everything. And I feel, as someone who works in technology, that in general when you appeal to developers you try not to market. And Watson is very heavily marketed, which tends to turn off people who come more from the technical side, because I think they don't like it when it's gimmicky, in part because they do the opposite of that: they're always trying to build up the technical components of it. They don't like it when you're trying to convince them that you're selling them something, when you could just give them the specs and let them look at it. So it could be something as simple as communication. But I do think it is valuable to have had a company lead at the forefront of this and try it, so we can actually learn from what IBM has learned from the process. >> But you're an optimist. (John laughs) All right, good. >> Just one more thought. >> Joe, go ahead first. >> Joe: I want to see how Alexa or Siri would do on Jeopardy. (panelists laugh) >> All right. Going to go around, a final thought, give you a second. Let's just think about, like, your 12 month crystal ball.
In terms of either challenges that need to be met in the near term, or opportunities you think will be realized: 12, 18 month horizon. Bob, you've got the microphone headed at you, so I'll let you lead off, and let's just go around. >> I think a big challenge for business, for society, is getting people educated on data and analytics. There's a study that was just released, I think last month, by ServiceNow, I think, or some vendor, or Qlik. They found that only 17% of the employees in Europe have the ability to use data in their job. Think about that. >> 17. >> 17. Less than 20%. So these people don't have the ability to understand or use data intelligently to improve their work performance. That says a lot about the state we're in today. And that's Europe. It's probably a lot worse in the United States. So that's a big challenge, I think: to educate the masses. >> John: Joe. >> I think we probably have a better chance of improving technology than training people. I think using data needs to be iPhone easy, which means that a lot of innovation is in the years to come. I do think that the keyboard is going to be a thing of the past for the average user; we are going to start using voice a lot more. I think augmented reality is going to become a real reality, where we can hold our phone in front of an object and it will have an overlay of the price and where it's available. If it's a person, I think we will see, within an organization, holding a camera up to someone and being able to see what their salary is, what sales they did last year, some key performance indicators. I hope that we are beyond the days of everyone around the world walking around like this, and we start actually becoming more social as human beings through augmented reality. I think it has to happen. I think we're going through kind of foolish times at the moment in order to get to the greater good, and the greater good is using technology in a very, very smart way. Which means that, sorry to contradict, or maybe it's good to counterpoint, I don't think you need to have a PhD in SQL to use data. I think that's 1990. I think as we evolve it's going to become easier for the average person, which means people like the brain trust here need to get smarter and start innovating. The innovation around data is really at the tip of the iceberg; we're going to see a lot more of it in the years to come. >> Dion, why don't you go ahead, then we'll come down the line here. >> Yeah, so I think over that time frame two things are likely to happen. One is, somebody's going to crack the consumerization of machine learning and AI, such that it really is available to the masses and we can do much more advanced things than we could before. We see that industries tend to reach an inflection point and then there's an explosion. No one's quite cracked the code on how to really bring this to everyone, but somebody will, and that could happen in that time frame. And then the other thing that I think almost has to happen is that the forces for openness, open data, data sharing, open data initiatives, things like blockchain, are going to run headlong into data protection, data privacy, customer privacy laws and regulations that have to come down and protect us. Because the industry's not doing it, the government is stepping in, and it's going to re-silo a lot of our data.
It's going to make it recede and make it less accessible, making data science harder for a lot of the most meaningful types of activities. Patient data, for example, is already all locked down. We could do so much more with it, but health startups are really constrained about what they can do, 'cause they can't access the data. We can't even access our own health care records, right? So I think that's the challenge, is we have to have that battle next to be able to go and take the next step. >> Well I see, with the growth of data, a lot of it's coming through IoT, the internet of things. I think that's a big source. And we're going to see a lot of innovation. New types of Ubers or Airbnbs. Uber's so 2013 though, right? We're going to see new companies with new ideas, new innovations, and they're going to be looking at the ways this data can be leveraged, all this big data, or data coming in from the IoT. You know, there's some examples out there. There's a company, for example, that is outfitting tools, putting sensors in the tools. Industrial sites can therefore track where the tools are at any given time. This is an expensive, time consuming process, constantly losing tools, trying to locate tools. Assessing whether the tool's being applied to the production line, or the right tool is at the right torque, and so forth. With the sensors implanted in these tools, it's now possible to be more efficient. And there's going to be innovations like that. Maybe small startup type things or smaller innovations. We're going to see a lot of new ideas and new types of approaches to handling all this data. There's going to be new business ideas. The next Uber, we may be hearing about it a year from now, whatever that may be. And that Uber is going to be applying data, probably IoT type data, in some new, innovative way. >> Jennifer, final word. >> Yeah, so I think with data, you know, it's interesting, right. For one thing, I think one of the things that's made data more available, and made people more open to the idea, has been startups. But what's interesting about this is a lot of startups have been acquired. And a lot of people at startups that got acquired now work at bigger corporations. Which was the way it was maybe 10 years ago: data wasn't available and open, companies kept it very proprietary, you had to sign NDAs. It was within the last 10 years that open source, all of those initiatives, became much more popular, much more open, an acceptable sort of way to look at data. I think what I'm kind of interested in seeing is what people do within the corporate environment. Right, 'cause they have resources. They have funding that startups don't have. And they have backing, right? Presumably if you're acquired you went in at a higher title in the corporate structure, whereas if you had started there you probably wouldn't be at that title at that point. So I think you have an opportunity where people who have done innovative things, and have proven that they can build really cool stuff, can now be in that corporate environment. I think part of it's going to be whether or not they can really adjust to sort of the corporate, you know, the corporate landscape, the politics of it, or the bureaucracy. I think every organization has that. Being able to navigate that is a difficult thing, in part 'cause it's a human skill set, it's a people skill, it's a soft skill. It's not the same thing as just being able to code something and sell it.
So, you know, it's going to really come down to people. I think if people can figure out, for instance, what people want to buy, what people think, in general that's where the money comes from. You know, you make money 'cause someone gave you money. So if you can find a way to look at data, or even look at technology, and understand what people are doing, aren't doing, what they're happy about, unhappy about, there's always opportunity in collecting the data in that way and being able to leverage that. So you build cooler things, and offer things that haven't been thought of yet. So it's a very interesting time, I think, with the corporate resources available, if you can do that. You know, who knows what we'll have in like a year. >> I'll add one. >> Please. >> The majority of companies in the S&P 500 have a market cap that's greater than their revenue. The reason is 'cause they have IP, related to data, that's of value. But most of those companies, most companies, the vast majority of companies, don't have any way to measure the value of that data. There's no GAAP accounting standard. So they don't understand the value contribution of their data in terms of how it helps them monetize. Not the data itself necessarily, but how it contributes to the monetization of the company. And I think that's a big gap. If you don't understand the value of the data, that means you don't understand how to refine it, if data is the new oil, and how to protect it and secure it and so forth. So that to me is a big gap that needs to get closed before we can actually say we live in a data driven world. >> So you're saying I've got an asset, I don't know if it's worth this or this. And they're missing that great opportunity. >> So devolve to what I know best. >> Great discussion. Really, really enjoyed it; the time has flown by. Joe, if you get that augmented reality thing to work on the salary, point it toward that guy, not this guy, okay? (everyone laughs) It's much more impressive if you point it over there. But Joe, thank you. Dion, Joe and Jennifer and Batman, we appreciate it, and Bob Hayes, thanks for being with us. >> Thank you guys. >> Really enjoyed >> Great stuff. >> the conversation. >> And a reminder: coming up at the top of the hour, six o'clock Eastern time, IBMgo.com featuring the live keynote, which is being set up just about 50 feet from us right now. Nick Silver is one of the headliners there, John Thomas as well, or rather Rob Thomas. John Thomas we had on earlier on The Cube. But a panel discussion as well coming up at six o'clock on IBMgo.com, six to 7:15. Be sure to join that live stream. That's it from The Cube. We certainly appreciate the time. Glad to have you along here in New York. And until the next time, take care. (bright digital music)
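One technical thread from that panel worth making concrete before the next segment: Jennifer's description of the corpus as the foundation of NLP, the "dictionary" a model learns its vocabulary and statistics from. A toy sketch in Python with scikit-learn, echoing the two-domain travel and medicine corpora she mentioned; the data here is invented for illustration and says nothing about Watson's internals:

```python
# Toy illustration: the "corpus" is just the labeled text the model
# learns from. Swap in a different corpus and you get a different model,
# which is why access to the corpus matters so much.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

corpus = [
    "book a flight and hotel for my trip",      # travel
    "the train departs from the main station",  # travel
    "the patient was prescribed antibiotics",   # medicine
    "symptoms include fever and fatigue",       # medicine
]
labels = ["travel", "travel", "medicine", "medicine"]

# The vectorizer builds its vocabulary -- the "dictionary" -- from the corpus.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(corpus, labels)

print(model.predict(["fever after a long flight"]))  # tiny corpus, rough guess
```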
John Thomas, IBM | IBM Data Science For All
(upbeat music) >> Narrator: Live from New York City, it's the Cube, covering IBM Data Science for All. Brought to you by IBM. >> Welcome back to Data Science for All. It's a whole new game here at IBM's event, a two-day event going on, 6:00 tonight the big keynote presentation on IBM.com, so be sure to join the festivities there. You can watch it live stream, all that's happening. Right now, we're live here on the Cube, along with Dave Vellante, I'm John Walls and we are joined by John Thomas, who is a distinguished engineer and director at IBM. John, thank you for your time, good to see you. >> Same here, John. >> Yeah, pleasure, thanks for being with us here. >> John Thomas: Sure. >> I know, in fact, you just wrote this morning about machine learning, so that's obviously very near and dear to you. Let's talk first off about machine learning at IBM, >> John Thomas: Sure. >> Not a new concept by any means, but what is new with regard to machine learning in your work? >> Yeah, well, that's a good question, John. Actually, I get that question a lot. Machine learning itself is not new, companies have been doing it for decades, so exactly what is new, right? I actually wrote this in a blog today, this morning. It's really three different things, I call them democratizing machine learning, operationalizing machine learning, and hybrid machine learning, right? And we can talk through each of these if you like. But I would say hybrid machine learning is probably closest to my heart. So let me explain what that is, because it sounds fancy, right? (laughter) >> Right. That's what we need, another hybrid something, right? >> In reality, what it is, is let data gravity decide where your data stays, and let your performance requirements, your SLAs, dictate where your machine learning models go, right? So what do I mean by that? You might have sensitive data, customer data, which you want to keep on a certain platform, right? Instead of moving data off that platform to do machine learning, bring machine learning to that platform, whether that be the mainframe or specialized appliances or hadoop clusters, you name it, right? Bring machine learning to where the data is. Do the training, building of the model, where that is, but then have complete flexibility in terms of where you deploy that model. As an example, you might choose to build and train your model on premises behind the firewall, using very sensitive data, but the model that has been built, you may choose to deploy that into a Cloud environment because you have other applications that need to consume it. That flexibility is what I mean by hybrid. (a sketch of this pattern follows below) Another example is, especially when you get into more complex machine learning, deep learning domains, you need acceleration, and there is hardware that provides that acceleration, right? For example, GPUs provide acceleration. Well, you need to have the flexibility to train and build the models on hardware that provides that kind of acceleration, but then the model that has been built might go inside of a CICS mainframe transaction for sub-second scoring of a credit card transaction as to whether it's fraudulent or not, right? So there's flexibility off prem, on prem, different platforms, this is what I mean by hybrid. >> What is the technical enabler to allow that to happen? Is it just a modern software architecture, microservices, containers, blah, blah, blah? Explain that in more detail. >> Yeah, that's a good question, and it's a couple different things.
One is bringing native machine learning to these platforms themselves. So you need native machine learning on the mainframe, in the Cloud, in a hadoop cluster environment, in an appliance, right? So you need the runtimes, the libraries, the frameworks running native on those platforms. And that is not easy to do, you know? You've got machine learning running native on ZOS, not even Linux on Z. It's native to ZOS on the mainframe. >> At the very primitive level you're talking about. >> Yeah. >> So you get the performance you need. >> You have the runtime environments there, and then what you need is a seamless experience across all of these platforms. You need a way to export models, repositories into which you can save models, the same APIs to save models into a different repository and then consume them from there. So it's a bit of engineering that IBM is doing to enable this, right? Native capabilities on the platforms, the same APIs to talk to repositories and consume from the repositories. >> So the other piece of that architecture is, you're talking a lot of tooling that's integrated and native. >> John Thomas: Yes. >> And the tooling, as you know, changes, I feel like daily. There's a new tool out there and everybody gloms onto it, so the architecture has to be able to absorb those. What is the enabler there? >> Yeah, so you actually bring up a very good point. There is a new language, a new framework every day, right? I mean, we all know that, in the world of machine learning, Python and R and Scala. Frameworks like Spark and TensorFlow, they're table stakes now, you know? You have to support all of these, scikit-learn, you name it, right? Obviously, you need a way to support all these frameworks on the platforms you want to enable, right? And then you need an environment which lets you work with the tools of your choice. So you need an environment like a workbench which can allow you to work in the language, the framework that you are the most comfortable with. And that's what we are doing with Data Science Experience. I don't know if you have thought of this, but Data Science Experience is an enterprise ML platform, right, runs in the Cloud, on prem, on x86 machines, you can have it on a (mumbles) box. The idea here is support for a variety of open languages and frameworks, enabled through a collaborative workbench kind of interface. >> And the decision to move, whether it's on-prem or in the Cloud, it's a function of many things, but let's talk about those. I mean, data volume is one. You can't just move your business into the Cloud. It's not going to work that well.
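A minimal sketch of the hybrid pattern John describes, train where the sensitive data lives and ship only the fitted model, assuming scikit-learn and joblib; the model-repository client at the end is a hypothetical placeholder, since no specific registry API is named in the conversation:

```python
# Train behind the firewall; only the serialized model leaves, not the data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
import joblib

# Stand-in for sensitive on-prem data that must not move.
X_train, y_train = make_classification(n_samples=1000, n_features=20,
                                       random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Serialize the fitted model -- this artifact is what moves to the cloud.
joblib.dump(model, "fraud_model.joblib")

# Publishing step: in practice you'd push the artifact to a model
# repository that cloud scoring services consume from. `ModelRepo` is
# hypothetical, a stand-in for whatever registry API you actually use.
# repo = ModelRepo("https://models.example.com")   # hypothetical
# repo.publish("fraud_model", "fraud_model.joblib", version="1")
```

The design point is simply that the serialized model, not the training data, is the artifact that crosses the firewall.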
Obviously, you still need your security mechanisms, your access control mechanisms, your governance control mechanisms. So you need governance whether you are on the Cloud or on prem. And your encryption mechanisms, your version control mechanisms, your governance mechanisms all need to be in place, regardless of where you deploy, right? And to your question of how do you decide where the model should go, as I said earlier to John, you know, let data gravity, SLAs, performance, and security requirements dictate where the model should go. >> We're talking so much about concepts, right, and theories that you have. Let's roll up our sleeves and get to the nitty-gritty a little bit here and talk about what are people really doing out there? >> John Thomas: Oh yeah, use cases. >> Yeah, just give us an idea for some of the ... Kind of the latest and greatest that you're seeing. >> Lots of very interesting use cases out there. So actually, I'm part of what IBM calls a Data Science Elite team. We go out and engage with customers on very interesting use cases, right? And we see a lot of these hybrid discussions happen as well. On one end of the spectrum is understanding customers better. So I call this reading the customer's mind. So can you understand what is in the customer's mind and have an interaction with the client without asking a bunch of questions, right? Can you look at his historical data, his browsing behavior, his purchasing behavior, and have an offer that he will really love? Can you really understand him and give him a celebrity experience? That's one class of use cases, right? Another class of use cases is around improving operations, improving your own internal processes. One example is fraud detection, right? I mean, that is a hot topic these days. So how do you, as the credit card is swiped, right, it's just a few milliseconds as that travels through a network and hits the mainframe, and a scoring is done as to whether this should be approved or not. Well, you need to have a prediction of how likely this is to be fraudulent or not within the span of the transaction. Here's another one. I don't know if you call help desks now. I sometimes call them "helpless desks." (laughter) >> Try not to. >> Dave: Hell desks. >> Try not to. Helpless desks, but, you know, for pretty much every enterprise that I am talking to, there is a goal to optimize their help desk, their call centers. And call center optimization is good. So as the customer calls in, can you understand the intent of the customer? See, he may start off talking about something, but as the call progresses, the intent might change. Can you understand that? In fact, not just understand, but predict it and intercept with something that the client will love before the conversation takes a bad turn? (laughter) >> You must be listening in on my calls. >> Your calls, must be your calls! >> I meander, I go every which way. >> I game the system and just get really mad and go, let me get you an operator. (laughter) Agent, okay. >> You two guys, your data is a special case. >> Dave: Yeah right, this guy's pissed. >> We are red-flagged right off the top. >> We're not even analyzing you. >> Day job, forget about, you know. What about things, you know, because they're moving so far out to the edge, and now with mobile and that explosion there, and sensor data being what it is, all this is tremendous growth. Tough to manage. >> Dave: It is, it really is.
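The scoring half of that fraud use case, the few-milliseconds path John mentions, reduces to a single fast predict against a model trained offline. A rough sketch, assuming the model artifact from the previous sketch is available; both the file name and the 0.9 threshold are illustrative assumptions, not anything named in the conversation:

```python
# In-transaction scoring: the expensive training happened offline;
# at swipe time we only run one fast predict_proba on one record.
import joblib
import numpy as np

# Assumes a binary classifier with predict_proba, e.g. the model
# serialized in the earlier sketch.
model = joblib.load("fraud_model.joblib")

def score_transaction(features: np.ndarray, threshold: float = 0.9) -> bool:
    """Flag one transaction as likely fraud; threshold is a business choice."""
    prob_fraud = model.predict_proba(features.reshape(1, -1))[0, 1]
    return prob_fraud >= threshold

# Example: 20 engineered features for one swipe (illustrative values).
flagged = score_transaction(np.random.rand(20))
print(flagged)
```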
>> I guess, maybe tougher to make sense of it, so how are you helping people make sense of this so they can really filter through and find the data that matters? >> Yeah, there are a lot of things rolled up into that question, right? One is just managing those devices, those endpoints, in multiple thousands, tens of thousands, millions of these devices. How would you manage them? Then, are you doing the processing of the data and applying ML and DL right at the edge, or are you bringing the data back behind the firewall or into the Cloud and then processing it there? If you are doing image recognition in a car, in a self-driving car, can you allow the latency of shipping an image of a pedestrian jumping in front across the Cloud for a deep-learning network to process it and give you an answer, oh, that's a pedestrian? You know, you may not have that latency. So you may want to do some processing on the edge, so that is another interesting discussion, right? And you need acceleration there as well. Another aspect now is, as you said, separating the signal from the noise, you know. It really comes down to the different industries that we go into: what are the signals that we understand now? Can we build on them and can we re-use them? That is an interesting discussion as well. But, yeah, you're right. With the world of exploding data that we are in, with all these devices, it's very important to have a systematic approach to managing your data, cataloging it, understanding where to apply ML, where to apply acceleration, governance. All of these things become important. >> I want to ask you about, come back to the use cases for a moment. You talk about celebrity experiences; I put that in sort of a marketing category. Fraud detection's always been one of the favorite big data use cases, help desks, recommendation engines and so forth. Let's start with the fraud detection. First of all, fraud detection in the last six, seven years has been getting immensely better, no question. And it's great. However, the number of false positives, about a year ago, it was too many. We're a small company but we buy a lot of equipment and lights and cameras and stuff. The number of false positives that I personally got was overwhelming. >> Yeah. >> They've gone down dramatically. >> Yeah. >> In the last 12 months. Is that just a coincidence, happenstance, or is it getting better? >> No, it's not that the bad guys have gone down in number. It's not that at all, no. (laughter) >> Well, that, I know. >> No, I think there is a lot of sophistication in terms of the algorithms that are available now. In terms of ... If you have tens of thousands of features that you're looking at, how do you collapse that space, and how do you do that efficiently, right? There are techniques that are evolving in terms of handling that kind of information. In terms of the actual algorithms, there are different types of innovations happening in that space. But I think, perhaps, the most important one is that things that used to take weeks or days to train and test now can be done in days or minutes, right? The acceleration that comes from GPUs, for example, allows you to test out different algorithms, different models and say, okay, well, this performs well enough for me to roll it out and try this out, right? It gives you a very quick cycle of innovation. >> The time to value is really compressed. Okay, now let's take one that's not so good.
Ad recommendations, the Google ads that pop up. One in a hundred are maybe relevant, if that, right? And they pop up on the screen and they're annoying. I worry that Siri's listening somehow. I talk to my wife about Israel and then next thing I know, I'm getting ads for going to Israel. Is that a coincidence or are they listening? What's happening there? >> I don't know about what Google's doing. I can't comment on that. (laughter) I don't want to comment on that. >> Maybe just from a technology perspective. >> From a technology perspective, this notion of understanding what is in the customer's mind, and really getting to a customer segment of one, this is a top interest for many, many organizations. Regardless of which industry you are in, insurance or banking or retail, doesn't matter, right? And it all comes down to the fundamental principles of how efficiently you can do this. Now, can you identify the features that have the most predictive power? There is a level of sophistication in terms of the feature engineering, in terms of collapsing that space of features that I had talked about (see the sketch after this segment), and then, how do I actually do the data science of this? How do I do the exploratory analysis? How do I actually build and test my machine learning models quickly? Do the tools allow me to be very productive about this? Or do I spend weeks and weeks coding in lower-level formats? Or do I get help, do I get guided interfaces, which guide me through the process, right? And then, the topic of acceleration we talked about, right? These things come together, and then couple that with cognitive APIs. For example, speech to text: the word (mumbles) have gone down dramatically now. So as you talk on the phone, with very high accuracy, we can understand what is being talked about. Image recognition: the accuracy has gone up dramatically. You can create custom classifiers for industry-specific topics that you want to identify in pictures. Natural language processing, natural language understanding, all of these have evolved in the last few years. And all these come together. So machine learning's not an island. All these things coming together is what makes these dramatic advancements possible. >> Well, John, if you've figured out anything over the past 20 minutes or so, it's that Dave and I want ads delivered that matter and we want our help desk questions answered right away. (laughter) So if you can help us with that, you're welcome back on the Cube anytime, okay? >> We will try, John. >> That's all we want, that's all we ask. >> You guys, your calls are still being screened. (laughter) >> John Thomas, thank you for joining us, we appreciate that. >> Thank you. >> Our panel discussion coming up at 4:00 Eastern time. Live here on the Cube, we're in New York City. Be back in a bit. (upbeat music)
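One thread from that exchange worth pinning down: collapsing a feature space of tens of thousands of candidates, which John credits for much of the faster train-and-test cycle. The conversation doesn't name the techniques IBM uses, so this is a generic sketch of one common approach, PCA in front of a classifier, on synthetic stand-in data:

```python
# Minimal sketch of collapsing a wide feature space before modeling.
# PCA is one common choice among many; all dimensions here are illustrative.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in for transaction data with a very wide raw feature set.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=20, random_state=0)

# Project 500 raw features down to 20 components, then fit the classifier
# on the compressed representation -- far fewer parameters to train and test.
model = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=1000))
model.fit(X, y)
print(model.score(X, y))  # training accuracy on the toy data
```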