
Search Results for Data Science Elite Team:

Rob Thomas, IBM | IBM Data and AI Forum


 

>>Live from Miami, Florida. It's theCUBE, covering the IBM Data and AI Forum, brought to you by IBM. >>Welcome back to the Port of Miami, everybody. You're watching theCUBE, the leader in live tech coverage. We're here covering the IBM Data and AI Forum. Rob Thomas is here. He's the General Manager for Data and AI at IBM. Great to see you again. >>Right. Great to see you here in Miami. Beautiful week here on the beach area. >>It's nice. Yeah. This is quite an event. I mean, I had thought it was gonna be, like, roughly 1,000 people. It's oversold, more than 1,700 people here. This is a learning event, right? I mean, people here, they're here to absorb best practices, you know, learn from technical hands-on presentations. Tell us a little bit more about how this event has evolved. >>It started as a really small training event, like you said, which goes back five years. And what we saw was those people weren't looking for the normal kind of conference. They wanted to be hands-on. They want to build something. They want to come here and leave with something they didn't have when they arrived. So it started as a little small builder conference, and now it somehow continues to grow every year, which we're very thankful for. And we continue to expand the sessions. We've had to add hotels this year, so it's really taken off. >>You and your title have two of the three superpowers: data and AI. And of course, cloud is the third superpower, which is part of IBM's portfolio. But people want to apply those superpowers, and you used that metaphor in your keynote today, to really transform their business. But you pointed out that AI is only about 4 to 10% penetrated within organizations, and you talked about some of the barriers to that. But there's a real appetite to learn, isn't there? >>There is. Let's talk about the superpower idea for a bit. AI does give employees superpowers, because they can do things now they couldn't do before. But you think about superheroes: they all have an origin story. They always have somewhere where they started. And applying AI in an organization is actually not about doing something completely different. It's about accentuating what you already do, doing something massively better that's kind of in your DNA already. So we're encouraging all of our clients this week: use the time to understand what you're great at, what your value proposition is, and then how do you use AI to accentuate that? Because your superpower is only gonna last if it starts with who you are as a company or as a person. >>Who was your favorite superhero as a kid? >>Let's see. I was kind of into the whole Hall of Justice, Superman, that kind of thing. That was probably my cartoon. >>I was a Batman guy. And the reason I love that movie is all the combination of tech. It kind of reminds me of what's happening here today in the marketplace. People are taking data, they're taking AI, they're applying machine intelligence to that data to create new insights, which they couldn't have before. But to your point, there's an issue with the quality of data, and there's a skills gap as well. So let's start with the data quality problem. Describe that problem, and how are you guys attacking it? >>Your AI is only as good as your data. I'd say that's the fundamental problem in the organizations we work with. 80% of the projects get slowed down or they get stopped because the company has a data problem.
That's why we introduced this idea of the AI Ladder, which is all of the steps that a company has to think about for how they get to a level of data maturity that supports AI: how they collect their data, organize their data, analyze their data, and ultimately begin to infuse AI into business processes. Every organization needs to climb that ladder, and they're all at different spots. For some it might be, we gotta focus on organization, a data catalog. For others, it might be, we gotta do a better job of data collection, data management. That's for every organization to figure out. But you need a methodical approach to how you attack the data problem. >>So I wanna ask you about the AI Ladder. You have these verbs, and the verbs overlay on building blocks. I went back to some of my notes on the original AI Ladder conversation that you introduced a while back. It was data and information architecture at the base, and then building on that, analytics, machine learning, AI. And now you've added the verbs: collect, organize, analyze, and infuse. Should we think of this as a maturity model, or as building blocks and verbs that you can apply depending on where you are in that maturity model? >>I would think of it as building blocks and a methodology, which is: you've got to decide, do we focus on our data collection and doing that right, is that our weakness, or is it data organization, or is it the sexy stuff, the AI, the data science stuff? This is just a tool to help organizations organize themselves around what's important. I ask every company I visit: do you have a data strategy? You wouldn't believe the looks you get when you ask that question. You get either, well, she's got one, he's got one, so we've got seven. Or you get, no, we've never had one. Or, hey, we just hired a CDO, so we hope to have one. But we use the AI Ladder just as a tool to encourage companies to think about their data strategy. >>I want to follow up on that data strategy, because you see a lot of tactical data strategies: well, we use data for this initiative or that initiative, maybe in sales or marketing, or maybe in R&D. Increasingly, organizations are developing, and should they develop, a holistic data strategy, or should they try to just get kind of quick wins? What are you seeing in the marketplace? >>It depends on where you are in your maturity cycle. I do think it behooves every company to say, we understand where we are and we understand where we want to go. That can be the high-level data strategy: what are our focus and priorities gonna be? Once you understand focus and priorities, the best way to get things into production is through a bunch of small experiments, to your point. So I don't think it's an either-or, but I think it's really valuable to have an overarching data strategy, and I recommend companies think about a hub-and-spoke model for this: have a centralized chief data officer, but your business units also need a chief data officer. So strategy in one place, execution in another. There's a best practice to going about this. >>Next, you asked the question, what is AI? You get that question a lot, and you said it's about predicting, automating, and optimizing. Can we unpack that a little bit? What's behind those three items? >>People overreact to hype on topics like AI, and they think, well, I'm not ready for robots, or I'm not ready for self-driving vehicles. Those may or may not happen.
Don't know. But AI, let's think more basic: it's about, can we make better predictions about the business? Every company wants to see the future. They want the proverbial crystal ball. AI helps you make better predictions, if you have the data to do that. It helps you automate tasks, automate the things that you don't want to do. There's a lot of work that has to happen every day that nobody really wants to do; you use software to automate that. And it's about optimization: how do you optimize processes to drive greater productivity? So this is not black magic. This is not some far-off thing. We're talking about basics: better predictions, better automation, better optimization. >>Interestingly, you used the term black magic, because a lot of AI is black box, and IBM has always made a point of saying, we're trying to make AI transparent. You talk a lot about taking the bias out, or at least understanding when bias makes sense and when it doesn't make sense. Talk about the black box problem and how you're addressing it. >>That starts with one simple idea: AI is not magic. I say that over and over again. This is just computer science. Then you have to look at what the components are inside the proverbial black box. With Watson, we have a few things. We've got tools for clients that want to build their own AI. Think of it as a toolbox: you can choose, do you want a hammer, do you want a screwdriver, do you want a nail, and you go build your own AI using Watson. We also have applications, basically end-user applications that put AI into practice: things like Watson Assistant, to create a virtual agent for customer service, or Watson Discovery, or things like OpenPages with Watson for governance, risk, and compliance. So AI with Watson is about tools, if you want to build your own applications, or applications, if you want to consume one. But we've also got embeddable AI capabilities, so you can pick up Watson and put it inside of any software product in the world. >>You also mentioned that Watson was built with a lot of open source components, which a lot of people might not know. What's behind Watson? >>85% of the work that happens in Watson today is open source. Most people don't know that. It's Python, it's R, it's deploying into TensorFlow. What we've done, where we've focused our efforts, is how do you make AI easier to use. So we've introduced AutoAI into Watson Studio. If you're building models in Python, you can use AutoAI to automate things like feature engineering and algorithm selection, the kind of thing that's hard for a lot of data scientists. So we're not trying to create our own language. We're using open source, but then we make that better, so that a data scientist can do their job better. >>So, again, come back to AI adoption. We talked about three things: quality, trust, and skills. We talked about the data quality piece, we talked about the black box, you know, challenge. Now it's about skills. You mentioned there's a 250,000-person gap in data science skills. How are IBM and its customers approaching closing that gap? >>Think of this in basic economic terms. We have a supply-demand mismatch: massive demand for data scientists, not enough supply. The way that we address that is twofold. One is we've created a team called Data Science Elite. They've done a lot of work for the clients that were on stage with me. We help a client get to their first big win with AI. It's that simple. We go in for 4 to 6 weeks.
It's an elite team. It's not a long project; we're gonna get you to your first success. The second piece, the other way to solve a demand-and-supply mismatch, is through automation. So I talked about AutoAI, but we also do things like using AI for building data catalogs, metadata creation, data matching, so making that data prep process automated through AI can also help that supply-demand mismatch. The way that we solve this is: we put skills on the field to help clients, and we do a lot of automation in software. That's how we can help clients navigate this. >>So the Data Science Elite team. I love that concept, because we first picked up on it a couple of years ago. It's one of the best freebies in the business. But of course you're doing it with the customers that you want to have deeper relationships with, and I'm sure it leads to follow-on business. What are some of the things that you're most proud of from the Data Science Elite team that you might be able to share with us? >>The client stories are amazing. I talked in the keynote about origin stories. Royal Bank of Scotland: automating 40% of their customer service, and now customer sat is going up 20%, because they put their customer service reps on the hardest problems. That's Data Science Elite helping them get to a first success; now they scale it out. Or Wunderman Thompson, on stage, part of WPP, the big advertising agency. They're using AI to comb through customer records; they're using AutoAI. That's the Data Science Elite team that went in for literally four weeks and gave them the confidence that they could then do this on their own once we left. We've got countless examples where this team has gone in for very short periods of time. And clients don't have to talk about this, but they do, because they're like, we can't believe what this team did. So we're really excited by it. >>The interesting thing about the RBS example to me, Rob, was that you basically applied AI to remove a lot of these mundane tasks that weren't really driving value for the organization, and RBS was able to shift the skill sets to more strategic areas. We always talk about that, but I love the example. Can you talk a little bit more about really where that shift was? What did they go from, what did they apply it to, and how did it impact their business? I think it was a 20% improvement in NPS. >>RBS realized the inquiries they had coming in were of two categories. There were ones that were really easy, and there were ones that were really hard, and they were spreading those equally among their employees. So what you get is a lot of unhappy customers. And then once they said, we can automate all the easy stuff, we can put all of our people on the hardest things, customer sat shot through the roof. Now, what does a virtual agent do? Let's decompose that a bit. We have a thing called intent classification as part of Watson Assistant. It's a model that understands customer intent, and it's trained on the data from Royal Bank of Scotland. So this model, after 30 days, is not very good. After 90 days, it's really good. After 180 days, it's excellent, because at the core of this is understanding the intent of customers and engaging with them. We use natural language processing. It really becomes a virtual agent that's done all in software, and you can only do that with things like AI. >>And what is the role of the human element in that? How does it interact with that virtual agent?
Is it sort of an attended agent, or is it unattended? What is that like? >>So it's two pieces. For the easiest stuff, no humans needed; we just go do that in software. For the harder stuff, we've now given the RBS customer service agents superpowers, because they've got Watson Assistant at their fingertips. The hardest thing for a customer service agent is finding the right data to solve a problem. Watson Discovery is embedded in Watson Assistant, so they can basically comb through all the data in the bank to answer a question. So we're giving their employees superpowers. On one hand, it's augmenting the humans. In the other case, we're just automating the stuff the humans don't want to do in the first place. >>I'm gonna shift gears a little bit and talk about Red Hat and OpenShift. Obviously a huge acquisition last year, $34 billion, the next chapter in IBM's strategy. A couple of things you're doing with OpenShift: Watson is now available on OpenShift, so that means you're bringing Watson to the data, I want to talk about that, and then Cloud Pak for Data is also on OpenShift. So what has that Red Hat acquisition done for you? You obviously know a lot about M&A, but now you're in the position of having to take advantage of that, and you are taking advantage of it. So give us an update on what you're doing there. >>So look at the cloud market for a moment. You've got around $600 billion of opportunity in traditional IT on premise. You've got another $600 billion that's public clouds, dedicated clouds. And you've got about $400 billion that's private cloud. So the cloud market is fragmented between public, private, and traditional IT. The opportunity we saw was, if we can help clients integrate across all of those clouds, that's a great opportunity for us. What Red Hat OpenShift is, is a liberator. It says, write your application once, deploy it anywhere, because you built it on Red Hat OpenShift. Now we've brought Cloud Pak for Data, our data platform, onto Red Hat OpenShift, certified on that, and Watson now runs on Red Hat OpenShift. What that means is you can have the best data platform and the best AI, and you can run it on Google, AWS, Azure, your own private cloud. You get the best AI, with Watson from IBM, and run it in any of those places. >>The reason why that's so powerful is because you're able to bring those capabilities to the data without having to move the data around. Jennifer showed an example, >>when she was showing BERT analyzing the data. >>And so the beauty of that is you don't have to move any data. Talk about the importance of not having to move that data, and I want to understand what the client prerequisite is to really take advantage of that. >>This is one of the greatest inventions out of IBM Research in the last 10 years that hasn't gotten a lot of attention, which is data virtualization, data federation. Traditional federation's been around forever. The issue is it doesn't perform. Our data virtualization performs 500% faster than anything else in the market. So what Jennifer showed in that demo was: I'm training a model, and I'm gonna virtualize a data set from Redshift on AWS and an on-premise repository, a MySQL database. We don't have to move the data. We just virtualize those data sets into Cloud Pak for Data, and then we can train the model in one place. This is actually breaking down data silos that exist in every organization, and it's really unique.
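
To make the data virtualization pattern above concrete, here is a minimal sketch, in Python, of what a single query over virtualized sources could look like: one SQL join across a Redshift table and an on-premise MySQL table, with no bulk data movement beforehand. The connection string, schema names, and column names are all assumptions for illustration; they are not taken from the demo.

    # Sketch only: connection details, schemas, and columns are hypothetical.
    import ibm_db_dbi  # IBM's DB-API driver; the virtualization layer speaks standard SQL
    import pandas as pd

    conn = ibm_db_dbi.connect(
        "DATABASE=BLUDB;HOSTNAME=cpd.example.com;PORT=32051;"
        "PROTOCOL=TCPIP;UID=user;PWD=secret;SECURITY=SSL;"
    )

    # Both tables are virtual: CLAIMS lives in Redshift on AWS,
    # PATIENTS in an on-premise MySQL database. The join runs without
    # copying either table into a central store first.
    query = """
        SELECT p.age, p.gender, c.diagnosis_code, c.claim_amount
        FROM REDSHIFT.CLAIMS c
        JOIN ONPREM.PATIENTS p ON p.patient_id = c.patient_id
    """
    df = pd.read_sql(query, conn)

    # The joined frame then feeds model training in one place,
    # which is the "train the model without moving the data" step.
    print(df.head())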
>>It was a very cool demo, because what she did is she was pulling data from different data stores, doing joins. It was a healthcare application, really trying to understand where the bias was, peeling the onion, right? There is bias, and sometimes bias is okay; you just gotta know whether or not it's actionable. And so that was very cool, without having to move any of the data. What is the prerequisite for clients? What do they have to do to take advantage of this? >>Start using Cloud Pak for Data. We've got something on the web called Cloud Pak Experiences. Anybody can go try this in less than two minutes. I just say, go try it. Because Cloud Pak for Data will just insert right onto any public cloud you're running, or into your private cloud environment. You just point to the sources, and it will instantly begin to create what we call schema folding, a skinny version of the schema from your sources, right in Cloud Pak for Data. This is like instant access to your data. >>It sounds like magic. Okay, last question. What's one of the big takeaways you want people to leave this event with? >>We are trying to inspire clients to give AI a shot. Adoption is 4 to 10% for what is the largest economic opportunity we will ever see in our lives. That's not an acceptable rate of adoption. So we're encouraging everybody: go try things. Don't do one AI experiment. Do a hundred AI experiments in the next year. If you do 100, 50 of them probably won't work. This is where you have to change the cultural mindset that comes into it: be prepared that half of them are not gonna work. But then for the 50 that do work, you double down, then you triple down. Everybody will be successful with AI if they have this iterative mindset. >>And with cloud, it's very inexpensive to actually do those experiments. Rob Thomas, thanks so much for coming on theCUBE. Great to see you. >>Great to see you. >>All right, keep it right there, everybody. We'll be back with our next guest right after this short break. We're live from Miami at the IBM Data and AI Forum. Right back.
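
On the intent classification that Thomas describes for the Royal Bank of Scotland example, a toy version of the idea, mapping a customer utterance to an intent label learned from labeled examples, might look like the following. This is generic scikit-learn, not Watson Assistant's actual implementation, and the utterances and labels are invented.

    # Toy intent classifier: not Watson Assistant, just the general idea.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    utterances = [
        "I lost my debit card", "my card was stolen",
        "what is my account balance", "how much money do I have",
        "I want to dispute a charge", "this transaction is wrong",
    ]
    intents = [
        "report_lost_card", "report_lost_card",
        "check_balance", "check_balance",
        "dispute_charge", "dispute_charge",
    ]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(utterances, intents)

    # High-confidence intents can be automated; low-confidence ones route
    # to a human agent, mirroring the easy/hard split described above.
    probs = model.predict_proba(["someone took my card"])[0]
    print(model.classes_[probs.argmax()], round(probs.max(), 2))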

Published Date : Oct 22 2019



John Thomas, IBM & Elenita Elinon, JP Morgan Chase | IBM Think 2019


 

>> Live from San Francisco, it's theCUBE covering IBM Think 2019, brought to you by IBM. >> Welcome back everyone, live here in Moscone North in San Francisco, it's theCUBE's exclusive coverage of IBM Think 2019. I'm John Furrier, Dave Vellante. We're breaking down all the action, four days of live coverage. We've got two great guests here, Elenita Elinon, Executive Director of Quantitative Research at JP Morgan Chase, and John Thomas, Distinguished Engineer and Director of the Data Science Elite Team at IBM... a great team, an elite data science team, and of course JP Morgan Chase, a great innovator. Welcome to theCUBE. >> Welcome. >> Thank you very much. >> Thank you, thank you, guys. >> So I'd like to dig in, a great use case here, a real customer on the cutting edge. JP Morgan Chase is known for being on the bleeding edge sometimes, but in financial services, money, speed... time is money, insight is money. >> Absolutely. Yes. >> Tell us what you do at the Quantitative Group. >> Well, first of all, thank you very much for having me here, I'm quite honored. I hope you get something valuable out of what I say here. At the moment, I have two hats on. I am co-head of Quantitative Research Analytics. It's a very small, very well-selected SWAT team of technologists who are also physicists and mathematicians, statisticians, high-performance compute experts, machine learning experts, and we help the larger organization of Quantitative Research, which is about 700-plus strong, as well as some other technology organizations in the firm, to use the latest, greatest technologies. And how we do this is we actually go in there, we're very hands-on, we're working with the systems, we're working with the tools, and we're applying it to real use cases and real business problems that we see in Quantitative Research, and we prove out the technology. We make sure that we're going to save millions of dollars using this thing, or we're going to be able to execute a lot on this particular business that was difficult to execute on before because we didn't have the right compute behind it. So we go in there, we try out these various technologies, we have lots of partnerships with the different vendors, and IBM's been obviously one of a few very major vendors that we work with, and we find the ones that work. We have an influencing role as well in the organization, so we go out and tell people, "Hey, look, this particular tool is perfect for this type of problem. You should try it out." We help them set it up. They can't figure out the technology? We help them out. We're, like what I said, a SWAT team, very small compared to the rest of the organization, but we add a lot of value. >> You guys are the brain trust too. You've got the math skills, you've got the quantitative modeling going on, and it's a competitive advantage for your business. This is like a key thing, a lot of new things are emerging. One of the things we're seeing here in the industry, certainly at this show, is that it's not yesterday's machine learning. There's certainly math involved, you've got cognition and math kind of coming together, deterministic, non-deterministic elements, and you guys are seeing the leading edge of the problems and opportunities. How do you see that world evolving? Because you've got the classic school of math machine learning, and then the school of learning machines, coming together. What kind of problems do you see this kind of new model attacking?
>> So we're making a very, very large investment in machine learning and data science as a whole in the organization. You probably heard in the press that we've brought in the Head of Machine Learning from CMU, Manuela Veloso. She's now heading up the AI Research Organization at JP Morgan, and she's making herself very available to the rest of the firm, setting strategies, trying different things out, partnering with the businesses, and making sure that she understands the use cases where machine learning will be a success. We've also put a lot of investment in tooling and hiring the right kinds of people from the right kinds of universities. My organization, we're changing the focus in our recruiting efforts to bring in more data science and machine learning. But I think the most important thing, in addition to all that investment, is that we, first and foremost, understand our own problems. We work with researchers, we work with IBM, we work with the vendors, and say, "Okay, these are the types of problems, what is the best thing to throw at them?" And then we PoC, we prove it out, we look for the small wins, we try to strategize, and then we come up with the recommendations for a full-out, scalable architecture. >> John, talk about the IBM Elite Program. You guys roll your sleeves up. It's a service that you guys provide with your top clients. You bring in the best and you just jump in, co-create opportunities together, solving problems. >> That is exactly right. >> How does this work? What's your relationship with JP Morgan Chase? What specific use case are you going after? What are the opportunities? >> Yeah, so the Data Science Elite Team was set up to really help our top clients in their AI journey, in terms of bringing skills, tools, and expertise to work collaboratively with clients like JP Morgan Chase. It's been a great partnership working with Elenita and her team. We've had some very interesting use cases related to her model risk management platform, and some interesting challenges in that space about how do you apply machine learning and deep learning to solve those problems. >> So what exactly is model risk management? How does that all work? >> Good question. (laughing) That's why we're building a very large platform around it. So model risk is one of several types of risk that we worry about and that keep us awake at night. There's a long history of risk management in the banks. Of course, there's credit risk, there's market risk, these are all very well-known, very quantified risks. Model risk isn't a number, right? You can't say, this model, which is some stochastic model, is going to cost us X million dollars today, right? It's somewhat new, and at the moment, it's more prescriptive: things like, you can't do that, or you can use that model in this context, or you can't use it for this type of trade. It's very difficult to automate that type of model risk in the banks, so I'm attempting to put together a platform that captures all of the prescriptive rules, and the conditions, and the restrictions around what to do, and what to use models for, in the bank. Making sure that we actually know this in real time, or at least when the trade is being booked, we have an awareness of where these models are getting somewhat abused, right? We look out for those types of situations, and we make sure that we alert the correct stakeholders, and they do something about it.
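
As a rough illustration of what Elinon describes, capturing prescriptive restrictions as data and checking each trade against them as it is booked, consider the sketch below. Every model name, field, and rule here is hypothetical; the real platform is tied into the bank's pricing code and is far richer than this.

    # Hedged sketch: all names and rules are invented for illustration.
    from dataclasses import dataclass

    @dataclass
    class Restriction:
        model_id: str
        max_maturity_years: float
        allowed_products: set

    # Prescriptive rules captured as data, e.g. from model governance reviews
    RESTRICTIONS = {
        "HULL_WHITE_1F": Restriction("HULL_WHITE_1F", 10.0, {"swaption", "cap"}),
    }

    def check_trade(trade: dict) -> list:
        """Return the restriction violations for the model pricing a trade."""
        r = RESTRICTIONS.get(trade["model_id"])
        if r is None:
            return [f"model {trade['model_id']} has no approved restrictions"]
        violations = []
        if trade["maturity_years"] > r.max_maturity_years:
            violations.append("maturity beyond approved range")
        if trade["product"] not in r.allowed_products:
            violations.append(f"model not approved for {trade['product']}")
        return violations  # non-empty means flag the trade, alert stakeholders

    print(check_trade({"model_id": "HULL_WHITE_1F",
                       "maturity_years": 15, "product": "bermudan"}))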
>> So in essence, you're governing the application of the model, and then learning as you go on, in terms of-- >> That's the second phase, so we do want to learn. At the moment, it's about what's in production today. Morpheus is running in production, running against all of the trading systems in the firm, inside the investment bank. We want to make sure that as these trades are getting booked from day to day, we understand which ones are risky, and we flag those. There's no learning yet in that, but what we've worked with John on are the potential uses of machine learning to help us manage all those risks, because it's difficult. There's a lot of data out there. I was just saying, "I don't want our Quants to do stupid things," 'cause there's too much stupidity happening right now. We're looking at emails, we're looking at data that doesn't make sense, so Morpheus is an attempt to make all of that understandable, and make the whole workflow efficient. >> So it's financial programming, in a way, that comes with a whole scale of computing; a model gone astray could be very dangerous? >> Absolutely. >> This is what you're getting at, right? >> It will cost real money to the firm. This is all the use-- >> So a model to watch the model? So policing the models, kind of watching-- >> Yes, another model. >> You have to isolate the contribution of the model, not, like you were saying before, "are there market risks or other types of risks--" >> Correct. >> You isolate it to the narrow component. >> And there's a lot of work. We work with the Model Governance Organization, another several-hundred-person organization, and that's all they do. They review the models, they understand what the risks of the models are. Now, it's the job of my team to take what they say, which could be very easy to interpret or very hard, and there's a little bit of NLP that I think is potentially useful there, to convert what they say about a model, and what the controls around the model are, into something that we can systematize and run every day, and possibly even in real time. >> This is really about getting it right and not letting it get out of control, but also this is where the scale comes in: when you get the model right, you can deploy it and manage it in a way that helps the business, versus someone throwing the wrong number in there, or the classic "we've got a model for that." >> Right, exactly. (laughing) There's two things here, right? There's the ability to monitor a model such that we don't pay fines and we don't go out of compliance, and there's the ability to use the model exactly to the extreme where we're still within compliance, and make money, right? 'Cause we want to use these models and make our business stronger. >> There are consequences too, I mean, if it's an opportunity, there's upside; if it's a problem, there's downside. Do you guys look at the quantification of those kinds of consequences, where the risk management comes in? >> Yeah, absolutely. And there's real money that's at stake here, right? If the regulators decide that a model's too risky, you have to set aside a certain amount of capital so that you're basically protecting your investors and your business, and the stakeholders. If that's done incorrectly, we end up putting a lot more capital in reserve than we should be, and that's a bad thing. So quantifying the risks correctly and accurately is a very important part of what we do. >> So a lot of skillsets obviously, and I always say, "In the money business, you want the best nerds."
Don't hate me for saying that... the smartest people. What are some of the challenges that are unique to model risk management that you might not see in other risk management approaches? >> There are some technical challenges, right? The volume of data that you're dealing with is very large. At the very simplistic level, you have classification problems that you're addressing with data that might not actually all be there, so that is one. When you get into time series analysis for exposure prediction and so on, these are complex problems to handle. The training time for these models, especially deep learning models, if you are doing time series analysis, can be pretty challenging. Data volume, training time for models: how do you turn this around quickly? We use a combination of technologies for some of these use cases. Watson Studio running on Power hardware with GPUs. So the idea here is you can cut down your model training time dramatically, and we saw that as part of the-- >> Talk about how that works, because this is something that we're seeing: people move from manual to automated machine learning and deep learning, and it gives you augmented assistance to get this to market. How does it actually work? >> So there is a training part of this, and then there is the operationalizing part of this, right? At the training part itself, you have a challenge, which is you're dealing with very large data volumes, you're dealing with training times that need to be shrunk down. And having a platform that allows you to do that, so you build models quickly and your data science folks can iterate through model creation very quickly, is essential. But then, once the models have been built, how do you operationalize those models? How do you actually invoke the models at scale? How do you do workflow management of those models? How do you make sure that a certain exposure model is not thrashing some other models that are also essential to the business? How do you do policies and workflow management? >> And on top of that, we need to be very transparent, right? If the model is used to make certain decisions that have an obvious financial impact on the bottom line, and an auditor comes back and says, "Okay, you made this trade so and so, why? What was happening at that time?" we need to be able to capture and snapshot and understand what the model was doing at that particular instant in time, and go back and understand the inputs that went into that model and made it operate the way it did. >> It can't be a black box. >> It cannot be, yeah. >> Holistically, you've got to look at the time series in real time, when things were happening and had happened, and then holistically tie that together. Is that kind of the impact analysis? >> We have to make our regulators happy. (laughing) That's number one, and we have to make our traders happy. We, as quantitative researchers, we're the ones that give them the hard math and the models, and then they use it. They use their own skillsets too to apply them, but-- >> What are the biggest needs of your stakeholders? What does the trading side want, and what are the needs on the compliance side? The traders want more, they want to move quickly? >> They're coming from different sides of it. Traders want to make more money, right? And they want to make decisions quickly. They want all the tools to tell them what to do, and for them to exercise whatever they normally exercise-- >> They want a competitive advantage.
>> They want that competitive advantage, and they're also... we've got algo trades as well, and we want to have the best algo behind our trading. >> And on the regulator side, we just want to make sure laws aren't broken, that there's auditing-- >> We use the phrase "model explainability," right? Can you explain how the model came to a conclusion? Can you make sure that there is no bias in the model? How can you ensure the models are fair? And if you can detect there is a drift, what do you do to correct that? So that is very important. >> Do you have means of detecting misuse of the model? Is that part of the governance process? >> That is exactly what Morpheus is doing. The unique thing about Morpheus is that we're tied into the risk management systems in the investment bank. We're actually running the same exact code that's pricing these trades, and what that brings is the ability to really understand pretty much the full stack trace of what's going into the price of a trade. We also have captured the restrictions and the conditions. It's in the Python script, it's essentially Python. And we can marry the two, and we can do all the checks that the governance person indicated we should be doing, and so we know, okay, if this trade is operating beyond a certain maturity, or beyond a certain expiry, we'll know that, and then we'll tag that information. >> And just for clarification, Morpheus is the name of the platform that does the-- >> Morpheus is the name of the model risk platform that I'm building out, yes. >> A final question for you: what's the biggest challenge that you guys have seen from a complexity standpoint that you're solving? You don't want to just be rubber-stamping models. You want to solve big problems. What are the big problems that you guys are going after? >> I have many big problems. (laughing) >> Opportunities. >> The one that is facing me right now is the problem of metadata and data ingestion: getting disparate data from different sources. One source calls it a delta, this other source calls it something else. We've got a strategic data warehouse that's supposed to take all of these exposures and make sense out of them. I'm in the middle, because they're on, probably, a ten-year roadmap, who knows? And I have a one-month roadmap; I had something that was due last week, and I need to come up with these regulatory reports today. So what I end up doing is a mix of tactical and strategic data ingestion, and I have to make sense of the data that I'm getting. So I need tools out there that will help support that type of data ingestion problem, and that will also lead the way towards the more strategic one, where we're better integrated with this-- >> John, talk about how you solve these problems. What are some of the things that you guys do? Give the plug for IBM real quick, 'cause I know you guys have got the Studio. Explain how you guys are helping and working with JP Morgan Chase.
So that is part of this, and model explainability, we touched upon that, to eliminate this problem of how do I ingest data from different sources without having to manually oversee all of that. We need to manually apply auto-classification at the time of ingestion. Can I capture metadata around the model and reconcile data from different data sources as the data is being brought in? And can I apply ML to solve that problem, right? There is multiple applications of ML along this workflow. >> Talk about real quick, comment before we break, I want to get this in, machine learning has been around for a while now with compute and scale. It really is a renaissance in AI, it's great things are happening. But what feeds machine learning is data, the cleaner the data, the better the AI, the better the machine learning, so data cleanliness now has to be more real-time, it's less of a cleaning group, right? It used to be clean the data, bring it in, wrangle it, now you got to be much more agile, use speed of compute to make sure that you're qualifying data before it comes in, these machine learning. How do you guys see that rolling out, is that impacting you now? Are you thinking about it? How should people think about data quality as an input in machine learning? >> Well, I think the whole problem of setting up an application properly for data science and machine learning is really making sure that from the beginning, you're designing, and you're thinking about all of these problems of data quality, if it's the speed of ingestion, the speed of publication, all of that stuff. You need to think about the beginning, set yourself up to have the right elements, and it may not all be built out, and that's been a big strategy I've had with Morpheus. I've had a very small team working on it, but we think ahead and we put elements of the right components in place so data quality is just one of those things, and we're always trying to find the right tool sets that will enable use to do that better, faster, quicker. One of the things I'd like to do is to upscale and uplift the skillsets on my team, so that we are building the right things in the system from the beginning. >> A lot of that's math too, right? I mean, you talk about classification, getting that right upfront. Mathematics is-- >> And we'll continue to partner with Elenita and her team on this, and this helps us shape the direction in which our data science offerings go because we need to address complex enterprise challenges. >> I think you guys are really onto something big. I love the elite program, but I think having the small team, thinking about the model, thinking about the business model, the team model before you build the technology build-out, is super important, that seems to be the new model versus the old days, build some great technology and then, we'll put a team around it. So you see the world kind of being a little bit more... it's easier to build out and acquire technology, than to get it right, that seems to be the trend here. Congratulations. >> Thank you. >> Thanks for coming on. I appreciate it. theCUBE here, CUBE Conversations here. We're live in San Francisco, IBM Think. I'm John Furrier, Dave Vellante, stay with us for more day two coverage. Four days we'll be here in the hallway and lobby of Moscone North, stay with us.

Published Date : Feb 12 2019



Rob Thomas, IBM | Change the Game: Winning With AI 2018


 

>> [Announcer] Live from Times Square in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to theCUBE's special presentation. We're covering IBM's announcements today around AI. IBM, as theCUBE does, runs sessions and programs in conjunction with Strata, which is down at the Javits, and we're here with Rob Thomas, who's the General Manager of IBM Analytics. Longtime CUBE alum, Rob, great to see you. >> Dave, great to see you. >> So you guys have got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the Game: Winning with AI at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the rundown. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata and a lot of people in town. So we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations of how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So it looks like a cool event, the venue, Terminal 5, it's just up the street on the West Side Highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time. But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allow people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going. >> I kind of think of it as a maturity curve. So when I go talk to clients, I say, look, you need to be on a journey towards AI. I think probably nobody disagrees that they need something there; the question is, how do you get there? So you think about the steps: a lot of people started with, we're going to reduce the cost of our operations, we're going to use data to take out cost; that was kind of the Hadoop thrust, I would say. Then they moved to, well, now we need to see more about our data, we need higher-performance data, BI, data warehousing. So everybody, I would say, has dabbled in those two areas. The next leap forward is self-service analytics: how do you actually empower everybody in your organization to use and access data? And the next step beyond that is, can I use AI to drive new business models, new levers of growth, for my business? So I ask clients to pin themselves on this journey. Most are, depending on the division or the part of the company, at different areas. But as I tell everybody, if you don't know where you are and you don't know where you want to go, you're just going to wind around, so I try to get them to pin down: where are you versus where do you want to go?
So four phases, basically: the sort of cheap data store; the BI data warehouse modernization; self-service analytics, and a big part of that is data science and data science collaboration, you guys have a lot of investments there; and then new business models with AI automation running on top. Where are we today? Would you say we're kind of in between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it sometimes requires you to take a couple of steps back and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So sometimes you have to take one step back to go two steps forward. That's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying they've got to maybe take a step back and get the infrastructure right with, let's say, a catalog; there are some basic things that they have to do, some X's and O's, you've got the Vince Lombardi playbook out here, and also skillsets, I imagine, are a key part of that. So that's what they've got to do to get prepared, and then what's next? They start creating new business models. I imagine this is where the chief data officer comes in, and it's an executive-level role. What are you seeing with clients as part of digital transformation? What's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So we put a team in, normally for one month, two months, normally a team of two or three people, at our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. When I see somebody that says, we're going to spend six months evaluating and thinking about this, I'm like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a Distinguished Engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, on Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about: every enterprise is modernizing their data and application estates, I don't think there's any debate about that.
We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservices for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices and containers; we're working with Hortonworks to help them make that move as well. So it's really about the three of us getting together and helping clients with this modernization journey. >> So, just to remind people, you remember ODPi, folks? There was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore open source, and IBM's always been a big supporter of open source. You three got together and you're proving now the productivity for customers of this relationship. You guys don't talk about this, but Hortonworks said on its public call that the relationship with IBM drove many, many seven-figure deals, which obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple of years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month, it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you asked, is there a skills gap in enterprises? There absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship with us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product announcements we've had in quite a while. Imagine this: you write a SQL query, and the traditional approach is, I've got a server, I point it at that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world. I think of it as wide-area SQL. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device; we think of it as SQL the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT has been the old mantra of, go find the data, bring it all back to a centralized warehouse; that makes it impossible to do it in real time. We're enabling real time because we can write a query once and find data anywhere. This is technology we've had in preview for the last year. We've been working with a lot of clients to prove out use cases with it, and we're integrating it as a capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud Private for Data, it's there.
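
To picture the wide-area SQL idea, pushing the query to each location and combining only the small results rather than hauling raw data to a central warehouse, here is a conceptual sketch. The source names and the local-execution stand-in are invented; the actual product exposes this through ordinary SQL, not this API.

    # Conceptual sketch only: sources and the run_local stand-in are invented.
    from concurrent.futures import ThreadPoolExecutor

    SOURCES = ["edge-gateway-17", "private-cloud-db", "aws-replica"]

    def run_local(source: str, sql: str) -> list:
        # Stand-in for executing the query fragment on the remote source.
        # In a real engine, each source computes its aggregate locally and
        # returns only that small result, never the raw rows.
        sample = {
            "edge-gateway-17": [("sensor_a", 41.2)],
            "private-cloud-db": [("sensor_a", 39.8)],
            "aws-replica": [("sensor_a", 40.5)],
        }
        return sample[source]

    sql = "SELECT device, AVG(temp) FROM readings GROUP BY device"
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: run_local(s, sql), SOURCES))

    # Combine partial results centrally. A real engine would merge averages
    # exactly via per-source sums and counts; averaging the averages here
    # is a simplification.
    values = [v for part in partials for (_, v) in part]
    print("sensor_a", sum(values) / len(values))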
>> Interesting, so when you've been around as long as I have, long enough to see some of the pendulum swings, and it's clearly a pendulum swing back toward decentralization in the edge, but the key is, from what you just described, is you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And, we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show. And then of course, Cloud, and what's interesting, I think many of the Hadoop distro vendors kind of missed Cloud early on, and then now are sort of saying, oh wow, it's a hybrid world and we've got a part to play, you guys obviously made some moves, a couple billion dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that's what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but the biggest inhibitor, to this AI point, comes back to data curation, data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on theCUBE, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken, set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native Architecture to their data, we could also help move their data to a Cloud-Native architecture if that's what they prefer. >> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, and certain laws of the land, if you will, that say you've got to keep the data in, whatever it is, Germany, or whatever country. So those sorts of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there. 
>> I get a lot of questions, people trying to peel back the onion of what exactly it is. So, I want to make that super clear here. Watson is a few things, start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, runs anywhere you want, that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. So, it is really a stack of capabilities, and where we're driving the greatest productivity, and this is in a lot of the examples you'll see tonight, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have explainability, do I have trust and transparency, and that's another thing that we're working on. >> Okay, so there's an API-oriented architecture, exposing all these services makes it very easy for people to consume. Okay, so we've been talking all week at Cube NYC, as Big Data morphs into AI, is this old wine in a new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how I get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching theCUBE.
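The stack described here, a development environment where models are built and a runtime where they are deployed and consumed, boils down to a simple build-then-score pattern. A hedged sketch follows, with a hypothetical REST scoring endpoint standing in for whatever runtime actually hosts the model; the URL and payload shape are assumptions, not a documented API.

```python
import numpy as np
import requests
from sklearn.linear_model import LogisticRegression

# Build phase (the "studio"): train a model on labeled data.
# Toy arrays stand in for a client's historical records.
X_train = np.array([[0.2, 1.0], [0.4, 0.9], [0.9, 0.1], [0.8, 0.3]])
y_train = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X_train, y_train)

# Run phase (the "runtime"): once the model is deployed behind a REST
# endpoint, applications consume it with a plain HTTP call. Endpoint
# and payload shape here are hypothetical.
payload = {"fields": ["f1", "f2"], "values": [[0.7, 0.2]]}
response = requests.post(
    "https://runtime.example.com/v1/deployments/demo/score",
    json=payload,
    timeout=5,  # latency budgets matter; as noted elsewhere, some
)               # workloads must run in-process rather than over a network
print(response.json())
```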

Published Date : Sep 18 2018

SUMMARY :

Dave Vellante interviews Rob Thomas, General Manager of IBM Analytics, ahead of IBM's Change the Game: Winning with AI event at Terminal 5. They discuss the four-phase data journey from cheap data stores through BI/DW modernization to self-service analytics and AI-driven business models, the Data Science Elite team, expanded partnerships with Hortonworks and Red Hat around containers and OpenShift, a new AI skills community with Stack Overflow, and wide-area SQL query technology integrated into IBM Cloud Private for Data.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
IBM | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
Hortonworks | ORGANIZATION | 0.99+
six months | QUANTITY | 0.99+
Rob | PERSON | 0.99+
Rob Thomas | PERSON | 0.99+
John Thomas | PERSON | 0.99+
two months | QUANTITY | 0.99+
one month | QUANTITY | 0.99+
Germany | LOCATION | 0.99+
last year | DATE | 0.99+
Red Hat | ORGANIZATION | 0.99+
Monday | DATE | 0.99+
one | QUANTITY | 0.99+
100% | QUANTITY | 0.99+
GDPR | TITLE | 0.99+
three people | QUANTITY | 0.99+
first | QUANTITY | 0.99+
two | QUANTITY | 0.99+
ibm.com/WinWithAI | OTHER | 0.99+
Watson Studio | TITLE | 0.99+
Python | TITLE | 0.99+
Scala | TITLE | 0.99+
First | QUANTITY | 0.99+
Data Science Elite | ORGANIZATION | 0.99+
both | QUANTITY | 0.99+
Cube | ORGANIZATION | 0.99+
one step | QUANTITY | 0.99+
One | QUANTITY | 0.99+
Times Square | LOCATION | 0.99+
today | DATE | 0.99+
Vince Lombardi | PERSON | 0.98+
three | QUANTITY | 0.98+
Stack Overflow | ORGANIZATION | 0.98+
tonight | DATE | 0.98+
Javits Center | LOCATION | 0.98+
Barney | ORGANIZATION | 0.98+
Terminal 5 | LOCATION | 0.98+
IBM Analytics | ORGANIZATION | 0.98+
Watson | TITLE | 0.97+
two steps | QUANTITY | 0.97+
New York City | LOCATION | 0.97+
Watson Applications | TITLE | 0.97+
Cloud | TITLE | 0.96+
This evening | DATE | 0.95+
Watson Machine Learning | TITLE | 0.94+
two area | QUANTITY | 0.93+
seven-figure deals | QUANTITY | 0.92+
Cube | PERSON | 0.91+

Sreesha Rao, Niagara Bottling & Seth Dobrin, IBM | Change The Game: Winning With AI 2018


 

>> Live, from Times Square, in New York City, it's theCUBE covering IBM's Change the Game: Winning with AI. Brought to you by IBM. >> Welcome back to the Big Apple, everybody. I'm Dave Vellante, and you're watching theCUBE, the leader in live tech coverage, and we're here covering a special presentation of IBM's Change the Game: Winning with AI. IBM's got an analyst event going on here at the Westin today in the theater district. They've got 50-60 analysts here. They've got a partner summit going on, and then tonight, at Terminal 5 on the West Side Highway, they've got a customer event, a lot of customers there. We've talked earlier today about the hard news. Seth Dobrin is here. He's the Chief Data Officer of IBM Analytics, and he's joined by Shreesha Rao who is the Senior Manager of IT Applications at California-based Niagara Bottling. Gentlemen, welcome to theCUBE. Thanks so much for coming on. >> Thank you, Dave. >> Well, thanks Dave for having us. >> Yes, always a pleasure Seth. We've known each other for a while now. I think we met in the snowstorm in Boston, sparked something a couple years ago. >> Yep. When we were both trapped there. >> Yep, and at that time, we spent a lot of time talking about your internal role as the Chief Data Officer, working closely with Inderpal Bhandari, and what you guys are doing inside of IBM. I want to talk a little bit more about your other half which is working with clients and the Data Science Elite Team, and we'll get into what you're doing with Niagara Bottling, but let's start there, in terms of that side of your role, give us the update. >> Yeah, like you said, we spent a lot of time talking about how IBM is implementing the CDO role. While we were doing that internally, I spent quite a bit of time flying around the world, talking to our clients over the last 18 months since I joined IBM, and we found a consistent theme with all the clients, in that, they needed help learning how to implement data science, AI, machine learning, whatever you want to call it, in their enterprise. There's a fundamental difference between doing these things at a university or as part of a Kaggle competition and doing them in an enterprise, so we felt really strongly that it was important for the future of IBM that all of our clients become successful at it because what we don't want is, in two years, for them to go "Oh my God, this whole data science thing was a scam. We haven't made any money from it." And it's not because the data science thing is a scam. It's because the way they're doing it is not conducive to business, and so we set up this team we call the Data Science Elite Team, and what this team does is we sit with clients around a specific use case for 30, 60, 90 days, it's really about 3 or 4 sprints, depending on the material, the client, and how long it takes, and we help them learn through this use case, how to use Python, R, Scala in our platform obviously, because we're here to make money too, to implement these projects in their enterprise. Now, because it's written in completely open-source, if they're not happy with what the product looks like, they can take their toys and go home afterwards. It's on us to prove the value as part of this, but there's a key point here. My team is not measured on sales. They're measured on adoption of AI in the enterprise, and so it creates a different behavior for them. So they're really about "Make the enterprise successful," right, not "Sell this software." >> Yeah, compensation drives behavior. 
>> Yeah, yeah. >> So, at this point, I ask, "Well, do you have any examples?" so Shreesha, let's turn to you. (laughing softly) Niagara Bottling -- >> As a matter of fact, Dave, we do. (laughing) >> Yeah, so you're not a bank with a trillion dollars in assets under management. Tell us about Niagara Bottling and your role. >> Well, Niagara Bottling is the biggest private label bottled water manufacturing company in the U.S. We make bottled water for Costcos, Walmarts, major national grocery retailers. These are our customers whom we service, and as with all large customers, they're demanding, and we provide bottled water at relatively low cost and high quality. >> Yeah, so I used to have a CIO consultancy. We worked with every CIO up and down the East Coast. I really got into a lot of organizations, and I always observed that it was really the heads of Application that drove AI because they were the glue between the business and IT, and that's really where you sit in the organization, right? >> Yes. My role is to support the business and business analytics as well as support some of the distribution technologies and planning technologies at Niagara Bottling. >> So take us through the project if you will. What were the drivers? What were the outcomes you envisioned? And we can kind of go through the case study. >> So the current project where we leveraged IBM's help was with a stretch wrapper project. Each pallet that we produce--- we produce obviously cases of bottled water. These are stacked into pallets and then shrink wrapped or stretch wrapped with a stretch wrapper, and this project is to be able to save money by trying to optimize the amount of stretch wrap that goes around a pallet. We need to be able to maintain the structural stability of the pallet while it's transported from the manufacturing location to our customer's location where it's unwrapped and then the cases are used. >> And over breakfast we were talking. You guys produce 2833 bottles of water per second. >> Wow. (everyone laughs) >> It's enormous. The manufacturing line is a high speed manufacturing line, and we have a lights-out policy where everything runs in an automated fashion with raw materials coming in from one end and the finished goods, pallets of water, going out. It's called pellets to pallets. Pellets of plastic coming in through one end and pallets of water going out through the other end. >> Are you sitting on top of an aquifer? Or are you guys using sort of some other techniques? >> Yes, in fact, we do bore wells and extract water from the aquifer. >> Okay, so the goal was to minimize the amount of material that you used but maintain its stability? Is that right? >> Yes, during transportation, yes. So if we use too much plastic, we're not optimal, I mean, we're wasting material, and cost goes up. We produce almost 16 million pallets of water every single year, so that's a lot of shrink wrap that goes around those, so what we can save in terms of maybe 15-20% of shrink wrap costs will amount to quite a bit. >> So, how does machine learning fit into all of this? >> So, machine learning is a way to understand what kind of profile, if we can measure what is happening as we wrap the pallets, whether we are wrapping it too tight or by stretching it, that results in either a conservative way of wrapping the pallets or an aggressive way of wrapping the pallets. >> I.e. too much material, right? 
>> Too much material is conservative, and aggressive is too little material, and so we can achieve some savings if we were to alternate between the profiles. >> So, too little material means you lose product, right? >> Yes, and there's a risk of breakage, so essentially, while the pallet is being wrapped, if you are stretching it too much there's a breakage, and then it interrupts production, so we want to try and avoid that. We want continuous production, at the same time, we want the pallet to be stable while saving material costs. >> Okay, so you're trying to find that ideal balance, and how much variability is in there? Is it a function of distance and how many touches it has? Maybe you can share with that. >> Yes, so each pallet takes about 16-18 wraps of the stretch wrapper going around it, and that's how much material is laid out. About 250 grams of plastic that goes on there. So we're trying to optimize the gram weight which is the amount of plastic that goes around each of the pallets. >> So it's about predicting how much plastic is enough without having breakage and disrupting your line. So they had labeled data that was, "if we stretch it this much, it breaks. If we don't stretch it this much, it doesn't break," but then it was about predicting what's good enough, avoiding both of those extremes, right? >> Yes. >> So it's a truly predictive and iterative model that we've built with them. >> And, you're obviously injecting data in terms of the trip to the store as well, right? You're taking that into consideration in the model, right? >> Yeah that's mainly to make sure that the pallets are stable during transportation. >> Right. >> And it's already determined how much containment force is required when you stretch and wrap each pallet. So that's one of the variables that is measured, but the inputs and outputs are-- the input is the amount of material that is being used in terms of gram weight. We are trying to minimize that. So that's what the whole machine learning exercise was. >> And the data comes from where? Is it observation, maybe instrumented? >> Yeah, the instruments. Our stretch-wrapper machines have an Ignition platform, which is a SCADA platform that allows us to measure all of these variables. We would be able to get machine variable information from those machines and then be able to hopefully, one day, automate that process, so the feedback loop that says "On this profile, we've not had any breaks. We can continue," or if there have been frequent breaks on a certain profile or machine setting, then we can change that dynamically as the product is moving through the manufacturing process. >> Yeah, so think of it as, it's kind of a traditional manufacturing production line optimization and prediction problem right? It's minimizing waste, right, while maximizing the output and then throughput of the production line. When you optimize a production line, the first step is to predict what's going to go wrong, and then the next step would be to include prescriptive optimization to say "How do we maximize? Using the constraints that the predictive models give us, how do we maximize the output of the production line?" This is not a unique situation. 
It's a unique material that we haven't really worked with, but they had some really good data on this material, how it behaves, and that's key, as you know, Dave, and probably most of the people watching this know, labeled data is the hardest part of doing machine learning, and building those features from that labeled data, and they had some great data for us to start with. >> Okay, so you're collecting data at the edge essentially, then you're using that to feed the models, which is running, I don't know, where's it running, your data center? Your cloud? >> Yeah, in our data center, there's an instance of DSX Local. >> Okay. >> That we stood up. Most of the data is running through that. We build the models there. And then our goal is to be able to deploy to the edge where we can complete the loop in terms of the feedback that happens. >> And iterate. (Shreesha nods) >> And DSX Local, is Data Science Experience Local? >> Yes. >> Slash Watson Studio, so they're the same thing. >> Okay now, what role did IBM and the Data Science Elite Team play? Could you take us through that? >> So, as we discussed earlier, adopting data science is not that easy. It requires subject matter expertise. It requires understanding of data science itself, the tools and techniques, and IBM brought that as a part of the Data Science Elite Team. They brought both the tools and the expertise so that we could get on that journey towards AI. >> And it's not a "do the work for them." It's a "teach them to fish," and so my team sat side by side with the Niagara Bottling team, and we walked them through the process, so it's not a consulting engagement in the traditional sense. It's how do we help them learn how to do it? So it's side by side with their team. Our team sat there and walked them through it. >> For how many weeks? >> We've had about two sprints already, and we're entering the third sprint. It's been about 30-45 days between sprints. >> And you have your own data science team. >> Yes. Our team is coming up to speed using this project. They've been trained but they needed help from people who have done this, been there, and have handled some of the challenges of modeling and data science. >> So it accelerates that time to --- >> Value. >> Outcome and value and is a knowledge transfer component -- >> Yes, absolutely. >> It's occurring now, and I guess it's ongoing, right? >> Yes. The engagement is unique in the sense that IBM's team came to our factory, understood what that process, the stretch-wrap process, looks like, so they had an understanding of the physical process and how it's modeled with the help of the variables, and understood the data science modeling piece as well. Once they know both sides of the equation, they can help put the physical problem and the digital equivalent together, and then be able to correlate why things are happening with the appropriate data that supports the behavior. >> Yeah, and within the constraints of the one use case and up to 90 days, there's no charge. Like I said, it's paramount that our clients like Niagara know how to do this successfully in their enterprise. >> It's a freebie? >> No, it's no charge. Free makes it sound too cheap. (everybody laughs) >> But it's part of obviously a broader arrangement with buying hardware and software, or whatever it is. 
>> Yeah, it's a strategy for us to help make sure our clients are successful, and I want to minimize the activation energy to do that, so there's no charge, and the only requirements from the client are that it's a real use case, they at least match the resources I put on the ground, and they sit with us and do things like this and act as a reference and talk about the team and our offerings and their experiences. >> So you've got to have skin in the game obviously, an IBM customer. There's got to be some commitment for some kind of business relationship. How big was the collective team for each, if you will? >> So IBM had 2-3 data scientists. (Dave takes notes) Niagara matched that, 2-3 analysts. There were some who were familiar with the machines and others who were more familiar with the data acquisition and data modeling. >> So each of these engagements, they cost us about $250,000 all in, so they're quite an investment we're making in our clients. >> I bet. I mean, 2-3 people over many, many weeks of super geeks' time. So you're bringing in hardcore data scientists, math wizzes, stats wizzes, data hackers, developers--- >> Data viz people, yeah, the whole stack. >> And the level of skills that Niagara has? >> We've got actual employees who are responsible for production, our manufacturing analysts who help aid in troubleshooting problems. If there are breakages, they go analyze why that's happening. Now they have data to tell them what to do about it, and that's the whole journey that we are in, in trying to quantify with the help of data, and be able to connect our systems with data, systems and models that help us analyze what happened and why it happened and what to do before it happens. >> Your team must love this because they're sort of elevating their skills. They're working with rock star data scientists. >> Yes. >> And we've talked about this before. A point that was made here is that it's really important in these projects to have people acting as product owners if you will, subject matter experts, that are on the front line, that do this everyday, not just for the subject matter expertise. I'm sure there's executives that understand it, but when you're done with the model, bringing it to the floor, and talking to their peers about it, there's no better way to drive this cultural change of adopting these things than having one of your peers that you respect talk about it instead of some guy or lady sitting up in the ivory tower saying "thou shalt." >> Now you don't know the outcome yet. It's still early days, but you've got a model built that you've got confidence in, and then you can iterate that model. What's your expectation for the outcome? >> We're hoping that preliminary results help us get up the learning curve of data science and how to leverage data to be able to make decisions. So that's our idea. There are obviously optimal settings that we can use, but it's going to be a trial and error process. And through that, as we collect data, we can understand what settings are optimal and what we should be using in each of the plants. And if the plants decide, hey, they have a subjective preference for one profile versus another, with the data we are capturing we can measure when they deviated from what we specified. We have a lot of learning coming from the approach that we're taking. You can't control things if you don't measure them first. >> Well, your objectives are to transcend this one project and to do the same thing across. 
>> And to do the same thing across, yes. >> Essentially pay for it, with a quick return. That's the way to do things these days, right? >> Yes. >> You've got more narrow, small projects that'll give you a quick hit, and then leverage that expertise across the organization to drive more value. >> Yes. >> Love it. What a great story, guys. Thanks so much for coming to theCUBE and sharing. >> Thank you. >> Congratulations. You must be really excited. >> No, it's a fun project. I appreciate it. >> Thanks for having us, Dave. I appreciate it. >> Pleasure, Seth. Always great talking to you, and keep it right there everybody. You're watching theCUBE. We're live from New York City here at the Westin Hotel, #cubenyc. Check out ibm.com/winwithai and Change the Game: Winning with AI tonight. We'll be right back after a short break. (minimal upbeat music)
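For readers who want to see the shape of the model discussed above, here is a minimal sketch of the approach: learn the probability that a wrap fails as a function of the wrap settings, then choose the lightest gram weight whose predicted failure risk stays under a budget. The synthetic data, feature names, and the 2% risk threshold are illustrative assumptions, not Niagara's actual model or data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the SCADA history: film gram weight, wrap
# count, containment force, and whether the wrap broke.
n = 2000
gram_weight = rng.uniform(180, 300, n)   # ~250 g is typical per pallet
wraps = rng.integers(14, 20, n)          # ~16-18 wraps per pallet
containment = rng.uniform(0.5, 1.5, n)
# Assumed ground truth: lighter, looser wraps fail more often.
p_fail = 1 / (1 + np.exp(0.08 * (gram_weight - 215) + 2 * (containment - 1)))
failed = rng.random(n) < p_fail

X = np.column_stack([gram_weight, wraps, containment])
clf = GradientBoostingClassifier().fit(X, failed)

# Prescriptive step: sweep candidate gram weights and take the lightest
# profile whose predicted failure risk is under the risk budget.
candidates = np.arange(190, 300, 5, dtype=float)
grid = np.column_stack([candidates,
                        np.full_like(candidates, 17),    # wraps
                        np.full_like(candidates, 1.0)])  # containment
risk = clf.predict_proba(grid)[:, 1]
ok = candidates[risk < 0.02]             # 2% risk budget (assumed)
print("lightest acceptable profile:", ok.min() if ok.size else "none")
```

In production the training data would come from the Ignition/SCADA history rather than a simulator, and the chosen profile would be fed back to the wrapper, closing the feedback loop the speakers describe.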

Published Date : Sep 13 2018

SUMMARY :

Dave Vellante talks with Seth Dobrin, Chief Data Officer of IBM Analytics, and Sreesha Rao of Niagara Bottling about a Data Science Elite engagement: a collaborative, no-charge project of two to three sprints in which IBM and Niagara data scientists built a model to optimize the stretch wrap applied to pallets of bottled water, cutting film costs while keeping pallets stable in transit, using SCADA data from the wrapper machines and models built in DSX Local.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Shreesha Rao | PERSON | 0.99+
Seth Dobrin | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Dave Vellante | PERSON | 0.99+
Walmarts | ORGANIZATION | 0.99+
Costcos | ORGANIZATION | 0.99+
Dave | PERSON | 0.99+
30 | QUANTITY | 0.99+
Boston | LOCATION | 0.99+
New York City | LOCATION | 0.99+
California | LOCATION | 0.99+
Seth Dobrin | PERSON | 0.99+
60 | QUANTITY | 0.99+
Niagara | ORGANIZATION | 0.99+
Seth | PERSON | 0.99+
Shreesha | PERSON | 0.99+
U.S. | LOCATION | 0.99+
Sreesha Rao | PERSON | 0.99+
third sprint | QUANTITY | 0.99+
90 days | QUANTITY | 0.99+
two | QUANTITY | 0.99+
first step | QUANTITY | 0.99+
Inderpal Bhandari | PERSON | 0.99+
Niagara Bottling | ORGANIZATION | 0.99+
Python | TITLE | 0.99+
both | QUANTITY | 0.99+
tonight | DATE | 0.99+
ibm.com/winwithai | OTHER | 0.99+
one | QUANTITY | 0.99+
Terminal 5 | LOCATION | 0.99+
two years | QUANTITY | 0.99+
about $250,000 | QUANTITY | 0.98+
Times Square | LOCATION | 0.98+
Scala | TITLE | 0.98+
2018 | DATE | 0.98+
15-20% | QUANTITY | 0.98+
IBM Analytics | ORGANIZATION | 0.98+
each | QUANTITY | 0.98+
today | DATE | 0.98+
each pallet | QUANTITY | 0.98+
Kaggle | ORGANIZATION | 0.98+
West Side Highway | LOCATION | 0.97+
Each pallet | QUANTITY | 0.97+
4 sprints | QUANTITY | 0.97+
About 250 grams | QUANTITY | 0.97+
both side | QUANTITY | 0.96+
Data Science Elite Team | ORGANIZATION | 0.96+
one day | QUANTITY | 0.95+
every single year | QUANTITY | 0.95+
Niagara Bottling | PERSON | 0.93+
about two sprints | QUANTITY | 0.93+
one end | QUANTITY | 0.93+
R | TITLE | 0.92+
2-3 weeks | QUANTITY | 0.91+
one profile | QUANTITY | 0.91+
50-60 analysts | QUANTITY | 0.91+
trillion dollars | QUANTITY | 0.9+
2-3 data scientists | QUANTITY | 0.9+
about 30-45 days | QUANTITY | 0.88+
almost 16 million pallets of water | QUANTITY | 0.88+
Big Apple | LOCATION | 0.87+
couple years ago | DATE | 0.87+
last 18 months | DATE | 0.87+
Westin Hotel | ORGANIZATION | 0.83+
pallet | QUANTITY | 0.83+
#cubenyc | LOCATION | 0.82+
2833 bottles of water per second | QUANTITY | 0.82+
the Game: Winning with AI | TITLE | 0.81+

John Thomas, IBM | Change the Game: Winning With AI


 

(upbeat music) >> Live from Times Square in New York City, it's The Cube, covering IBM's Change the Game: Winning with AI. Brought to you by IBM. >> Hi everybody, welcome back to The Big Apple. My name is Dave Vellante. We're here in the Theater District at The Westin Hotel covering a special Cube event. IBM's got a big event today and tonight, if we can pan here to this pop-up. Change the Game: Winning with AI. So IBM has got an event here at The Westin, The Tide at Terminal 5 which is right up the Westside Highway. Go to IBM.com/winwithAI. Register, you can watch it online, or if you're in the city come down and see us, we'll be there. Uh, we have a bunch of customers who will be there. We had Rob Thomas on earlier, he's kind of the host of the event. IBM does these events periodically throughout the year. They gather customers, they put forth some thought leadership, talk about some hard news. So, we're very excited to have John Thomas here, he's a distinguished engineer and Director of IBM Analytics, long time Cube alum, great to see you again, John. >> Same here. Thanks for coming on. >> Great to have you. >> So we just heard a great case study with Niagara Bottling around the Data Science Elite Team, that's something that you've been involved in, and we're going to get into that. But give us the update since we last talked, what have you been up to? >> Sure sure. So we're living and breathing data science these days. So the Data Science Elite Team, we are a team of practitioners. We actually work collaboratively with clients. And I stress the word collaboratively because we're not there to just go do some work for a client. We actually sit down, expect the client to put their team to work with our team, and we build AI solutions together. We scope use cases, and sort of, you know, expose them to expertise, tools, techniques, and do this together, right. And we've been very busy, (laughs) I can tell you that. You know it has been a lot of travel around the world. A lot of interest in the program. And engagements that bring us very interesting use cases. You know, use cases that you would expect to see, use cases that are hmmm, I had not thought of a use case like that. You know, but it's been an interesting journey in the last six, eight months now. >> And these are pretty small, agile teams. >> Sometimes people use tiger teams, and they're two to three pizza teams, right? >> Yes. >> And my understanding is you bring some number of resources, call it two or three data scientists, >> Yes. >> and the customer matches that resource, right? >> Exactly. That's the prerequisite. >> That is the prerequisite, because we're not there to just do the work for the client. We want to do this in a collaborative fashion, right. So, the customer's Data Science Team is learning from us, we are working with them hand in hand to build a solution out. >> And that's got to resonate well with customers. >> Absolutely I mean so often the services business is like kind of, customers will say well I don't want to keep going back to a company to get these services >> Right, right. I want, teach me how to fish and that's exactly >> That's exactly! >> I was going to use that phrase. That's exactly what we do, that's exactly. So at the end of the two or three month period, when IBM leaves, my team leaves, you know, the client, the customer knows what the tools are, what the techniques are, what to watch out for, what are success criteria, they have a good handle of that. 
>> So we heard about the Niagara Bottling use case, which was a pretty narrow one, >> Mm-hmm. >> How can we optimize the use of the plastic wrapping, save some money there, but at the same time maintain stability. >> Ya. >> You know, quite a narrow one in this case. >> Yes, yes. >> What are some of the other use cases? >> Yeah that's a very, like you said, a narrow one. But there are some use cases that span industries, that cut across different domains. I think I may have mentioned this in one of our previous discussions, Dave. You know customer interactions, trying to improve customer interactions is something that cuts across industry, right. Now that can be across different channels. One of the most prominent channels is a call center, I think we have talked about this previously. You know I hate calling into a call center (laughter) because I don't know what kind of support I'm going to get. >> Yeah, yeah. >> But, what if you could equip the call center agents to provide consistent service to the caller, and handle the calls in the most appropriate way. Reducing costs on the business side because call handling is expensive. And eventually it leads up to, can I even avoid the call, through insights on why the call is coming in in the first place. So this use case cuts across industry. Any enterprise that has got a call center is doing this. So we are looking at can we apply machine-learning techniques to understand dominant topics in the conversation. Once we understand those with unsupervised techniques, once we understand the dominant topics in the conversation, can we drill into that and understand what the intents are, and does the intent change as the conversation progresses? So you know I'm calling someone, it starts off with pleasantries, it then goes into weather, how are the kids doing? You know, complaining about life in general. But then you get to something of substance, why the person was calling in the first place. And then you may think that is the intent of the conversation, but you find that as the conversation progresses, the intent might actually change. And can you understand that real time? Can you understand the reasons behind the call, so that you could take proactive steps to maybe avoid the call coming in in the first place? This use case, Dave, we are seeing so much interest in this use case. Because call centers are a big cost to most enterprises. >> Let's double down on that because I want to understand this. So you're basically doing this: every time you call a call center this call may be recorded, >> (laughter) Yeah. >> For quality of service. >> Yeah. >> So you're recording the calls, maybe using NLP to transcribe those calls. >> NLP is just the first step, >> Right. >> so you're absolutely right, when a call comes in there's already call recording systems in place. We're not getting into that space, right. So call recording systems record the voice calls. So often in offline batch mode you can take these millions of calls, pass them through a speech-to-text mechanism, which produces a text equivalent of the voice recordings. Then what we do is we apply unsupervised machine learning, and clustering, and topic-modeling techniques against it to understand what are the dominant topics in this conversation. >> You do kind of an entity extraction of those topics. >> Exactly, exactly, exactly. >> Then we find what is the most relevant, what are the relevant ones, what is the relevancy of topics in a particular conversation. That's not enough, that is just step two, if you will. 
Then you have to, we build what is called an intent hierarchy. So the topmost level will be, let's say, payments, the call is about payments. But what about payments, right? Is it an intent to make a late payment? Or is the intent to avoid the payment or contest a payment? Or is the intent to structure a different payment mechanism? So can you get down to that level of detail? Then comes a further level of detail which is the reason that is tied to this intent. What is a reason for a late payment? Is it a job loss or job change? Is it because they are just not happy with the charges that they have coming? What is a reason? And the reason can be pretty complex, right? It may not be in the immediate vicinity of the snippet of conversation itself. So you got to go find out what the reason is and see if you can match it to this particular intent. So multiple steps of the journey, and eventually what we want to do is, we do all of this in an offline batch mode, and we are building a series of classifiers, sets of classifiers. But eventually we want to get this to real time action. So think of this, if you have machine learning models, supervised models that can predict the intent, the reasons, et cetera, you can have them deployed, operationalize them, so that when a call comes in real time, you can screen it in real time, do the speech to text, pass it to the supervised models that have been deployed, and the model fires and comes back and says this is the intent, take some action or guide the agent to take some action real time. >> Based on some automated discussion, so tell me what you're calling about, that kind of thing, >> Right. >> Is that right? >> So it's probably even gone past tell me what you're calling about. So it could be the conversation has begun to get into, you know, I'm going through a tough time, my spouse had a job change. You know that is itself an indicator of some other reasons, and can that be used to prompt the CSR >> Ah. >> to take some action >> Ah, okay. >> appropriate to the conversation. >> So I'm not talking to a machine, at first, >> No no. >> I'm talking to a human. >> Still talking to a human. >> And then real time feedback to that human >> Exactly, exactly. >> is a good example of >> Exactly. >> human augmentation. >> Exactly, exactly. >> I wanted to go back to the process a little bit in terms of the model building. Are there humans involved in calibrating the model? >> There has to be. Yeah, there has to be. So you know, for all the hype in the industry, (laughter) you still need a human. (laughter) You know, what it is is you need expertise to look at what these models produce, right. Because if you think about it, machine learning algorithms don't by themselves have an understanding of the domain. They are, you know, either statistical or similar in nature, so somebody has to marry the statistical observations with the domain expertise. So humans are definitely involved in the building of these models and training of these models. >> Okay. >> (inaudible). >> So there you've got math, you've got stats, you've got some coding involved, and you've >> Absolutely. >> got humans as the last mile >> Absolutely. >> to really bring that >> Absolutely. >> expertise. And then in terms of operationalizing it, how does that actually get done? What's the tech behind that? >> Ah, yeah. >> It's a very good question, Dave. You build models, and what good are they if they stay inside your laptop, you know, they don't go anywhere. 
What you need to do is, I use a phrase, weave these models into your business processes and your applications. So you need a way to deploy these models. The models should be consumable from your business processes. Now it could be a REST API call to a model. In some cases a REST API call is not sufficient, the latency is too high. Maybe you've got to embed that model right into where your application is running. You know you've got data on a mainframe. A credit card transaction comes in, and the authorization for the credit card is happening in a four millisecond window on the mainframe, in, you know, CICS COBOL code. I don't have the time to make a REST API call outside. I got to have the model execute in context with my CICS COBOL code in that memory space. >> Yeah right. >> You know so the operationalizing is deploying, consuming these models, and then beyond that, how do the models behave over time? Because you can have the best programmer, the best data scientist build the absolute best model, which has got great accuracy, great performance today. Two weeks from now, performance is going to go down. >> Hmm. >> How do I monitor that? How do I trigger alerts if it drops below a certain threshold? And, can I have a system in place that retrains this model with new data as it comes in. >> So you got to understand where the data lives. >> Absolutely. >> You got to understand the physics, >> Yes. >> The latencies involved. >> Yes. >> You got to understand the economics. >> Yes. >> And there's also probably in many industries legal implications. >> Oh yes. >> You know, the explainability of models. You know, can I prove that there is no bias here. >> Right. >> Now all of these are challenging but you know, doable things. >> What makes a successful engagement? Obviously you guys are outcome driven, >> Yeah. >> but talk about how you guys measure success. >> So um, for our team right now it is not about revenue, it's purely about adoption. Does the client, does the customer see the value of what IBM brings to the table. This is not just tools and technology, by the way. It's also expertise, right? >> Hmm. >> So this notion of expertise as a service, which is coupled with tools and technology to build a successful engagement. The way we measure success is, has the client, have we built out the use case in a way that is useful for the business? Two, does a client see value in going further with that. So this is right now what we look at. It's not, you know, yes, of course everybody cares about revenue. But that is not our key metric. Now in order to get there though, what we have found is, it takes a little bit of hard work, and you need different constituents of the customer to come together. It's not just me sending a bunch of awesome Python programmers to the client. 
This is to me a big differentiator between IBM, certainly some of the other technology suppliers who don't have the depth of services, expertise, and domain expertise. But on the flip side of that, differentiation from many of the a size who have that level of global expertise, but they don't have tech piece. >> Right. >> Now they would argue well we do anybodies tech. >> Ya. But you know, if you've got tech. >> Ya. >> You just got to (laughter) Ya. >> Bring those two together. >> Exactly. And that's really seems to me to be the big differentiator >> Yes, absolutely. for IBM. Well John, thanks so much for stopping by theCube and explaining sort of what you've been up to, the Data Science Elite Team, very exciting. Six to nine months in, >> Yes. are you declaring success yet? Still too early? >> Uh, well we're declaring success and we are growing, >> Ya. >> Growth is good. >> A lot of lot of attention. >> Alright, great to see you again, John. >> Absolutely, thanks you Dave. Thanks very much. Okay, keep it right there everybody. You're watching theCube. We're here at The Westin in midtown and we'll be right back after this short break. I'm Dave Vellante. (tech music)

Published Date : Sep 13 2018

SUMMARY :

Dave Vellante interviews John Thomas, distinguished engineer and Director of IBM Analytics, about the Data Science Elite Team's collaborative client engagements and a cross-industry call-center use case: speech-to-text on recorded calls, unsupervised topic modeling, intent hierarchies and reason extraction, and real-time scoring, along with what it takes to operationalize models, monitor and retrain them over time, and bring data science, line-of-business, and IT stakeholders together.

SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Dave Vellante | PERSON | 0.99+
John | PERSON | 0.99+
Dave | PERSON | 0.99+
Rob Thomas | PERSON | 0.99+
two | QUANTITY | 0.99+
John Thomas | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Six | QUANTITY | 0.99+
Time Square | LOCATION | 0.99+
tonight | DATE | 0.99+
first step | QUANTITY | 0.99+
three | QUANTITY | 0.99+
three month | QUANTITY | 0.99+
nine months | QUANTITY | 0.99+
third | QUANTITY | 0.98+
Two | QUANTITY | 0.98+
One | QUANTITY | 0.98+
New York City | LOCATION | 0.98+
today | DATE | 0.98+
Python | TITLE | 0.98+
IBM Analytics | ORGANIZATION | 0.97+
Terminal 5 | LOCATION | 0.97+
Data Science Elite Team | ORGANIZATION | 0.96+
Niagara | ORGANIZATION | 0.96+
one | QUANTITY | 0.96+
IBM.com/winwithAI | OTHER | 0.96+
first place | QUANTITY | 0.95+
eight months | QUANTITY | 0.94+
Change the Game: Winning With AI | TITLE | 0.89+
The Westin | ORGANIZATION | 0.89+
Niagara Bottling | PERSON | 0.89+
Theater District | LOCATION | 0.88+
four millisecond window | QUANTITY | 0.87+
step two | QUANTITY | 0.86+
Cube | PERSON | 0.85+
Westside Highway | LOCATION | 0.83+
first | QUANTITY | 0.83+
Two weeks | DATE | 0.82+
millions of calls | QUANTITY | 0.79+
two three data scientists | QUANTITY | 0.78+
CICS | TITLE | 0.77+
COBOL | OTHER | 0.69+
Rest API call | OTHER | 0.68+
The Tide | LOCATION | 0.68+
theCube | ORGANIZATION | 0.67+
The Westin | LOCATION | 0.66+
Rest API | OTHER | 0.66+
Apple | LOCATION | 0.63+
Big | ORGANIZATION | 0.62+
Westin | LOCATION | 0.51+
last six | DATE | 0.48+
Hotel | ORGANIZATION | 0.45+
theCube | TITLE | 0.33+
Bottling | COMMERCIAL_ITEM | 0.3+

Rob Thomas, IBM | Change the Game: Winning With AI


 

>> Live from Times Square in New York City, it's The Cube covering IBM's Change the Game: Winning with AI, brought to you by IBM. >> Hello everybody, welcome to The Cube's special presentation. We're covering IBM's announcements today around AI. IBM, as The Cube does, runs sessions and programs in conjunction with Strata, which is down at the Javits, and we're here with Rob Thomas, who's the General Manager of IBM Analytics. Long time Cube alum, Rob, great to see you. >> Dave, great to see you. >> So you guys got a lot going on today. We're here at the Westin Hotel, you've got an analyst event, you've got a partner meeting, you've got an event tonight, Change the Game: Winning with AI at Terminal 5, check that out, ibm.com/WinWithAI, go register there. But Rob, let's start with what you guys have going on, give us the run down. >> Yeah, it's a big week for us, and like many others, it's great when you have Strata, a lot of people in town. So, we've structured a week where, today, we're going to spend a lot of time with analysts and our business partners, talking about where we're going with data and AI. This evening, we've got a broadcast, it's called Winning with AI. What's unique about that broadcast is it's all clients. We've got clients on stage doing demonstrations, how they're using IBM technology to get to unique outcomes in their business. So I think it's going to be a pretty unique event, which should be a lot of fun. >> So this place, it looks like a cool event, a venue, Terminal 5, it's just up the street on the west side highway, probably a mile from the Javits Center, so definitely check that out. Alright, let's talk about, Rob, we've known each other for a long time, we've seen the early Hadoop days, you guys were very careful about diving in, you kind of let things settle and watched very carefully, and then came in at the right time. But we saw the evolution of so-called Big Data go from a phase of really reducing investments, cheaper data warehousing, and what that did is allowed people to collect a lot more data, and kind of get ready for this era that we're in now. But maybe you can give us your perspective on the phases, the waves that we've seen of data, and where we are today and where we're going. >> I kind of think of it as a maturity curve. So when I go talk to clients, I say, look, you need to be on a journey towards AI. I think probably nobody disagrees that they need something there, the question is, how do you get there? So you think about the steps, it's about, a lot of people started with, we're going to reduce the cost of our operations, we're going to use data to take out cost, that was kind of the Hadoop thrust, I would say. Then they moved to, well, now we need to see more about our data, we need higher performance data, BI data warehousing. So, everybody, I would say, has dabbled in those two areas. The next leap forward is self-service analytics, so how do you actually empower everybody in your organization to use and access data? And the next step beyond that is, can I use AI to drive new business models, new levers of growth, for my business? So, I ask clients, pin yourself on this journey, most are, depends on the division or the part of the company, they're at different areas, but as I tell everybody, if you don't know where you are and you don't know where you want to go, you're just going to wind around, so I try to get them to pin down, where are you versus where do you want to go? 
>> So four phases, basically, the sort of cheap data store, the BI data warehouse modernization, self-service analytics, a big part of that is data science and data science collaboration, you guys have a lot of investments there, and then new business models with AI automation running on top. Where are we today? Would you say we're kind of in-between BI/DW modernization and on our way to self-service analytics, or what's your sense? >> I'd say most are right in the middle between BI data warehousing and self-service analytics. Self-service analytics is hard, because it requires you, sometimes, to take a couple steps back, and look at your data. It's hard to provide self-service if you don't have a data catalog, if you don't have data security, if you haven't gone through the processes around data governance. So, sometimes you have to take one step back to go two steps forward, that's why I see a lot of people, I'd say, stuck in the middle right now. And the examples that you're going to see tonight as part of the broadcast are clients that have figured out how to break through that wall, and I think that's pretty illustrative of what's possible. >> Okay, so you're saying that, got to maybe take a step back and get the infrastructure right with, let's say a catalog, there are some basic things that they have to do, some x's and o's, you've got the Vince Lombardi playbook out here, and also, skillsets, I imagine, is a key part of that. So, that's what they've got to do to get prepared, and then, what's next? They start creating new business models, I imagine this is where the chief data officer comes in and it's an executive level, what are you seeing clients as part of digital transformation, what's the conversation like with customers? >> The biggest change, the great thing about the times we live in, is technology's become so accessible, you can do things very quickly. We created a team last year called Data Science Elite, and we've hired what we think are some of the best data scientists in the world. Their only job is to go work with clients and help them get to a first success with data science. So, we put a team in. Normally, one month, two months, normally a team of two or three people, our investment, and we say, let's go build a model, let's get to an outcome, and you can do this incredibly quickly now. I tell clients, if I see somebody that says, we're going to spend six months evaluating and thinking about this, I was like, why would you spend six months thinking about this when you could actually do it in one month? So you just need to get over the edge and go try it. >> So we're going to learn more about the Data Science Elite team. We've got John Thomas coming on today, who is a distinguished engineer at IBM, and he's very much involved in that team, and I think we have a customer who's actually gone through that, so we're going to talk about what their experience was with the Data Science Elite team. Alright, you've got some hard news coming up, you've actually made some news earlier with Hortonworks and Red Hat, I want to talk about that, but you've also got some hard news today. Take us through that. >> Yeah, let's talk about all three. First, Monday we announced the expanded relationship with both Hortonworks and Red Hat. This goes back to one of the core beliefs I talked about, every enterprise is modernizing their data and application estates, I don't think there's any debate about that. 
We are big believers in Kubernetes and containers as the architecture to drive that modernization. The announcement on Monday was, we're working closer with Red Hat to take all of our data services as part of Cloud Private for Data, which are basically microservices for data, and we're running those on OpenShift, and we're starting to see great customer traction with that. And where does Hortonworks come in? Hadoop has been the outlier on moving to microservices and containers, so we're working with Hortonworks to help them make that move as well. So, it's really about the three of us getting together and helping clients with this modernization journey. >> So, just to remind people, you remember ODPI, folks? There was all this kerfuffle about, why do we even need this? Well, what's interesting to me about this triumvirate is, well, first of all, Red Hat and Hortonworks are hardcore open source, and IBM's always been a big supporter of open source. The three of you got together and you're now proving the productivity of this relationship for customers. You guys don't talk about this, but Hortonworks noted on its public earnings call that the relationship with IBM drove many, many seven-figure deals, which obviously means that customers are getting value out of this, so it's great to see that come to fruition, and it wasn't just a Barney announcement a couple years ago, so congratulations on that. Now, there's this other news that you guys announced this morning, talk about that. >> Yeah, two other things. One is, we announced a relationship with Stack Overflow. 50 million developers go to Stack Overflow a month, it's an amazing environment for developers that are looking to do new things, and we're sponsoring a community around AI. Back to your point before, you said, is there a skills gap in enterprises, there absolutely is, I don't think that's a surprise. Data science, AI developers, not every company has the skills they need, so we're sponsoring a community to help drive the growth of skills in and around data science and AI. So things like Python, R, Scala, these are the languages of data science, and it's a great relationship between us and Stack Overflow to build a community to get things going on skills. >> Okay, and then there was one more. >> Last one's a product announcement. This is one of the most interesting product announcements we've had in quite a while. Imagine this: you write a SQL query, and the traditional approach is, I've got a server, I point it at that server, I get the data, it's pretty limited. We're announcing technology where I write a query, and it can find data anywhere in the world. I think of it as wide-area SQL. So it can find data on an automotive device, a telematics device, an IoT device, it could be a mobile device; we think of it as SQL for the whole world. You write a query, you can find the data anywhere it is, and we take advantage of the processing power on the edge. The biggest problem with IoT is, it's been the old mantra of, go find the data, bring it all back to a centralized warehouse, and that makes it impossible to do it real time. We're enabling real time because we can write a query once and find data anywhere. This is technology we've had in preview for the last year. We've been working with a lot of clients to prove out use cases with it, and we're integrating it as a capability inside of IBM Cloud Private for Data. So if you buy IBM Cloud Private for Data, it's there.
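(A note on what a query like that might look like in practice. The sketch below is our illustration, not actual IBM Cloud Private for Data syntax: the DSN, credentials, and three-part table names are placeholders, and the SQL dialect is illustrative. The pyodbc calling pattern itself is standard Python database access.)

```python
# A minimal sketch of the "write a query once, find data anywhere" idea.
# The DSN, credentials, and table names are illustrative placeholders.
import pyodbc

# One connection to the federation layer, wherever it happens to run.
conn = pyodbc.connect("DSN=federated_engine;UID=analyst;PWD=secret")
cursor = conn.cursor()

# A single query that joins warehouse data with telematics readings that
# still live out on edge devices; a federated engine decides where each
# table physically sits and pushes the filtering and aggregation there.
cursor.execute("""
    SELECT v.vehicle_id,
           AVG(t.engine_temp) AS avg_temp
    FROM   warehouse.fleet.vehicles AS v
    JOIN   edge.telematics.readings AS t
           ON t.vehicle_id = v.vehicle_id
    GROUP  BY v.vehicle_id
""")

for vehicle_id, avg_temp in cursor.fetchall():
    print(vehicle_id, avg_temp)
```

The point of the pattern is that the application writes one query against a single endpoint, and the engine, not the developer, works out where each table lives and what work to push down to it.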
>> Interesting, so when you've been around as long as I have, long enough to see some of the pendulum swings, it's clearly a pendulum swing back toward decentralization and the edge, but the key is, from what you just described, you're sort of redefining the boundary, so I presume it's the edge, any Cloud, or on premises, where you can find that data, is that correct? >> Yeah, so it's multi-Cloud. I mean, look, every organization is going to be multi-Cloud, like 100%, that's going to happen, and that could be private, it could be multiple public Cloud providers, but the key point is, data on the edge is not just limited to what's in those Clouds. It could be anywhere that you're collecting data. And we're enabling an architecture which performs incredibly well, because you take advantage of processing power on the edge, where you can get data anywhere that it sits. >> Okay, so, then, I'm setting up a Cloud, I'll call it a Cloud architecture, that encompasses the edge, where essentially, there are no boundaries, and you're bringing security. We talked about containers before, we've been talking about Kubernetes all week here at a Big Data show. And then of course, Cloud, and what's interesting, I think many of the Hadoop distro vendors kind of missed Cloud early on, and now are sort of saying, oh wow, it's a hybrid world and we've got to play a part. You guys obviously made some moves, a couple billion dollar moves, to do some acquisitions and get hardcore into Cloud, so that becomes a critical component. You're not just limiting your scope to the IBM Cloud. You're recognizing that it's a multi-Cloud world, that's what customers want to do. Your comments. >> It's multi-Cloud, and it's not just the IBM Cloud, I think the most predominant Cloud that's emerging is every client's private Cloud. Every client I talk to is building out a containerized architecture. They need their own Cloud, and they need seamless connectivity to any public Cloud that they may be using. This is why you see such a premium being put on things like data ingestion, data curation. It's not popular, it's not exciting, people don't want to talk about it, but one of the biggest inhibitors, to this AI point, comes back to data curation and data ingestion, because if you're dealing with multiple Clouds, suddenly your data's in a bunch of different spots. >> Well, so you're basically, and we talked about this a lot on The Cube, you're bringing the Cloud model to the data, wherever the data lives. Is that the right way to think about it? >> I think organizations have spoken; set aside what they say, look at their actions. Their actions say, we don't want to move all of our data to any particular Cloud, we'll move some of our data. We need to give them seamless connectivity so that they can leave their data where they want, we can bring Cloud-Native architecture to their data, and we could also help move their data to a Cloud-Native architecture if that's what they prefer. >> Well, it makes sense, because you've got physics, latency, you've got economics, moving all the data into a public Cloud is expensive and just doesn't make economic sense, and then you've got things like GDPR, which says, well, you have to keep the data, certain laws of the land, if you will, that say, you've got to keep the data in whatever it is, in Germany, or whatever country. So those sort of edicts dictate how you approach managing workloads and what you put where, right? Okay, what's going on with Watson? Give us the update there.
>> I get a lot of questions, people trying to peel back the onion of what exactly it is. So, I want to make that super clear here. Watson is a few things; start at the bottom. You need a runtime for models that you've built. So we have a product called Watson Machine Learning, it runs anywhere you want, and that is the runtime for how you execute models that you've built. Anytime you have a runtime, you need somewhere where you can build models, you need a development environment. That is called Watson Studio. So, we had a product called Data Science Experience, we've evolved that into Watson Studio, connecting in some of those features. So we have Watson Studio, that's the development environment, and Watson Machine Learning, that's the runtime. Now you move further up the stack. We have a set of APIs that bring in human features, vision, natural language processing, audio analytics, those types of things. You can integrate those as part of a model that you build. And then on top of that, we've got things like Watson Applications, we've got Watson for call centers, doing customer service and chatbots, and then we've got a lot of clients who've taken pieces of that stack and built their own AI solutions. They've taken some of the APIs, they've taken some of the design time, the studio, they've taken some of the Watson Machine Learning. So, it is really a stack of capabilities, and where we're driving the greatest productivity, and this is in a lot of the examples you'll see tonight from clients, is clients that have bought into this idea of, I need a development environment, I need a runtime, where I can deploy models anywhere. We're getting a lot of momentum on that, and then that raises the question of, well, do I have explainability, do I have trust and transparency, and that's another thing that we're working on. >> Okay, so there's an API-oriented architecture, exposing all these services to make it very easy for people to consume. Okay, so we've been talking all week at Cube NYC about Big Data and AI; is this old wine in a new bottle? I mean, it's clear, Rob, from the conversation here, there's a lot of substantive innovation, and early adoption, anyway, of some of these innovations, but a lot of potential going forward. Last thoughts? >> What people have to realize is AI is not magic, it's still computer science. So it actually requires some hard work. You need to roll up your sleeves, you need to understand how you get from point A to point B, you need a development environment, you need a runtime. I want people to really think about this, it's not magic. I think for a while, people have gotten the impression that there's some magic button. There's not, but if you put in the time, and it's not a lot of time, you'll see the examples tonight, most of them have been done in one or two months, there's great business value in starting to leverage AI in your business. >> Awesome, alright, so if you're in this city or you're at Strata, go to ibm.com/WinWithAI, register for the event tonight. Rob, we'll see you there, thanks so much for coming back. >> Yeah, it's going to be fun, thanks Dave, great to see you. >> Alright, keep it right there everybody, we'll be back with our next guest right after this short break, you're watching The Cube.


Data Science for All: It's a Whole New Game


 

>> There's a movement that's sweeping across businesses everywhere here in this country and around the world. And it's all about data. Today businesses are being inundated with data. To the tune of over two and a half million gigabytes that'll be generated in the next 60 seconds alone. What do you do with all that data? To extract insights you typically turn to a data scientist. But not necessarily anymore. At least not exclusively. Today the ability to extract value from data is becoming a shared mission. A team effort that spans the organization extending far more widely than ever before. Today, data science is being democratized. >> Data Science for All: It's a Whole New Game. >> Welcome everyone, I'm Katie Linendoll. I'm a technology expert and writer and I love reporting on all things tech. My fascination with tech started very young. I began coding when I was 12. Received my networking certs by 18 and a degree in IT and new media from Rochester Institute of Technology. So as you can tell, technology has always been a true passion of mine. Having grown up in the digital age, I love having a career that keeps me at the forefront of science and technology innovations. I spend equal time in the field being hands on as I do on my laptop conducting in depth research. Whether I'm diving underwater with NASA astronauts, witnessing the new ways in which mobile technology can help rebuild the Philippines' economy in the wake of super typhoons, or sharing a first look at the newest iPhones on The Today Show yesterday, I'm always on the hunt for the latest and greatest tech stories. And that's what brought me here. I'll be your host for the next hour as we explore the new phenomenon that is taking businesses around the world by storm, as data science continues to become democratized and extends beyond the domain of the data scientist, and as a mandate emerges for all of us to become data literate now that data science for all drives our AI culture. We're going to take to the streets and go behind the scenes as we uncover the factors that are fueling this phenomenon and giving rise to a movement that is reshaping how businesses leverage data. And putting organizations on the road to AI. So coming up, I'll be doing interviews with data scientists. We'll see real world demos and take a look at how IBM is changing the game with an open data science platform. We'll also be joined by legendary statistician Nate Silver, founder and editor-in-chief of FiveThirtyEight. Who will shed light on how a data driven mindset is changing everything from business to our culture. We also have a few people who are joining us in our studio, so thank you guys for joining us. Come on, I can do better than that, right? Live studio audience, the fun stuff. And for all of you during the program, I want to remind you to join the conversation on social media using the hashtag DSforAll, it's data science for all. Share your thoughts on what data science and AI means to you and your business. And, let's dive into a whole new game of data science. Now I'd like to welcome my co-host, General Manager of IBM Analytics, Rob Thomas. >> Hello, Katie. >> Come on guys. >> Yeah, seriously. >> No one's allowed to be quiet during this show, okay? >> Right. >> Or, I'll start calling people out. So Rob, thank you so much. I think you know this conversation: we're calling it the data explosion, and it's happening right now. And it's nothing new. When you and I chatted about it, you'd been talking about this for years.
You have to ask, is this old news at this point? >> Yeah, I mean, well first of all, the data explosion is not coming, it's here. And everybody's in the middle of it right now. What is different is the economics have changed. And the scale and complexity of the data that organizations are having to deal with has changed. And to this day, 80% of the data in the world still sits behind corporate firewalls. So, that's becoming a problem. It's becoming unmanageable. IT struggles to manage it. The business can't get everything they need. Consumers can't consume it when they want. So we have a challenge here. >> It's challenging in a world of unmanageable, crazy complexity. If I'm sitting here as an IT manager of my business, I'm probably thinking to myself, this is incredibly frustrating. How in the world am I going to get control of all this data? And it's probably not just me thinking it. Many individuals here as well. >> Yeah, indeed. Everybody's thinking about how am I going to put data to work in my organization in a way I haven't done before. Look, you've got to have the right expertise, the right tools. The other thing that's happening in the market right now is clients are dealing with multi cloud environments. So data behind the firewall in private cloud, multiple public clouds. And they have to find a way. How am I going to pull meaning out of this data? And that brings us to data science and AI. That's how you get there. >> I understand the data science part but I think we're all starting to hear more about AI. And it's incredible how this buzzword is taking off. How do businesses adapt to this AI growth and boom and trend that's happening in this world right now? >> Well, let me define it this way. Data science is a discipline. And machine learning is one technique. And then AI puts machine learning into practice and applies it to the business. So this is really about getting your business where it needs to go. And to get to an AI future, you have to lay a data foundation today. I love the phrase, "there's no AI without IA." That means you're not going to get to AI unless you have the right information architecture to start with. >> Can you elaborate though in terms of how businesses can really adopt AI and get started. >> Look, I think there's four things you have to do if you're serious about AI. One is you need a strategy for data acquisition. Two is you need a modern data architecture. Three is you need pervasive automation. And four is you've got to expand job roles in the organization. >> Data acquisition. The first pillar you just discussed. Can we start there and explain why it's so critical in this process? >> Yeah, so let's think about how data acquisition has evolved through the years. 15 years ago, data acquisition was about how do I get data in and out of my ERP system? And that was pretty much solved. Then the mobile revolution happened. And suddenly you've got structured and unstructured data. More than you've ever dealt with. And now you get to where we are today. You're talking terabytes, petabytes of data. >> [Katie] Yottabytes, I heard that word the other day. >> I heard that too. >> Didn't even know what it meant. >> You know how many zeros that is? >> I thought we were in Star Wars. >> Yeah, I think it's a lot of zeroes. >> Yodabytes, it's new. >> So, it's becoming more and more complex in terms of how you acquire data. So that's the new data landscape that every client is dealing with.
And if you don't have a strategy for how you acquire that and manage it, you're not going to get to that AI future. >> So a natural segue: if you are one of these businesses, how do you build for the data landscape? >> Yeah, so the question I always hear from customers is, we need to evolve our data architecture to be ready for AI. And the way I think about that is it's really about moving from static data repositories to more of a fluid data layer. >> And we continue with the architecture. New data architecture is an interesting buzzword to hear. But it's also one of the four pillars. So if you could dive in there. >> Yeah, I mean it's a new twist on what I would call some core data science concepts. For example, you have to leverage tools with a modern, centralized data warehouse. But your data warehouse can't be limited to just what's right there. So you need a way to federate data across different environments. You need to be able to bring your analytics to the data because it's most efficient that way. And ultimately, it's about building an optimized data platform that is designed for data science and AI. Which means it has to be a lot more flexible than what clients have had in the past. >> All right. So we've laid out what you need for driving automation. But where does the machine learning kick in? >> Machine learning is what gives you the ability to automate tasks. And I think about machine learning as being about predicting and automating. And this will really change the roles of data professionals and IT professionals. For example, a data scientist cannot possibly know every algorithm or every model that they could use. So we can automate the process of algorithm selection. Another example is things like automated data matching. Or metadata creation. Some of these things may not be exciting but they're hugely practical. And so when you think about the real use cases that are driving return on investment today, it's things like that. It's automating the mundane tasks. >> Let's go ahead and come back to something that you mentioned earlier because it's fascinating to be talking about this AI journey, but also significant is the new job roles. And what are those other participants in the analytics pipeline? >> Yeah I think we're just at the start of this idea of new job roles. We have data scientists. We have data engineers. Now you see machine learning engineers. Application developers. What's really happening is that data scientists are no longer allowed to work in their own silo. And so the new job roles are about how does everybody have data first in their mind? And then they're using tools to automate data science, to automate building machine learning into applications. So roles are going to change dramatically in organizations. >> I think that's confusing though because we have several organizations asking, is that a highly specialized role, just for data science? Or is it applicable to everybody across the board? >> Yeah, and that's the big question, right? Cause everybody's thinking how will this apply? Do I want this to be just a small set of people in the organization that will do this? But, our view is data science has to be for everybody. It's about bringing data science to everybody as a shared mission across the organization. Everybody in the company has to be data literate. And participate in this journey. >> So overall, group effort, has to be a common goal, and we all need to be data literate across the board. >> Absolutely. >> Done deal.
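(Stepping out of the conversation for a moment: Rob's "automated data matching" example is easy to picture in code. The sketch below is our illustration, not an IBM tool; the record names and the 0.75 threshold are made up, and the standard-library difflib matcher stands in for the more sophisticated matching a real product would use.)

```python
# Illustrative sketch of automated data matching: pairing records from two
# systems whose customer names are close but not identical.
from difflib import SequenceMatcher

crm_names = ["Jonathan Smith", "ACME Corp.", "Leo Rakes"]
billing_names = ["Jon Smith", "Acme Corporation", "L. Rakes", "Dana Wu"]

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

for crm in crm_names:
    # Pick the closest billing record for each CRM record.
    best = max(billing_names, key=lambda b: similarity(crm, b))
    score = similarity(crm, best)
    if score >= 0.75:   # auto-match above the (made-up) threshold
        print(f"MATCH  {crm!r} <-> {best!r} ({score:.2f})")
    else:               # route low-confidence pairs to a human
        print(f"REVIEW {crm!r} ~? {best!r} ({score:.2f})")
```

The design point is the one Rob makes: the automation handles the bulk of the mundane pairing, and only the low-confidence cases come back to a person.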
But at the end of the day, it's kind of not an easy task. >> It's not. It's not easy but it's maybe not as big of a shift as you would think. Because you have to put data in the hands of people that can do something with it. So, it's very basic. Give access to data. Data's often locked up in a lot of organizations today. Give people the right tools. Embrace the idea of choice or diversity in terms of those tools. That gets you started on this path. >> It's interesting to hear you say essentially you need to train everyone though across the board when it comes to data literacy. And I think people that are coming into the workforce don't necessarily have a background or a degree in data science. So how do you manage? >> Yeah, so in many cases that's true. I will tell you some universities are doing amazing work here. One example, University of California Berkeley. They offer a course for all majors. So no matter what you're majoring in, you have a course on foundations of data science. How do you bring data science to every role? So it's starting to happen. We at IBM provide data science courses through CognitiveClass.ai. It's for everybody. It's free. And look, if you want to get your hands on code and just dive right in, you go to datascience.ibm.com. The key point is this though. It's more about attitude than it is aptitude. I think anybody can figure this out. But it's about the attitude to say we're putting data first and we're going to figure out how to make this real in our organization. >> I also have to give a shout out to my alma mater because I have heard that there is an MS offering in data analytics. And they are always on the forefront of new technologies and new majors and on trend. And I've heard that the job placement rate for people graduating with the MS is high. >> I'm sure it's very high. >> So go Tigers. All right, tangential. Let me get back to something else you touched on earlier because you mentioned that a number of customers ask you how in the world do I get started with AI? It's an overwhelming question. Where do you even begin? What do you tell them? >> Yeah, well things are moving really fast. But the good thing is most organizations I see, they're already on the path, even if they don't know it. They might have a BI practice in place. They've got data warehouses. They've got data lakes. Let me give you an example. AMC Networks. They produce a lot of the shows that I'm sure you watch, Katie. >> [Katie] Yes, Breaking Bad, Walking Dead, any fans? >> [Rob] Yeah, we've got a few. >> [Katie] Well you taught me something I didn't even know. Because it's amazing how we have all these different industries, but yet media in itself is impacted too. And this is a good example. >> Absolutely. So, AMC Networks, think about it. They've got ads to place. They want to track viewer behavior. What do people like? What do they dislike? So they have to optimize every aspect of their business from marketing campaigns to promotions to scheduling to ads. And their goal was to transform data into business insights and really take the burden off of their IT team that was heavily burdened by obviously a huge increase in data. So their VP of BI took the approach of using machine learning to process large volumes of data. They used a platform that was designed for AI and data processing. It's the IBM Integrated Analytics System, where it's a data warehouse with data science tools built in. It has in-memory data processing. And just like that, they were ready for AI.
And they're already seeing that impact in their business. >> Do you think a movement of that nature kind of presses other media conglomerates and organizations to say we need to be doing this too? >> I think it's inevitable for everybody: you're either going to be leading, or you'll be playing catch up. And so, as we talk to clients we think about how do you start down this path now, even if you have to iterate over time? Because otherwise you're going to wake up and you're going to be behind. >> One thing worth noting is we've talked about bringing analytics to the data. It's analytics first to the data, not the other way around. >> Right. So, look. We as a practice, we say you want to bring analytics to where the data sits. Because it's a lot more efficient that way. It gets you better outcomes in terms of how you train models and it's more efficient. And we think that leads to better outcomes. Other organizations will say, "Hey move the data around." And everything becomes a big data movement exercise. But once an organization has started down this path and they're starting to get predictions, they want to do it where it's really easy. And that means analytics applied right where the data sits. >> And worth talking about the role of the data scientist in all of this. It's been called the hot job of the decade. And a Harvard Business Review even dubbed it the sexiest job of the 21st century. >> Yes. >> I want to see this on the cover of Vogue. Like I want to see the first data scientist. Female preferred, on the cover of Vogue. That would be amazing. >> Perhaps you can. >> People agree. So what changes for them? Is this challenging in terms of, we talk about data science for all, so is it data science for everyone? And how does it change everything? >> Well, I think of it this way. AI gives software superpowers. It really does. It changes the nature of software. And at the center of that is data scientists. So, a data scientist has a set of powers that they've never had before in any organization. And that's why it's a hot profession. Now, on one hand, this has been around for a while. We've had actuaries. We've had statisticians that have really transformed industries. But there are a few things that are new now. We have new tools. New languages. Broader recognition of this need. And while it's important to recognize this critical skill set, you can't just limit it to a few people. This is about scaling it across the organization. And truly making it accessible to all. >> So then do we need more data scientists? Or is this something you train like you said, across the board? >> Well, I think you want to do a little bit of both. We want more. But, we can also train more and make the ones we have more productive. The way I think about it is there's kind of two markets here. And we call it clickers and coders. >> [Katie] I like that. That's good. >> So, let's talk about what that means. So clickers are basically somebody that wants to use tools. Create models visually. It's drag and drop. Something that's very intuitive. Those are the clickers. Nothing wrong with that. It's been valuable for years. There's a new crop of data scientists. They want to code. They want to build with the latest open source tools. They want to write in Python or R. These are the coders. And both approaches are viable. Both approaches are critical. Organizations have to have a way to meet the needs of both of those types.
And there's not a lot of things available today that do that. >> Well let's keep going on that. Because I hear you talking about the data scientist's role and how it's critical to success, but with the new tools, data science and analytics skills can extend beyond the domain of just the data scientist. >> That's right. So look, we're unifying coders and clickers into a single platform, which we call IBM Data Science Experience. And as the demand for data science expertise grows, so does the need for these kinds of tools. To bring them into the same environment. And my view is if you have the right platform, it enables the organization to collaborate. And suddenly you've changed the nature of data science from an individual sport to a team sport. >> So as somebody whose background is in IT, the question is really, is this an additional piece of what IT needs to do in 2017 and beyond? Or is it just another line item on the budget? >> So I'm afraid that some people might view it that way. As just another line item. But, I would challenge that and say data science is going to reinvent IT. It's going to change the nature of IT. And every organization needs to think about what are the skills that are critical? How do we engage a broader team to do this? Because once they get there, this is the chance to reinvent how they're performing IT. >> [Katie] Challenging or not? >> Look it's all a big challenge. Think about everything IT organizations have been through. Some of them were late to things like mobile, but then they caught up. Some were late to cloud, but then they caught up. I would just urge people, don't be late to data science. Use this as your chance to reinvent IT. Start with this notion of clickers and coders. This is a seminal moment. Much like mobile and cloud were. So don't be late. >> And I think it's critical because it could be so costly to wait. And Rob and I were even chatting earlier how data analytics is just moving into all different kinds of industries. And I can tell you, I've even personally been affected by how important the analysis is in working in pediatric cancer for the last seven years. I personally bring virtual reality headsets to pediatric cancer hospitals across the country. And it's great. And it's working phenomenally. And the kids are amazed. And the staff is amazed. But phase two of this project is putting little sensors in the hardware that gather breathing and heart rate, so that we have data. Proof that we can hand over to the hospitals to continue making this program a success. So just in-- >> That's a great example. >> An interesting example. >> Saving lives? >> Yes. >> That's also applying a lot of what we talked about. >> Exciting stuff in the world of data science. >> Yes. Look, I'd just add that this is an existential moment for every organization. Because what you do in this area is probably going to define how competitive you are going forward. And think about if you don't do something. What if one of your competitors goes and creates an application that's more engaging with clients? So my recommendation is start small. Experiment. Learn. Iterate on projects. Define the business outcomes. Then scale up. It's very doable. But you've got to take the first step. >> First step always critical. And now we're going to get to the fun, hands-on part of our story. Because in just a moment we're going to take a closer look at what data science can deliver. And where organizations are trying to get to. All right.
Thank you, Rob, and now we've been joined by Siva Anne, who is going to help us navigate this demo. First, welcome Siva. Give him a big round of applause. Yeah. All right, Rob, break down what we're going to be looking at. You take over this demo. >> All right. So this is going to be pretty interesting. So Siva is going to take us through. He's going to play the role of a financial adviser who wants to help better serve clients through recommendations. And I'm going to really illustrate three things. One is how do you federate data from multiple data sources? Inside the firewall, outside the firewall. How do you apply machine learning to predict and to automate? And then how do you move analytics closer to your data? So, what you're seeing here is a custom application for an investment firm. So, Siva, our financial adviser, welcome. So you can see at the top, we've got market data. We pulled that from an external source. And then we've got Siva's calendar in the middle. He's got clients on the right side. So page down, what else do you see down there, Siva? >> [Siva] I can see the recent market news. And in here I can see that JP Morgan is calling for a US dollar rebound in the second half of the year. And I have an upcoming meeting with Leo Rakes. I can get-- >> [Rob] So let's go in there. Why don't you click on Leo Rakes. So, you're sitting at your desk, you're deciding how you're going to spend the day. You know you have a meeting with Leo. So you click on it. You immediately see, all right, so what do we know about him? We've got data governance implemented. So we know his age, we know his degree. We can see he's not that aggressive of a trader. Only six trades in the last few years. But then where it gets interesting is you go to the bottom. You start to see predicted industry affinity. Where did that come from? How do we have that? >> [Siva] So these green lines and red arrows here indicate the trending affinity of Leo Rakes for particular industry stocks. What we've done here is we've built machine learning models using the customer's demographic data, his stock portfolios, and browsing behavior to build a model which can predict his affinity for a particular industry. >> [Rob] Interesting. So, I like to think of this, we call it celebrity experiences. So how do you treat every customer like they're a celebrity? So to some extent, we're reading his mind. Because without asking him, we know that he's going to have an affinity for auto stocks. So we go down. Now we look at his portfolio. You can see okay, he's got some different holdings. He's got Amazon, Google, Apple, and then he's got RACE, which is the ticker for Ferrari. You can see that's done incredibly well. And so, as a financial adviser, you look at this and you say, all right, we know he loves auto stocks. Ferrari's done very well. Let's create a hedge. Like what kind of security would interest him as a hedge against his position in Ferrari? Could we go figure that out? >> [Siva] Yes. Given I know that he's got an affinity for auto stocks, and I also see that Ferrari has got some tremendous gains, I want to lock in these gains by hedging. And I want to do that by picking an auto stock which has got negative correlation with Ferrari. >> [Rob] So this is where we get to the idea of in-database analytics. Cause you start clicking that and immediately we're getting instant answers of what's happening. So what did we find here? We're going to compare Ferrari and Honda. >> [Siva] I'm going to compare Ferrari with Honda.
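(Before the demo's result comes back, here is roughly what that pairwise check amounts to. This is our minimal sketch, assuming daily closing prices already loaded into a pandas DataFrame; the tickers and price series are illustrative, not the demo's actual data.)

```python
# Minimal sketch of the hedging check in the demo: is a candidate stock
# negatively correlated with the position we want to hedge?
# The prices below are made up for the example.
import pandas as pd

prices = pd.DataFrame({
    "RACE": [208.1, 210.4, 214.9, 213.2, 217.8, 221.0],  # Ferrari
    "HMC":  [29.8,  29.5,  29.1,  29.3,  28.7,  28.4],   # Honda
})

# Correlate daily returns rather than raw price levels.
returns = prices.pct_change().dropna()
corr = returns["RACE"].corr(returns["HMC"])

print(f"RACE/HMC return correlation: {corr:+.2f}")
if corr < 0:
    print("Negative correlation: HMC is a candidate hedge for RACE.")
```

In-database analytics, as Rob describes it, means this computation runs where the price history lives rather than after pulling the data back to a laptop; the arithmetic is the same either way.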
And what I see here instantly is that Honda has got a negative correlation with Ferrari, which makes it a perfect mix for his stock portfolio. Given he has an affinity for auto stocks and it correlates negatively with Ferrari. >> [Rob] These are very powerful tools in the hands of a financial adviser. You think about it. As a financial adviser, you wouldn't think about federating data, machine learning, pretty powerful. >> [Siva] Yes. So what we have seen here is that using the common SQL engine, we've been able to federate queries across multiple data sources. Db2 Warehouse in the cloud, IBM's Integrated Analytics System, and a Hortonworks-powered Hadoop platform for the news feeds. We've been able to use machine learning to derive innovative insights about his stock affinities. And drive the machine learning into the appliance. Closer to where the data resides to deliver high performance analytics. >> [Rob] At scale? >> [Siva] We're able to run millions of these correlations across stocks, currency, other factors. And even score hundreds of customers for their affinities on a daily basis. >> That's great. Siva, thank you for playing the role of financial adviser. So I just want to recap briefly. Cause this is really powerful technology that's really simple. So we federated, we aggregated multiple data sources from all over the web and internal systems. And public cloud systems. Machine learning models were built that predicted Leo's affinity for a certain industry. In this case, automotive. And then you see when you deploy analytics next to your data, even a financial adviser, just with the click of a button, is getting instant answers so they can go be more productive in their next meeting. This whole idea of celebrity experiences for your customer, that's available for everybody, if you take advantage of these types of capabilities. Katie, I'll hand it back to you. >> Good stuff. Thank you Rob. Thank you Siva. Powerful demonstration on what we've been talking about all afternoon. And thank you again to Siva for helping us navigate. Should we give him one more round of applause? We're going to be back in just a moment to look at how we operationalize all of this data. But first, here's a message from me. If you're a part of a line of business, your main fear is disruption. You know data is the new gold that can create huge amounts of value. So does your competition. And they may be beating you to it. You're convinced there are new business models and revenue sources hidden in all the data. You just need to figure out how to leverage it. But with the scarcity of data scientists, you really can't rely solely on them. You may need more people throughout the organization that have the ability to extract value from data. And as a data science leader or data scientist, you have a lot of the same concerns. You spend way too much time looking for, prepping, and interpreting data and waiting for models to train. You know you need to operationalize the work you do to provide business value faster. What you want is an easier way to do data prep. And rapidly build models that can be easily deployed, monitored and automatically updated. So whether you're a data scientist, data science leader, or in a line of business, what's the solution? What'll it take to transform the way you work? That's what we're going to explore next. All right, now it's time to delve deeper into the nuts and bolts. The nitty gritty of operationalizing data science and creating a data driven culture. How do you actually do that?
Well that's what these experts are here to share with us. I'm joined by Nir Kaldero, who's head of data science at Galvanize, which is an education and training organization. Tricia Wang, who is co-founder of Sudden Compass, a consultancy that helps companies understand people with data. And last, but certainly not least, Michael Li, founder and CEO of Data Incubator, which is a data science training company. All right guys. Shall we get right to it? >> All right. >> So data explosion happening right now. And we are seeing it across the board. I just shared an example of how it's impacting my philanthropic work in pediatric cancer. But you guys each have so many unique roles in your business life. How are you seeing it just blow up in your fields? Nir, your thoughts? >> Yeah, for example at Galvanize we train many Fortune 500 companies. And just looking at the demand from companies that want us to help them go through this digital transformation is mind-blowing. That's a data point by itself. >> Okay. Well, what we're seeing is that data science, as a theme, is actually for everyone now. What's happening is that it's meeting non-technical people. But what we're seeing is that when non-technical people are implementing these tools or coming at these tools without a baseline of data literacy, they're often times using them in ways that distance themselves from the customer. Because they're implementing data science tools without a clear purpose, without a clear problem. And so what we do at Sudden Compass is that we work with companies to help them embrace and understand the complexity of their customers. Because often times they are misusing data science to try and flatten their understanding of the customer. As if you can just do more traditional marketing. Where you're putting people into boxes. And I think the whole ROI of data is that you can now understand people's relationships at a much more complex level and at a greater scale than ever before. But we have to do this with basic data literacy. And this has to involve technical and non-technical people. >> Well you can have all the data in the world, and I think it speaks to, if you're not doing the proper movement with it, forget it. It means nothing at the same time. >> No absolutely. I mean, I think that when you look at the huge explosion in data, there comes with it a huge explosion in data experts. Right, we call them data scientists, data analysts. And sometimes they're people who are very, very talented, like the people here. But sometimes you have people who are maybe re-branding themselves, right? Trying to move up their title one notch to try to attract that higher salary. And I think that that's one of the things that customers are coming to us for, right? They're saying, hey look, there are a lot of people that call themselves data scientists, but we can't really distinguish. So, we run a fellowship where we help companies hire from a really talented group of folks, who are also truly data scientists and who know all those kind of really important data science tools. And we also help companies internally. Fortune 500 companies who are looking to grow that data science practice that they have. And we help clients like McKinsey, BCG, Bain, train up their customers, also their clients, also their workers to be more data talented. And to build up their data science capabilities. >> And Nir, this is something you work with a lot. A lot of Fortune 500 companies.
And when we were speaking earlier, you were saying many of these companies can be in a panic. >> Yeah. >> Explain that. >> Yeah, so you know, not all Fortune 500 companies are fully data driven. And we know that the winners in this fourth industrial revolution, which I like to call the machine intelligence revolution, will be companies who navigate and transform their organization to unlock the power of data science and machine learning. And the companies that are not like that, that don't utilize data science and predictive power well, will pretty much get shredded. So they are in a panic. >> Tricia, companies have to deal with data behind the firewall and in the new multi cloud world. How do organizations start to become data driven right to the core? >> I think the most urgent question to become data driven that companies should be asking is how do I bring the complex reality that our customers are experiencing on the ground into the corporate office and into the data models? That question is critical because that's how you actually prevent any big data disasters. And that's how you leverage big data. Because when your data models are really far from your human models, that's when you're going to do things that are really far off, and it's not going to feel right. That's when Tesco had their terrible big data disaster that they're still recovering from. And so that's why I think it's really important to understand that when you implement big data, you have to further embrace thick data. The qualitative, the emotional stuff, that is difficult to quantify. But then comes the difficult art and science that I think is the next level of data science. Which is getting non-technical and technical people together to ask how do we find those unknown nuggets of insights that are difficult to quantify? Then, how do we do the next step of figuring out how do you mathematically scale those insights into a data model, so that it actually is reflective of human understanding? And then we can start making decisions at scale. But you have to have that first. >> That's absolutely right. And I think that when we think about what it means to be a data scientist, right? I always think about it in these sort of three pillars. You have the math side. You have to have that kind of stats, hardcore machine learning background. You have the programming side. You don't work with small amounts of data. You work with large amounts of data. You've got to be able to type the code to make those computers run. But then the last part is that human element. You have to understand the domain expertise. You have to understand what it is that I'm actually analyzing. What's the business proposition? And how are the clients, how are the users actually interacting with the system? That human element that you were talking about. And I think having somebody who understands all of those and not just in isolation, but is able to marry that understanding across those different topics, that's what makes a data scientist. >> But I find that we don't have people with those skill sets. And right now the way I see teams being set up inside companies is that they're creating these isolated data unicorns. These data scientists that have graduated from your programs, which are great. But, they don't involve the people who are the domain experts. They don't involve the designers, the consumer insight people, the salespeople. The people who spend time with the customers day in and day out. Somehow they're left out of the room.
They're consulted, but they're not a stakeholder. >> Can I actually >> Yeah, yeah please. >> Can I actually give a quick example? So for example, we at Galvanize train the executives and the managers. And then the technical people, the data scientists and the analysts. But in order to actually see all of the ROI behind the data, you also have to have a creative, fluid conversation between non-technical and technical people. And this is a major trend now. And there's a major gap. And we need to increase awareness and create a new kind of environment where technical people talk seamlessly with non-technical ones. >> [Tricia] We call-- >> That's one of the things that we see a lot. Is one of the trends in-- >> A major trend. >> data science training is it's not just for the data science technical experts. It's not just for one type of person. So a lot of the training we do is for data engineers. People who are more on the software engineering side learning more about the stats and math. And then people who are sort of traditionally on the stat side learning more about the engineering. And then managers and people who are data analysts learning about both. >> Michael, I think you said something that was of interest too because I think we can look at IBM Watson as an example. And working in healthcare. The human component. Because often times we talk about machine learning and AI, and data, and you get worried that you still need that human component. Especially in the world of healthcare. And I think that's a very strong point when it comes to the data analysis side. Is there any particular example you can speak to of that? >> So I think that there was this really excellent paper a while ago talking about all the neural net stuff, trained on textual data. So looking at sort of different corpuses. And they found that these models were highly, highly sexist. They would read these corpuses and it's not because neural nets themselves are sexist. It's because they're reading the things that we write. And it turns out that we write kind of sexist things. And they would sort of find all these patterns in there that were sort of latent, that had a lot of sort of things that maybe we would cringe at if we saw them. And I think that's one of the really important aspects of the human element, right? It's being able to come in and sort of say like, okay, I know what the biases of the system are, I know what the biases of the tools are. I need to figure out how to use that to make the tools, make the world a better place. And like another area where this comes up all the time is lending, right? So the federal government has said, and we have a lot of clients in the financial services space, so they're constantly under these kinds of rules, that they can't make discriminatory lending decisions based on a whole set of protected categories. Race, sex, gender, things like that. But, it's very easy when you train a model on credit scores to pick that up. And then to have a model that's inadvertently sexist or racist. And that's where you need the human element to come back in and say okay, look, the classic example would be zip code: you're using zip code as a variable. But when you look at it, zip code is actually highly correlated with race. And you can't do that.
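(Michael's zip-code point lends itself to a small illustration. The sketch below is ours, with made-up column names and data: a quick cross-tabulation asking whether a candidate feature acts as a near-perfect proxy for a protected attribute before it goes into a lending model.)

```python
# Illustrative proxy-variable check: before using a feature in a lending
# model, ask how much it reveals about a protected attribute.
# Column names, codes, and rows are made up for the example.
import pandas as pd

applicants = pd.DataFrame({
    "zip_code": ["10451", "10451", "10021", "10021", "10451", "10021"],
    "race":     ["B",     "B",     "W",     "W",     "B",     "W"],
})

# Cross-tabulate: if each zip code maps almost entirely to one group,
# a model can reconstruct race from zip code even when race is excluded.
table = pd.crosstab(applicants["zip_code"], applicants["race"],
                    normalize="index")
print(table)

# Flag the feature if any single group exceeds, say, 90% within a value.
if (table.max(axis=1) > 0.9).any():
    print("Warning: zip_code is a near-perfect proxy for race; review it.")
```

A check like this is exactly the kind of "slow things down" step he describes: cheap to run, and it surfaces the latent bias before the model does.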
So you may, by sort of following the math and being a little naive about the problem, inadvertently introduce something really horrible into a model, and that's where you need a human element to sort of step in and say, okay hold on. Slow things down. This isn't the right way to go. >> And the people who have -- >> I feel like, I can feel her ready to respond. >> Yes, I'm ready. >> She's like let me have at it. >> And here it is. The people who are really great at providing that human intelligence are social scientists. We are trained to look for bias and to understand bias in data. Whether it's quantitative or qualitative. And I really think that we're going to have fewer of these kinds of problems if we had more integrated teams. If it was a mandate from leadership to say no data science team should be without a social scientist, ethnographer, or qualitative researcher of some kind, to be able to help see these biases. >> The talent piece is actually the most crucial-- >> Yeah. >> one here. If you look at how to enable machine intelligence in an organization, there are three pillars that I have in my head, which are the culture, the talent and the technology infrastructure. And I believe, and I saw in working very closely with the Fortune 100 and 200 companies, that the talent piece is actually the most important, the most crucial, and the hardest to get. >> [Tricia] I totally agree. >> It's absolutely true. Yeah, no I mean I think that's sort of like how we came up with our business model. Companies were basically saying hey, I can't hire data scientists. And so we have a fellowship where we get 2,000 applicants each quarter. We take the top 2% and then we sort of train them up. And we work with hiring companies who then want to hire from that population. And so we're sort of helping them solve that problem. And the other half of it is really around training. Cause with a lot of industries, especially if you're sort of in a more regulated industry, there's a lot of nuances to what you're doing. And the fastest way to develop that data science or AI talent may not necessarily be to hire folks who are coming out of a PhD program. It may be to take folks internally who have a lot of that domain knowledge that you have and get them trained up on those data science techniques. So we've had large insurance companies come to us and say hey look, we hire three or four folks from you a quarter. That doesn't move the needle for us. What we really need is to take the thousand actuaries and statisticians that we have and get all of them trained up to become data scientists and become data literate in this new open source world. >> [Katie] Go ahead. >> All right, ladies first. >> Go ahead. >> Are you sure? >> No please, fight first. >> Go ahead. >> Go ahead Nir. >> So this is actually a trend that we have been seeing in the past year or so, that companies start to look at how to upskill and look for talent within the organization. So they can actually move them to become more literate and navigate 'em from analyst to data scientist. And from data scientist to machine learner. So this is actually a trend that has been happening already for a year or so.
We have enough scale of people who have these skills. Why are we not moving the needle? And what I explain to them is look, you're still making decisions in the same way. And you're still not involving enough of the non-technical people. Especially from marketing; CMOs are much more responsible for driving growth in their companies now. But often times it's so hard to change the old way of marketing, which is still very segmentation-based. You know, demographic-variable based, and we're trying to move people to say no, you have to understand the complexity of customers and not put them in boxes. >> And I think underlying a lot of this discussion is this question of culture, right? >> Yes. >> Absolutely. >> How do you build a data driven culture? And I think that that culture question, one of the ways that it comes up quite often, especially in large, Fortune 500 enterprises, is that they're not very comfortable with, for example, open source architecture. Open source tools. And there is some sort of residual bias that that's somehow dangerous. Security vulnerabilities, for instance. And I think that that's part of the cultural challenge that they often have in terms of how do I build a more data driven organization? Well a lot of the talent really wants to use these kinds of tools. And I mean, just to give you an example, we are partnering with one of the major cloud providers to sort of help make open source tools more user friendly on their platform. So trying to help them attract the best technologists to use their platform because they want, and they understand, the value of having that kind of open source technology work seamlessly on their platforms. So I think that just sort of goes to show you how important open source is in this movement. And how much large companies and Fortune 500 companies and a lot of the ones we work with have to embrace that. >> Yeah, and I'm seeing it in our work. Even when we're working with Fortune 500 companies, is that they've already gone through the first phase of data science work. Where I explain it was all about the tools and getting the right tools and architecture in place. And then companies started moving into getting the right skill set in place. Getting the right talent. And what you're talking about with culture is really where I think we're talking about the third phase of data science, which is looking at communication of these technical frameworks so that we can get non-technical people really comfortable in the same room with data scientists. That is going to be the phase, that's really where I see the pain point. And that's why at Sudden Compass, we're really dedicated to working with each other to figure out how do we solve this problem now? >> And I think that communication between the technical stakeholders and management and leadership. That's a very critical piece of this. You can't have a successful data science organization without that. >> Absolutely. >> And I think that actually some of the most popular trainings we've had recently are from managers and executives who are looking to say, how do I become more data savvy? How do I figure out what is this data science thing and how do I communicate with my data scientists? >> You guys made this way too easy. I was just going to get some popcorn and watch it play out. >> Nir, last 30 seconds. I want to leave you with an opportunity to add anything you want to this conversation.
>> I think one thing to conclude is to say that for companies that are not data driven, it's about time to hit refresh and figure out how they transition the organization to become data driven. To become agile and nimble, so they can actually seize the opportunities from this important industrial revolution. Otherwise, unfortunately, they will have a hard time surviving. >> [Katie] All agreed? >> [Tricia] Absolutely, you're right. >> Michael, Trish, Nir, thank you so much. Fascinating discussion. And thank you guys again for joining us. We will be right back with another great demo. Right after this. >> Thank you Katie. >> Once again, thank you for an excellent discussion. Weren't they great, guys? And thank you to everyone who's tuning in on the live webcast. As you can hear, we have an amazing studio audience here. And we're going to keep things moving. I'm now joined by Daniel Hernandez and Siva Anne. And we're going to turn our attention to how you can deliver on what they're talking about, using the Data Science Experience to do data science faster. >> Thank you Katie. Siva and I are going to spend the next 10 minutes showing you how you can deliver on what they were saying, using the IBM Data Science Experience to do data science faster. We'll demonstrate, through new features we introduced this week, how teams can work together more effectively across the entire analytics life cycle. How you can take advantage of any and all data, no matter where it is and what it is. How you can use your favorite tools from open source. And finally, how you can build models anywhere and deploy them close to where your data is. Remember the financial adviser app Rob showed you? To build an app like that, we needed a team of data scientists, developers, data engineers, and IT staff to collaborate. We do this in the Data Science Experience through a concept we call projects. When I create a new project, I can now use the new GitHub integration feature. We're doing for data science what we've been doing for developers for years. Distributed teams can work together on analytics projects and take advantage of GitHub's version management and change management features. This is a huge deal. Let's explore the project we created for the financial adviser app. As you can see, our data engineer Joane, our developer Rob, and others are collaborating on this project. Joane got things started by bringing together the trusted data sources we need to build the app. Taking a closer look at the data, we see that our customer and profile data is stored on our recently announced IBM Integrated Analytics System, which runs safely behind our firewall. We also needed macroeconomic data, which she was able to find at the Federal Reserve. And she stored it in our Db2 Warehouse on Cloud. And finally, she selected stock news data from NASDAQ.com and landed that in a Hadoop cluster, which happens to be powered by Hortonworks. We added a new feature to the Data Science Experience so that when it's installed with Hortonworks, it automatically uses the native security and governance controls within the cluster, so your data is always secure and safe. Now we want to show you the news data we stored in the Hortonworks cluster. This is the main administrative console. It's powered by an open source project called Ambari. And here's the news data. It's in parquet files stored in HDFS, which happens to be a distributed file system.
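To make that step concrete for readers following along: the pattern Daniel describes, running SQL over parquet files sitting in HDFS, can be sketched in a few lines. The demo uses IBM's Big SQL; as a stand-in, here is the same idea with Spark SQL, which is commonly available on a Hortonworks cluster. The HDFS path, table name, and column names below are hypothetical, not taken from the demo.

    # A minimal sketch, not Big SQL itself: SQL over parquet in HDFS via
    # Spark SQL. Path and column names are hypothetical illustrations.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("news-exploration").getOrCreate()

    # Load the parquet files landed in HDFS and expose them to SQL.
    news = spark.read.parquet("hdfs:///data/nasdaq_news/")
    news.createOrReplaceTempView("stock_news")

    # Ordinary SQL, much like the statements used against the relational sources.
    recent_auto_news = spark.sql("""
        SELECT symbol, headline, published_at
        FROM stock_news
        WHERE symbol IN ('F', 'GM', 'HMC')
        ORDER BY published_at DESC
        LIMIT 20
    """)
    recent_auto_news.show(truncate=False)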
To get the data from NASDAQ into our cluster, we used IBM's BigIntegrate and BigQuality to create automatic data pipelines that acquire, cleanse, and ingest that news data. Once the data's available, we use IBM's Big SQL to query that data using SQL statements that are much like the ones we would use for any relational data, including the data that we have in the Integrated Analytics System and Db2 Warehouse on Cloud. This, and the federation capabilities that Big SQL offers, dramatically simplify data acquisition. Now we want to show you how we support a brand new tool that we're excited about. Since we launched last summer, the Data Science Experience has supported Jupyter and R for data analysis and visualization. In this week's update, we deeply integrated another great open source project called Apache Zeppelin. It's known for having great visualization support and advanced collaboration features, and is growing in popularity amongst the data science community. This is an example of Apache Zeppelin and the notebook we created through it to explore some of our data. Notice how wonderful and easy the data visualizations are. Now we want to walk you through the Jupyter notebook we created to explore our customer preference for stocks. We use notebooks to understand and explore data. To identify the features that have some predictive power. We're trying to assess what ultimately is driving customer stock preference. Here we did the analysis to identify the attributes of customers that are likely to purchase auto stocks. We used this understanding to build our machine learning model. For building machine learning models, we've always had tools integrated into the Data Science Experience. But sometimes you need to use tools you've already invested in. Like our very own SPSS, as well as SAS. Through a new import feature, you can easily import models created with those tools. This helps you avoid vendor lock-in, and simplifies the development, training, deployment, and management of all your models. To build the models we used in the app, we could have coded, but we prefer a visual experience. We used our customer profile data in the Integrated Analytics System, used the Auto Data Preparation to cleanse our data, chose the binary classification algorithms, and let the Data Science Experience evaluate between logistic regression and a gradient boosted tree. It's doing the heavy work for us. As you can see here, the Data Science Experience generated performance metrics that show us that the gradient boosted tree is the best performing algorithm for the data we gave it. Once we save this model, it's automatically deployed and available for developers to use. Any application developer can take this endpoint and consume it like they would any other API inside of the apps they built. We've made training and creating machine learning models super simple. But what about the operations? A lot of companies are struggling to ensure their model performance remains high over time. In our financial adviser app, we know that customer data changes constantly, so we need to always monitor model performance and ensure that our models are retrained as necessary. This is a dashboard that shows the performance of our models and lets our teams monitor and retrain those models so that they're always performing to our standards. So far we've been showing you the Data Science Experience available behind the firewall that we're using to build and train models.
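The model selection Daniel just described, training logistic regression and a gradient boosted tree and keeping the better performer, can be illustrated with a short scikit-learn sketch. This is a stand-in for what the visual builder automates, not the Data Science Experience's own code; the file, feature, and label names are hypothetical, and numeric features are assumed.

    # Hedged sketch of the comparison above: fit two binary classifiers on
    # customer-profile features and keep the one with the higher held-out
    # AUC. File, feature, and label names are hypothetical.
    import pandas as pd
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customer_profiles.csv")       # cleansed profile data
    X = df.drop(columns=["buys_auto_stocks"])       # numeric predictive attributes
    y = df["buys_auto_stocks"]                      # binary target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42, stratify=y)

    candidates = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosted_tree": GradientBoostingClassifier(),
    }

    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

    best = max(scores, key=scores.get)
    print(scores, "-> keeping:", best)
    # The saved winner would then sit behind a REST scoring endpoint, as in
    # the demo, for application developers to call.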
Through a new publish feature, you can build models and deploy them anywhere. In another environment -- private, public, or anywhere else -- with just a few clicks. So here we're publishing our model to the Watson Machine Learning service. It happens to be in the IBM cloud, and also deeply integrated with our Data Science Experience. After publishing and switching to the Watson Machine Learning service, you can see that our stock affinity model that we just published is there and ready for use. So this is incredibly important. I just want to say it again. The Data Science Experience allows you to train models behind your own firewall, take advantage of your proprietary and sensitive data, and then deploy those models wherever you want with ease. So to summarize what we just showed you. First, IBM's Data Science Experience supports all teams. You saw how our data engineer populated our project with trusted data sets. Our data scientists developed, trained, and tested a machine learning model. Our developers used APIs to integrate machine learning into their apps. And how IT can use our Integrated Model Management dashboard to monitor and manage model performance. Second, we support all data. On premises, in the cloud, structured, unstructured, inside of your firewall, and outside of it. We help you bring analytics and governance to where your data is. Third, we support all tools. The data science tools that you depend on are readily available and deeply integrated. This includes capabilities from great partners like Hortonworks, and powerful tools like our very own IBM SPSS. And fourth, and finally, we support all deployments. You can build your models anywhere, and deploy them right next to where your data is. Whether that's in the public cloud, private cloud, or even on the world's most reliable transaction platform, IBM Z. So see for yourself. Go to the Data Science Experience website, take us for a spin. And if you happen to be ready right now, our recently created Data Science Elite Team can help you get started and run experiments alongside you at no charge. Thank you very much. >> Thank you very much Daniel. It seems like a great time to get started. And thanks to Siva for taking us through it. Rob and I will be back in just a moment to add some perspective right after this. All right, once again joined by Rob Thomas. And Rob, obviously we got a lot of information here. >> Yes, we've covered a lot of ground. >> This is intense. You've got to break it down for me, cause I think we need to zoom out and see the big picture. What can better data science deliver to a business? Why is this so important? I mean we've heard it through and through.
Building a data culture, making it a part of your everyday operations -- the highlights of what Daniel just showed us are some pretty cool features for how organizations can get to this. First, you can see how IBM's Data Science Experience supports all teams. You saw data analysts, data scientists, application developers, IT staff, all working together. Second, you saw how we support all tools, and your choice of tools. So the most popular data science libraries integrated into one platform. And we saw some new capabilities that help companies avoid lock-in, where you can import existing models created with specialist tools like SPSS or others, and then deploy them and manage them inside of Data Science Experience. That's pretty interesting. And lastly, you see we continue to build on the best of open tools, partnering with companies like H2O, Hortonworks, and others. Third, you can see how you use all data no matter where it lives. That's a key challenge every organization's going to face. Private, public, federating all data sources. We announced new integration with the Hortonworks data platform, where we deploy machine learning models where your data resides. That's been a key theme. Analytics where the data is. And lastly, supporting all types of deployments. Deploy them in your Hadoop cluster. Deploy them in your Integrated Analytics System. Or deploy them in Z, just to name a few. A lot of different options here. But look, don't believe anything I say. Go try it for yourself. Data Science Experience, anybody can use it. Go to datascience.ibm.com and look, if you want to start right now, we just created a team that we call Data Science Elite. These are the best data scientists in the world that will come sit down with you and co-create solutions, models, and prove out a proof of concept. >> Good stuff. Thank you Rob. So you might be asking what does an organization look like that embraces data science for all? And how could it transform your role? I'm going to head back to the office and check it out. Let's start with the perspective of the line of business. What's changed? Well, now you're starting to explore new business models. You've uncovered opportunities for new revenue sources in all that hidden data. And being disrupted is no longer keeping you up at night. As a data science leader, you're beginning to collaborate with the line of business to better understand and translate the objectives into the models that are being built. Your data scientists are also starting to collaborate with the less technical team members and analysts who are working closest to the business problem. And as a data scientist, you stop feeling like you're falling behind. Open source tools are keeping you current. You're also starting to operationalize the work that you do. And you get to do more of what you love. Explore data, build models, put your models into production, and create business impact. All in all, it's not a bad scenario. Thanks. All right. We are back and coming up next, oh this is a special time right now. Cause we got a great guest speaker. New York Magazine called him the spreadsheet psychic and number-crunching prodigy who went from correctly forecasting baseball games to correctly forecasting presidential elections. He even invented a proprietary algorithm called PECOTA for predicting future performance by baseball players and teams. And his New York Times bestselling book, The Signal and the Noise, was named by Amazon.com as the number one best non-fiction book of 2012.
He's currently the Editor in Chief of the award-winning website FiveThirtyEight, and appears on ESPN as an on-air commentator. Big round of applause. My pleasure to welcome Nate Silver. >> Thank you. We met backstage. >> Yes. >> It feels weird to re-shake your hand, but you know, for the audience. >> I had to give the intense firm grip. >> Definitely. >> The ninja grip. So you and I have crossed paths kind of digitally in the past, which is really interesting. I started my career at ESPN. And I started as a production assistant, then later was back on air for sports technology. And I go to you to talk about sports because-- >> Yeah. >> Wow, has ESPN upped their game in terms of understanding the importance of data and analytics. And what it brings. Not just to MLB, but across the board. >> No, it's really infused into the way they present the broadcast. You'll have win probability on the bottom line. And they'll incorporate FiveThirtyEight metrics into how they cover college football, for example. So, ESPN ... Sports is maybe the perfect, if you're a data scientist, like the perfect kind of test case. And the reason being that sports consists of problems that have rules. And have structure. And when problems have rules and structure, then it's a lot easier to work with. So it's a great way to kind of improve your skills as a data scientist. Of course, there are also important real world problems that are more open ended, and those present different types of challenges. But it's such a natural fit. The teams. Think about the teams playing the World Series tonight. The Dodgers and the Astros are both like very data driven, especially Houston. Golden State Warriors, the NBA Champions, extremely data driven. New England Patriots, relative to an NFL team, it's shifted a little bit, the NFL bar is lower. But the Patriots are certainly very analytical in how they make decisions. So, you can't talk about sports without talking about analytics. >> And I was going to save the baseball question for later. Cause we are moments away from game seven. >> Yeah. >> Is everyone else watching game seven? It's been an incredible series. Probably one of the best of all time. >> Yeah, I mean-- >> You have a prediction here? >> You can mention that too. So I don't have a prediction. FiveThirtyEight has the Dodgers with a 60% chance of winning. >> [Katie] LA Fans. >> So you have two teams that are about equal. But the Dodgers pitching staff is in better shape at the moment, at the end of a seven game series. And they're at home. >> But the statistics behind the two teams are pretty incredible. >> Yeah. It's like the first World Series in I think 56 years or something where you have two 100 win teams facing one another. There has been a lot of parity in baseball for a lot of years. Not that many overall offensive juggernauts. But this year, and last year with the Cubs and the Indians too really. But this year, you have really spectacular teams in the World Series. It kind of is a showcase of modern baseball. Lots of home runs. Lots of strikeouts. >> [Katie] Lots of extra innings. >> Lots of extra innings. Good defense. Lots of pitching changes. So if you love the modern baseball game, it's been about the best example that you've had. If you like a little bit more contact, and fewer strikeouts, maybe not so much. But it's been a spectacular and very exciting World Series. >> It's amazing to talk about. MLB is huge with analysis. I mean, hands down. But across the board, if you can provide a few examples.
Because there are so many teams and front offices putting such a heavy intensity on the analysis side. And where the teams are going. And if you could provide any specific examples of teams that have really blown your mind. Especially over the last year or two. Because every year it gets more exciting, if you will. >> I mean, so a big thing in baseball is defensive shifts. So if you watch tonight, you'll probably see a couple of plays where, if you're used to watching baseball, a guy makes really solid contact. And there's a fielder there that you don't think should be there. But that's really very data driven, where you analyze where this guy hits the ball. That part's not so hard. But also there's game theory involved. Because you have to adjust for the fact that he knows where you're positioning the defenders. He's trying therefore to make adjustments to his own swing, and so that's been a major innovation in how baseball is played. You know, how bullpens are used too. Where teams have realized -- across all sports, pretty much -- the importance of rest. And of fatigue. And that you can be the best pitcher in the world, but guess what? After four or five innings, you're probably not as good as a guy who has a fresh arm necessarily. So I mean, it really is like, these are not subtle things anymore. It's not just oh, on base percentage is valuable. It really affects kind of every strategic decision in baseball. The NBA, if you watch an NBA game tonight, see how many three point shots are taken. That's in part because of data. And teams realizing hey, three points is worth more than two, and once you're more than about five feet from the basket, the shooting percentage gets really flat. And so it's revolutionary, right? Like teams that will shoot almost half their shots from the three point range nowadays. Larry Bird, who wound up being one of the greatest three point shooters of all time, took only eight three pointers his first year in the NBA. It's quite noticeable if you watch baseball or basketball in particular. >> Not to focus too much on sports. One final question. In terms of Major League Soccer, and now in the NFL, we're having the analysis and the wearables, where they can now showcase on screen, if they wanted to, heart rate and breathing and how much exertion. How much data is too much data? And when does it ruin the sport? >> So, I don't think, I mean, again, it goes sport by sport a little bit. I think in basketball you actually have a more exciting game. I think the game is more open now. You have more three pointers. You have guys getting higher assist totals. But you know, I don't know. I'm not one of those people who thinks look, if you love baseball or basketball, and you go in to work for the Astros, the Yankees or the Knicks, they probably need some help, right? You really have to be passionate about that sport. Because it's all based on what questions am I asking? As I'm a fan, or I guess an employee of the team. Or a player watching the game. And there isn't really any substitute I don't think for the insight and intuition that a curious human has to kind of ask the right questions. So we can talk at great length about what tools do you then apply when you have those questions, but that still comes from people. I don't think machine learning could help with what questions do I want to ask of the data. It might help you get the answers.
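A quick back-of-the-envelope on the three-point math Nate mentions: once make rates flatten with distance, expected points per shot favor the three. The percentages below are illustrative round numbers, not league data.

    # Illustrative expected-points-per-shot arithmetic behind the trend Nate
    # describes. The make probabilities are invented round numbers.
    shots = {
        "long two (~20 ft)": (0.40, 2),   # (make probability, point value)
        "three (~24 ft)":    (0.35, 3),
    }
    for label, (p_make, points) in shots.items():
        print(f"{label}: expected {p_make * points:.2f} points per shot")
    # long two: 0.80 points per shot; three: 1.05 -- the three wins even with
    # a lower make rate, which is why attempts have shifted outward.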
>> If you have a midfielder in a soccer game though, not exerting, only 80%, and you're seeing that on a screen as a fan, and you're saying could that person get fired at the end of the day? One day, with the data? >> So we've found that, in soccer in particular, some of the better players are actually more still. So Leo Messi, maybe the best player in the world, doesn't move as much as other soccer players do. And the reason being that A) he kind of knows how to position himself in the first place. B) he realizes that you make a run, and you're out of position. That's quite fatiguing. And soccer in particular, like basketball, is a sport that's incredibly fatiguing. And so, sometimes the guys who conserve their energy -- that kind of old school mentality, that you have to hustle at every moment, is not helpful to the team if you're hustling on an irrelevant play. And therefore, on a critical play, can't get back on defense, for example. >> Sports, but also data is moving exponentially, as we're just speaking about today. Tech, healthcare, every different industry. Is there any particular one that's a favorite of yours to cover? And I imagine they're all different as well. >> I mean, I do like sports. We cover a lot of politics too. Which is different. I mean in politics I think people aren't intuitively as data driven as they might be in sports, for example. It's impressive to follow the breakthroughs in artificial intelligence. It started out just as kind of playing games and playing chess and poker and Go and things like that. But you really have seen a lot of breakthroughs in the last couple of years. But yeah, it's kind of infused into everything really. >> You're known for your work in politics though. Especially presidential campaigns. >> Yeah. >> This year, in particular. Was it insanely challenging? What was the most notable thing that came out of any of your predictions? >> I mean, in some ways, looking at the polling was the easiest lens to look at it. So I think there's kind of a myth that last year's result was a big shock, and it wasn't really. If you did the modeling in the right way, then you realized that number one, polls have a margin of error. And so when a candidate has a three point lead, that's not particularly safe. Number two, the outcome between different states is correlated. Meaning that it's not that much of a surprise that Clinton lost Wisconsin and Michigan and Pennsylvania and Ohio. You know, I'm from Michigan. Have friends from all those states. Kind of the same types of people in those states. Those outcomes are all correlated. So what people thought was a big upset for the polls I think was an example of how data science done carefully and correctly, where you understand probabilities and understand correlations. Our model gave Trump a 30% chance of winning. Other models gave him a 1% chance. And so that was interesting in that it showed that number one, modeling strategies and skill do matter quite a lot. When you have someone saying 30% versus 1%, I mean, that's a very, very big spread. And number two, that these aren't like solved problems necessarily. Although again, the problem with elections is that you only have one election every four years. So I can be very confident that I have a better model. Even one year of data doesn't really prove very much. Even five or 10 years doesn't really prove very much.
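To make the correlation point concrete, here is a toy Monte Carlo, emphatically not FiveThirtyEight's model: a shared national polling error moves every state together, so narrow leads can flip at once, and the underdog's chances are far better than independent state errors would suggest. All margins and error sizes are invented for illustration.

    # Toy simulation of the correlation effect Nate describes. The favorite
    # leads in five hypothetical tipping-point states; the underdog needs to
    # flip at least three of them. All numbers are invented.
    import random

    margins = [3.0, 2.5, 2.0, 1.5, 1.0]   # favorite's polling leads, in points

    def upset_prob(shared_sd, state_sd, trials=100_000):
        """Chance the trailing candidate flips at least 3 of the 5 states."""
        upsets = 0
        for _ in range(trials):
            shared = random.gauss(0, shared_sd)   # error hitting every state alike
            flipped = sum(
                1 for m in margins
                if m + shared + random.gauss(0, state_sd) < 0)
            upsets += flipped >= 3
        return upsets / trials

    random.seed(0)
    print("independent state errors only:", upset_prob(shared_sd=0.0, state_sd=4.0))
    print("shared + state errors:        ", upset_prob(shared_sd=3.0, state_sd=2.6))
    # Holding total error roughly constant, the correlated case gives the
    # underdog a noticeably larger chance of flipping enough states -- the
    # qualitative gap between models that treated states as independent and
    # those that didn't.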
And so, being aware of the limitations is to some extent intrinsic in elections: when you only get one kind of new training example every four years, there's not really any way around that. There are ways to be more robust to sparse data environments. But if you're identifying different types of business problems to solve, figuring out what's a solvable problem where I can add value with data science is a really key part of what you're doing. >> You're such a leader in this space. In data and analysis. It would be interesting to kind of peek behind the curtain, understand how you operate, but also, how large is your team? How you're putting together information. How quickly you're putting it out. Cause I think in this right now world where everybody wants things instantly-- >> Yeah. >> There's also, you want to be first too in the world of journalism. But you don't want to be inaccurate, because that's your credibility. >> We talked about this before, right? I think on average, speed is a little bit overrated in journalism. >> [Katie] I think it's a big problem in journalism. >> Yeah. >> Especially in the tech world. You have to be first. You have to be first. And it's just pumping out, pumping out. And there's got to be more time spent on stories, if I can speak subjectively. >> Yeah, for sure. But at the same time, we are reacting to the news. And so we have people that come in, we hire most of our people actually from journalism. >> [Katie] How many people do you have on your team? >> About 35. But, if you get someone who comes in from an academic track for example, they might be surprised at how fast journalism is. That even though we might be slower than the average website, the fact that there's a tragic event in New York, are there things we have to say about that? A candidate drops out of the presidential race, are there things we have to say about that? In periods ranging from minutes to days, as opposed to kind of weeks to months to years in the academic world. The corporate world moves faster. What is a little different about journalism is that you are expected to have more precision, where people notice when you make a mistake. In corporations, you have maybe less transparency. If you make 10 investments and seven of them turn out well, then you'll get a lot of profit from that, right? In journalism, it's a little different. If you make kind of 10 predictions, or say 10 things, and seven of them are very accurate and three of them aren't, you'll still get criticized a lot for the three. Just because that's kind of the way that journalism is. And so that kind of combination -- not having that much tolerance for mistakes, but also needing to be fast -- that is tricky. And I criticize other journalists sometimes, including for not being data driven enough, but the best excuse any journalist has is: this is happening really fast, and it's my job to kind of figure out in real time what's going on and provide useful information to the readers. And that's really difficult. Especially in a world where literally, I'll probably get off the stage and check my phone, and who knows what President Trump will have tweeted or what things will have happened. But it really is kind of 24/7. >> Well, because it's 24/7 with FiveThirtyEight, one of the most well-known sites for data, are you feeling micromanagey on your people? Because you do have to hit this balance. You can't have something come out four or five days later. >> Yeah, I'm not -- >> Are you overseeing everything?
>> I'm not by nature a micromanager. And so you try to hire well. You try and let people make mistakes. And the flip side of this is that if a news organization never had any mistakes, never had any corrections, that's rare, right? You have to have some tolerance for error, because you are trying to decide things in real time. And figure things out. I think transparency's a big part of that. Say here's what we think, and here's why we think it. If we have a model, we say it's not just the final number; here's a lot of detail about how that's calculated. In some cases we release the code and the raw data. Sometimes we don't, because there's a proprietary advantage. But quite often we're saying we want you to trust us, and it's so important that you trust us, here's the model. Go play around with it yourself. Here's the data. And that's also I think an important value. >> That speaks to open source. And your perspective on that in general. >> Yeah, I mean, look, I'm a big fan of open source. I worry that I think sometimes the trends are a little bit away from open source. But by the way, one thing that happens when you share your data, or you share your thinking at least in lieu of the data, and you can definitely do both, is that readers will catch embarrassing mistakes that you made. By the way, even having open sourceness within your team, I mean we have editors and copy editors who often save you from really embarrassing mistakes. And by the way, it's not necessarily people who have training in data science. I would guess that of our 35 people, maybe only five to 10 have a kind of formal background in what you would call data science. >> [Katie] I think that speaks to the theme here. >> Yeah. >> [Katie] That everybody's kind of got to be data literate. >> But yeah, it is like you have a good intuition. You have a good BS detector, basically. And you have a good intuition for hey, this looks a little bit out of line to me. And sometimes that can be based on domain knowledge, right? We have one of our copy editors, she's a big college football fan. And we had an algorithm we released that tries to predict what the human selection committee will do, and she was like, why is LSU rated so high? Cause I know that LSU sucks this year. And we looked at it, and she was right. There was a bug where it had forgotten to account for their last game, where they lost to Troy or something, and so -- >> That also speaks to the human element as well. >> It does. In general as a rule, if you're designing a kind of regression based model, it's different in machine learning, where you kind of build in the tolerance for error. But if you're trying to do something more precise, then so much of it is just debugging. It's saying that looks wrong to me. And I'm going to investigate that. And sometimes it's not wrong. Sometimes your model actually has an insight that you didn't have yourself. But fairly often, it is. And I think kind of what you learn is like, hey, if there's something that bothers me, I want to go investigate that now and debug that now. Because the last thing you want is where all of a sudden, the answer you're putting out there in the world hinges on a mistake that you made. Cause if you have, so to speak, 1,000 lines of code, they all perform something differently. You never know when you get in a weird edge case where this one decision you made winds up being the difference between your having a good forecast and a bad one.
Between a defensible position and an indefensible one. So we definitely are quite diligent and careful. But it's also kind of knowing like, hey, where is an approximation good enough and where do I need more precision? Cause you could also drive yourself crazy in the other direction, where you know, it doesn't matter if the answer is 91.2 versus 90. And so you can kind of go 91.2, three, four, and it's like kind of A) false precision and B) not a good use of your time. So that's where I do still spend a lot of time, is thinking about which problems are "solvable" or approachable with data and which ones aren't. And when they're not, by the way, you're still allowed to report on them. We are a news organization, so we do traditional reporting as well. And then kind of figuring out when do you need precision versus when is being pointed in the right direction good enough? >> I would love to get inside your brain and see how you operate on just like an everyday walking-to-Walgreens moment. It's like oh, if I cross the street in .2-- >> It's not, I mean-- >> Is it like maddening in there? >> No, not really. I mean, I'm like-- >> This is an honest question. >> If I'm looking for airfares, I'm a little more careful. But no, part of it's like you don't want to waste time on unimportant decisions, right? I will sometimes, if I can't decide what to eat at a restaurant, I'll flip a coin. If the chicken and the pasta both sound really good-- >> That's not high tech, Nate. We want better. >> But that's the point, right? It's like both the chicken and the pasta are going to be really darn good, right? So I'm not going to waste my time trying to figure it out. I'm just going to have an arbitrary way to decide. >> Seriously, in business, how have organizations in the last three to five years evolved with this data boom? How are you seeing it from a consultant's point of view? Do you think it's an exciting time? Do you think it's a you must act now time? >> I mean, we do know that you definitely see a lot of talent among the younger generation now. So FiveThirtyEight has been at ESPN for four years now. And man, the quality of the interns we get has improved so much in four years. The quality of the kind of young hires that we make straight out of college has improved so much in four years. So you definitely do see a younger generation for which this is just part of their bloodstream and part of their DNA. And also, particular fields that we're interested in. So we're interested in people who have both a data and a journalism background. We're interested in people who have a visualization and a coding background. A lot of what we do is very much interactive graphics and so forth. And so we do see those skill sets coming into play a lot more. And so the kind of shortage of talent that had, I think, frankly been a problem for a long time -- I'm optimistic, based on the young people in our office. It's a little anecdotal, but you can tell that there are so many more programs that are kind of teaching students the right set of skills that maybe weren't taught as much a few years ago. >> But when you're seeing these big organizations, ESPN as a perfect example, moving more towards data and analytics than ever before. >> Yeah. >> You would say that's obviously true. >> Oh, for sure. >> If you're not moving that direction, you're going to fall behind quickly. >> Yeah, and the thing is, if you read my book -- or I guess people have a copy of the book.
In some ways it's saying hey, there are a lot of ways to screw up when you're using data. And we've built bad models. We've had models that were bad and got good results. Good models that got bad results, and everything else. But the point is that the reason to be out in front of the problem is so you give yourself more runway to make errors and mistakes. And to learn kind of what works and what doesn't, and which people to put on the problem. I sometimes do worry that a company says oh, we need data. And everyone kind of agrees on that now. We need data science. Then they have some big test case. And they have a failure. And they maybe have a failure because they didn't know really how to use it well enough. But learning from that and iterating on that. And so by the time that you're on the third generation of kind of a problem that you're trying to solve, and you're watching everyone else make the mistake that you made five years ago, I mean, that's really powerful. And that means that getting invested in it now, invested both in the technology and the human capital side, is important. >> Final question for you as we run out of time. 2018 and beyond, what is your biggest project in terms of data gathering that you're working on? >> There's a midterm election coming up. That's a big thing for us. We're also doing a lot of work with NBA data. So for four years now, the NBA has been collecting player tracking data. So they have 3D cameras in every arena. So they can actually kind of quantify, for example, how fast a fast break is. Or literally where a player is and where the ball is. For every NBA game now for the past four or five years. And there hasn't really been an overall metric of player value that's taken advantage of that. The teams do it. But in the NBA, the teams are a little bit ahead of journalists and analysts. So we're trying to have a really truly next-generation stat. It's a lot of data. Sometimes I now oversee things more than doing them myself. And so you're parsing through many, many, many lines of code. But yeah, so we hope to have that out at some point in the next few months. >> Anything you've personally been passionate about that you've wanted to work on and kind of solve? >> I mean, the NBA thing, I am a pretty big basketball fan. >> You can do better than that. Come on, I want something real personal that you're like I got to crunch the numbers. >> You know, we tried to figure out where the best burrito in America was a few years ago. >> I'm going to end it there. >> Okay. >> Nate, thank you so much for joining us. It's been an absolute pleasure. Thank you. >> Cool, thank you. >> I thought we were going to chat World Series, you know. Burritos, important. I want to thank everybody here in our audience. Let's give him a big round of applause. >> [Nate] Thank you everyone. >> Perfect way to end the day. And for a replay of today's program, just head on over to ibm.com/dsforall. I'm Katie Linendoll. And this has been Data Science for All: It's a Whole New Game. Hi guys, I just want to quickly let you know, as you're exiting, a few heads up. Downstairs right now there's going to be a meet and greet with Nate. And we're going to be doing that with clients and customers who are interested. So I would recommend, before the game starts and you lose Nate, head on downstairs. And also, the gallery is open until eight p.m. with demos and activations. And tomorrow, make sure to come back too. Because we have exciting stuff.
I'll be joining you as your host. And we're kicking off at nine a.m. So bye everybody, thank you so much. >> [Announcer] Ladies and gentlemen, thank you for attending this evening's webcast. If you are not attending our Cloud and Cognitive Summit tomorrow, we ask that you recycle your name badge at the registration desk. Thank you. Also, please note there are two exits at the back of the room, one on either side. Have a good evening. Ladies and gentlemen, the meet and greet will be on stage. Thank you.

Published Date : Nov 1 2017


SENTIMENT ANALYSIS :

ENTITIES

Entity | Category | Confidence
Tricia Wang | PERSON | 0.99+
Katie | PERSON | 0.99+
Katie Linendoll | PERSON | 0.99+
Rob | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Joane | PERSON | 0.99+
Daniel | PERSON | 0.99+
Michael Li | PERSON | 0.99+
Nate Silver | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
Hortonworks | ORGANIZATION | 0.99+
Trump | PERSON | 0.99+
Nate | PERSON | 0.99+
Honda | ORGANIZATION | 0.99+
Siva | PERSON | 0.99+
McKinsey | ORGANIZATION | 0.99+
Amazon | ORGANIZATION | 0.99+
Larry Bird | PERSON | 0.99+
2017 | DATE | 0.99+
Rob Thomas | PERSON | 0.99+
Michigan | LOCATION | 0.99+
Yankees | ORGANIZATION | 0.99+
New York | LOCATION | 0.99+
Clinton | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Tesco | ORGANIZATION | 0.99+
Michael | PERSON | 0.99+
America | LOCATION | 0.99+
Leo | PERSON | 0.99+
four years | QUANTITY | 0.99+
five | QUANTITY | 0.99+
30% | QUANTITY | 0.99+
Astros | ORGANIZATION | 0.99+
Trish | PERSON | 0.99+
Sudden Compass | ORGANIZATION | 0.99+
Leo Messi | PERSON | 0.99+
two teams | QUANTITY | 0.99+
1,000 lines | QUANTITY | 0.99+
one year | QUANTITY | 0.99+
10 investments | QUANTITY | 0.99+
NASDAQ | ORGANIZATION | 0.99+
The Signal and the Noise | TITLE | 0.99+
Tricia | PERSON | 0.99+
Nir Kaldero | PERSON | 0.99+
80% | QUANTITY | 0.99+
BCG | ORGANIZATION | 0.99+
Daniel Hernandez | PERSON | 0.99+
ESPN | ORGANIZATION | 0.99+
H2O | ORGANIZATION | 0.99+
Ferrari | ORGANIZATION | 0.99+
last year | DATE | 0.99+
18 | QUANTITY | 0.99+
three | QUANTITY | 0.99+
Data Incubator | ORGANIZATION | 0.99+
Patriots | ORGANIZATION | 0.99+