John Thomas & Steven Eliuk, IBM | IBM CDO Summit 2019
>> Live from San Francisco, California, it's theCUBE, covering the IBM Chief Data Officer Summit. Brought to you by IBM. >> We're back in San Francisco. We're here at Fisherman's Wharf covering the IBM Chief Data Officer event #IBMCDO. This is the tenth year of this event. They tend to bookend them both in San Francisco and in Boston, and you're watching theCUBE, the leader in live tech coverage. My name is Dave Vellante. John Thomas is here, Cube alum and distinguished engineer, Director of Analytics at IBM, and somebody who provides technical direction to the data science elite team. John, good to see you again. Steven Eliuk is back. He is the Vice President of Deep Learning in the Global Chief Data Office, thanks for comin' on again. >> No problem. >> Let's get into it. So John, you and I have talked over the years at this event. What's new these days, what are you working on? >> So Dave, still working with clients on implementing data science and AI use cases, mostly enterprise clients, and seeing a variety of different things developing in that space. Things have moved into broader discussions around AI and how to actually get value out of that. >> Okay, so I know one of the things that you've talked about is operationalizing machine intelligence and AI and cognitive, and that's always a challenge, right. Sounds good, we see this potential, but unless you change the operating model, you're not going to get the type of business value, so how do you operationalize AI? >> Yeah, this is a good question Dave. So, enterprises, many of them, are beginning to realize that it is not enough to focus on just the coding and development of the models, right. So they can hire super-talented Python TensorFlow programmers and get the model building done, but there's no value in it until these models actually are operationalized in the context of the business. So one aspect of this is, actually, we are thinking of this in a very systematic way and talking about this in a prescriptive way. So, you've got to scope your use cases out. You've got to understand what is involved in implementing the use case. Then the steps are build, run, manage, and each of these has technical aspects and business aspects around them, right. So most people jump right into the build aspect, which is writing the code. Yeah, that's great, but once you build the code, build the models by writing code, how do you actually deploy these models? Whether that is for online invocation or batch scoring or whatever, how do you manage the performance of these models over time, how do you retrain these models, and most importantly, when these models are in production, how do I actually understand the business metrics around them? 'Cause this goes back to that first step of scoping. What are the business KPIs that the line of business cares about? The data scientist talks about data science metrics, precision and recall and Area Under the ROC Curve and accuracy and so on. But how do these relate to business KPIs? >> All right, so we're going to get into each of those steps in a moment, but Steve I want to ask you, so part of your charter, under Inderpal, the Global Chief Data Officer, is that you guys have to do this for IBM, right, drink your own champagne, dogfooding, whatever you call it. But there's real business reasons for you to do that. So how is IBM operationalizing AI? What kind of learnings can you share? >> Well, the beauty is I've got a wide portfolio of products that I can pull from, so that's nice. 
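To make John's last point concrete, here is a minimal, hypothetical sketch of translating data-science metrics into a business KPI. The churn scenario, threshold, dollar values, and save rate are invented assumptions for illustration, not figures from the interview.

```python
# Hypothetical illustration: translating model metrics into a business KPI.
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)                               # 1 = customer actually churned
y_score = np.clip(0.6 * y_true + 0.8 * rng.random(1000), 0, 1)  # stand-in model scores
y_pred = (y_score >= 0.5).astype(int)                           # deployment threshold

precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

# Business translation: contact every predicted churner with a retention offer.
revenue_per_save = 120.0   # assumed value of retaining one customer
contact_cost = 5.0         # assumed cost of one retention offer
save_rate = 0.30           # assumed fraction of contacted churners who stay

true_positives = int(((y_pred == 1) & (y_true == 1)).sum())
contacted = int(y_pred.sum())
expected_value = true_positives * save_rate * revenue_per_save - contacted * contact_cost

print(f"precision={precision:.2f} recall={recall:.2f} auc={auc:.2f}")
print(f"expected campaign value: ${expected_value:,.0f}")
```

The translation matters because precision drives the wasted-contact cost while recall bounds how much retainable revenue the model can reach, which is the language a line of business actually cares about.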
Things like AI through Watson, some of the hardware components, all that stuff's kind of being baked in. But part of the reason that John and I want to do this interview together, is because what he's producing, what his thoughts are, kind of resonates very well with our own practices internally. We've got so many enterprise use cases, how are we deciding, you know, which ones to work on, which ones have the data, potentially which ones have the biggest business impact, all those KPIs, etcetera. Also, in addition to that, for the practitioners, once we decide on a specific enterprise use case to work on, when have they reached the level where the enterprise is having a return on investment? They don't need to keep refining and refining and refining, or maybe they do, but these practitioners don't know. So we have to clearly justify it, and scope it accordingly, or these practitioners are left in this kind of limbo, where they're producing things, but not able to iterate effectively for the business, right? So that process is a big problem I'm facing internally. We've got hundreds of internal use cases, and we're trying to iterate through them. There's an immense amount of scoping, understanding, etcetera, but at the same time, we're building more and more technical debt as the process evolves, being able to move from project to project. My team is ballooning, we can't do this, we can't keep growing, they're not going to give me another hundred head count, another hundred head count, so we definitely need to manage it more appropriately. And that's where this mentality comes in there's-- >> All right, so I got a lot of questions. I want to start unpacking this stuff. So the scope piece, that's where we're setting goals, identifying the metrics, success metrics, KPIs, and the like, okay, reasonable starting point. But then you go into this, I think you call it, the explore or understanding phase. What's that all about, is that where governance comes in? >> That's exactly where governance comes in. Right, so, you know, we all know the expression, garbage in, garbage out: if you don't know what data you're working with for your machine learning and deep learning enterprise projects, you will not get the results that you want. And you might think this is obvious, but in an enterprise setting, understanding where the data comes from, who owns the data, who worked on the data, the lineage of that data, who is allowed access to the data, policies and rules around that, it's all important. Because without all of these things in place, the models will be questioned later on, and the value of the models will not be realized, right? So that part of exploration or understanding, whatever you want to call it, is about understanding the data that has to be used by the ML process, but then at a point in time, the models themselves need to be cataloged, need to be published, because the business as a whole needs to understand what models have been produced out of this data. So who built these models? Just as you have lineage of data, you need lineage of models. You need to understand what APIs are associated with the models that are being produced. What are the business KPIs that are linked to model metrics? So all of that is part of this understand and explore path. >> Okay, and then you go to build. I think people understand that, everybody wants to start there, skip straight to the dessert, and then you get into the sort of run and manage piece. 
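As a rough sketch of what one entry in the model catalog John describes might capture, here is a hypothetical record type; the field names and values are illustrative assumptions, not IBM's actual catalog schema.

```python
# Hypothetical model-catalog record; field names are illustrative,
# not IBM's actual catalog schema.
from dataclasses import dataclass

@dataclass
class ModelCatalogEntry:
    name: str
    owner: str               # who built and is accountable for the model
    training_datasets: list  # model lineage: which governed datasets fed it
    serving_api: str         # the API associated with the deployed model
    model_metrics: dict      # data-science metrics (precision, recall, AUC)
    business_kpis: dict      # the line-of-business KPIs the model supports

entry = ModelCatalogEntry(
    name="churn-propensity-v3",
    owner="data-science-elite-team",
    training_datasets=["crm.customers.v12", "billing.invoices.v8"],
    serving_api="https://models.example.com/churn/v3/score",
    model_metrics={"precision": 0.81, "recall": 0.74, "auc": 0.88},
    business_kpis={"retention_uplift_pct": 2.5},
)
print(entry.name, "->", entry.business_kpis)
```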
Run, you want time to value, and then when you get to the management phase, you really want to be efficient, cost-effective, and then iterative. Okay, so here's the hard question. What you just described, some of the folks, particularly the builders, are going to say, "Aw, such a waterfall approach. Just start coding." Remember 15 years ago, it was like, "Okay, how do we write better software? Just start building! Forget about the requirements, just start writing code." Okay, but then what happens is you have to bolt on governance and security and everything else, so talk about how you are able to maintain agility in this model. >> Yeah, I was going to use the word agile, right? So even in each of these phases, it is an agile approach. So the mindset is about agile sprints, and ours are two-week-long sprints, with very specific metrics at the end of each sprint that are validated against the line of business requirements. So although it might sound waterfall, you're actually taking an agile approach to each of these steps. And if you are going through this, you also have the option to course correct as it goes along, because think of this, the first step was scoping. The line of business gave you a bunch of business metrics or business KPIs they care about, but somewhere in the build phase, past sprint one or sprint two, you realize, oh well, you know what, that business KPI is not directly achievable or it needs to be refined or tweaked. And there is that circle back with the line of business and a course correction, as it were. So it's a very agile approach that you have to take. >> That's, I think, right on, because again, if you go and bolt on compliance and governance and security after the fact, we know from years of experience that it really doesn't work well. You build up technical debt faster. But are these quasi-parallel? I mean there are some things that you can do in build as the scoping is going on. Is there collaboration, can you describe that a little bit? >> Absolutely, so for example, if I know the domain of the problem, I can actually get started with templates that help me accelerate the build process. So I think in your group, for example, IBM internally, there are many, many templates these guys are using. Want to talk a little bit about that? >> Well, we can't just start building from scratch every single time. You know, that's again, I'm going to use this word and really reiterate it, you know, it's not extensible. Each project, we have to get to the point of using templates, so we had to look at those initiatives and invest in those initiatives, 'cause initially it's harder. But at least once we have some of those cookie-cutter templates, and some of them might have to have abstractions around certain parts of them, but that's the only way we're ever able to kind of tackle so many problems. So no, without a doubt, it's an important consideration, but at the same time, you have to appreciate there's a lot of projects that are fundamentally different. And that's when you have to have very senior people kind of looking at how to abstract those templates to make them reusable and consumable by others. >> But the team structure, it's not a single amoeba going through all these steps right? These are smaller teams that are, and then there's some threading between each step? >> This is important. >> Yeah, that's tough. We were just talking about that concept. 
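To illustrate the cookie-cutter template idea Steven mentions, here is a hedged sketch of a reusable training-pipeline template for tabular use cases; the column names and default estimator are assumptions for illustration, not an actual IBM template.

```python
# Hypothetical "cookie-cutter" training template; columns and the default
# estimator are invented for illustration.
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

def make_template(numeric_cols, categorical_cols, estimator=None):
    """Return a reusable, train-ready pipeline for a tabular classification use case."""
    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("encode", OneHotEncoder(handle_unknown="ignore"))]),
         categorical_cols),
    ])
    return Pipeline([("prep", preprocess),
                     ("model", estimator or GradientBoostingClassifier())])

# Each new use case swaps in its own columns and, if needed, its own estimator.
pipeline = make_template(["tenure_months", "monthly_spend"], ["region", "segment"])
```

A new use case swaps in its own columns, and a senior engineer can add abstractions (custom steps, domain features) without each team rebuilding the plumbing.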
>> Just talking about skills and >> The bind between those groups is something that we're trying to figure out how to break down. 'Cause that's something he recognizes, I recognize internally, but understanding those people's tasks, they're never going to be able to iterate through different enterprise problems, unless they break down those borders and really invest in the communication and building those tools. >> Exactly, you talk about full stack teams. So, it is not enough to have coding skills, obviously. >> Right. What is the skill needed to get this into a run environment, right? What is the skill needed to take, not metrics exactly, but explainability, fairness in the models, and map that to business metrics? That's a very different skill from Python coding skills. So full stack teams are important, and at the beginning of this process, where someone, line of business, throws 100 different ideas at you, and you have to go through the scoping exercise, that is a very specific skill that is needed, working together with your coders and runtime administrators. Because how do you define the business KPIs and how do you refine them later on in the life cycle? And how do you translate between line of business lingo and what the coders are going to call it? So it's a full stack team concept. It may not necessarily all be in one group, it may be, but they have to work together across these different silos to make it successful. >> All right guys, we got to leave it there, the trains are backing up here at the IBM CDO conference. Thanks so much for sharing the perspectives on this. All right, keep it right there everybody. You're watchin' "theCUBE" from San Francisco, we're here at Fisherman's Wharf. The IBM Chief Data Officer event. Right back. (bubbly electronic music)
Steven Eliuk & Timothy Humphrey, IBM | IBM CDO 2019
>> Live from San Francisco, California, it's the Cube, covering the IBM Chief Data Officer Summit, brought to you by IBM. >> Hello, everyone. Welcome to historic Fisherman's Wharf in San Francisco. We're covering the IBM Chief Data Officer event, #IBMCDO. This is the Cube's, I think, eighth time covering this event. This is the tenth anniversary of the IBM CDO event, and it's a little different format today. We're here at day one. It's like a half day. They start at noon, and then the keynotes. We're starting a little bit early. We're going to go all day today. My name is Dave Vellante. Steve Eliuk is here. He's a Cube alum and Vice President of Deep Learning in the Global Chief Data Office at IBM. And Tim Humphrey, the VP at the Chief Data Office at IBM. Gents, welcome to the Cube. >> Welcome, glad to be here. >> So, a couple years ago, Ginni Rometty, at a big conference, talked about incumbent disruptors, and the whole notion was that you've got established businesses that need to transform into data businesses. Well, that struck me, that well, if IBM's going to sell that to its customers, it has to go through its own transformation, Steve. So let's start there. What is IBM doing to transform into a data company? >> Well, I've been at IBM for, you know, two years now, and luckily I'm benefiting from a lot of that transformation that's taken place over the past three or four years. So, internally, getting our data in order, understanding it, going through various different foundation stones, building those building blocks so that we can gather new insights and traverse through the cognitive journey. One of the nice things though, is that we have such a wide, diverse set of data within the company. So for different types of enterprise use cases that benefit from AI, we have a lot of data assets that we can pull from. Now, keeping those data assets in good order is a challenging task in itself. And I'm able to pull from a lot of different tools that IBM's building for our customers. I get to use them internally, look at them, evaluate them, give them a real practitioner's point of view, to ultimately get insight for our internal business practices, but also for our customers in turn. >> Okay, so, when you think about a data business, they've got data at the core. I'm going to draw a, like, simple conceptual picture, and you've got people around it, maybe you've got processes around it. IBM, hundred-plus-year-old company, you've got different things at the core. It's products. It's people. It's business process. So maybe you could talk, Tim, about how you guys have gone about putting data at the center of the universe. Is that the right way to think about it? >> It is the right way to think about it, and I like how you were describing it. Because when you think about IBM, we've been around over a hundred years, and we do business in roughly over 170 countries. And we have businesses that span hardware, software, services, financing. And along the way, we've also acquired and divested a lot of companies and a lot of businesses. So what that leaves you with is a very fragmented data landscape, right? You know, to support regulations in this country, taxes, tax rules in another country, and having all these different types of businesses. Some you inherit. Some are born from within your company. It just leaves a lot of data silos. 
And as we see transformations being so important, and data is at the heart of that transformation, it was important for us to really be able to organize ourselves such that access to data is not a problem. Such that we are able to combine data across disciplines, from finance to HR to sales to marketing to procurement. That was the big challenge, right? And to do this in a way that really unlocks the value of the data, right? It's very easy to use somebody like one of my good, smart friends here, Steven Eliuk, to develop models within a domain. But when you talk about cross-functional, complex data coming together to enable models, that's like the Holy Grail of transformation. Then we can deliver real business value. Then you're not waiting to make decisions. Then you can actually be ahead of trends. And so that's what we've been trying to do. And the thought, and the journey that we have been on, is to build an enterprise data platform. So, take the concept of a data lake. Bring all your data sources into one place, but on top of that, make it more than just a data lake. Bring the services and capabilities that allow you to deliver insights from data together with the data, so we have a data platform. And our Cognitive Enterprise data platform sort of enables that transformation, and it makes people like my good friend here much more productive and much more valuable to the business. >> This sounds like just a massive challenge. It's not just a technology challenge, obviously. You've got cultural challenges. I mean, people, "This is my data." >> Yes. >> (laughs) And I'm inferring, Tim, you're talking like you're largely through this process, right? So it first of all is... Can you talk about-- >> Basically, I will say this. This is a journey. You're never done, right? And one of the reasons why it is a journey is, if you're going to have a successful business, your business is going to keep transforming. Things are going to keep changing. And even in our landscape today, regulations are going to come. So there's always going to be some type of challenge. So I like to say, we're on a journey. We're not finished. (laughing) We're well down the path, and we've learned a lot. And one of the things we have learned, you hit on it, is culture, right? And it's a little hard to say, okay, I'm opening things up. I don't own the data. The company owns the data. There is that sort of cultural change that has to go along with this transformation. >> And there are technology challenges. I mean, when I first started in this business, AI was a hot concept, but you needed, like, massive supercomputers to actually make it work. Today, you now see its sort of rebirth. You know, (mumbling) talks about the AI winter, and now it's like the AI spring. >> Yeah. >> So how are you guys applying machine intelligence to make IBM a better business? >> Well, ultimately, the technology has really, basically transitioned us from the Dark Ages forward. The previous supercomputer mentality didn't fit well for a lot of AI tasks. Now with GPUs and accelerators and FPGAs and things like that, we're definitely able, along with the data, the curated data that we need, to just fast-track. You know, the practitioners would spend an amazing amount of time gathering, crowdsourcing data, getting it in good order, and then the computational challenges were tough. Now, IBM came to the market with a very interesting computer. The POWER8 and POWER9 architectures have NVLink, which is a proprietary Nvidia interconnect, directly to the CPU. 
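To make the interconnect point concrete, here is a rough, hypothetical benchmark of feeding a GPU from host memory, comparing ordinary pageable memory with pinned (page-locked) memory; it assumes a CUDA-capable machine with PyTorch installed, and the absolute numbers depend on the interconnect.

```python
# Rough, hypothetical benchmark: how fast can batches reach the GPU?
# Assumes a CUDA-capable machine with PyTorch; numbers vary by interconnect.
import time
import torch

assert torch.cuda.is_available(), "needs a CUDA device"

batch = torch.randn(64, 3, 1024, 1024)   # ~0.8 GB of image-like data (pageable)
pinned = batch.pin_memory()              # same data in page-locked host memory

def copy_rate(src: torch.Tensor, repeats: int = 10) -> float:
    """Return host-to-device copy throughput in GB/s."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        src.to("cuda", non_blocking=True)
    torch.cuda.synchronize()
    gigabytes = src.element_size() * src.nelement() * repeats / 1e9
    return gigabytes / (time.perf_counter() - start)

print(f"pageable memory: {copy_rate(batch):.1f} GB/s")
print(f"pinned memory:   {copy_rate(pinned):.1f} GB/s")
```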
So we can feed GPUs a lot quicker for certain types of tasks. And for certain types of tasks that could mean, you know, you get to market quicker, or we get insights for enterprise problems quicker. So technology's a big deal, but it doesn't just center around GPUs. If you're slow to get access to the data, then that's a big problem. So the governance (mumbling) aspects are just as important; in addition to that, security, privacy, et cetera, are also important. The quality of the data, where the data is. So it's an end-to-end system, and if there's any sort of impedance on any of it, it slows down the entire process. But then you have very expensive practitioners who are trying to do their job and are waiting on data or waiting on results. So it's really an end-to-end process. >> Okay, so let's assume for a second the technology box is checked. And again, as you say, Tim, it's a journey, and technology's going to continue to evolve. But we're at a point in technology now where this stuff actually can work. But what about data quality? What about compliance and governance? How are you dealing with the natural data quality problem? Because I'm a P&L manager. I'm saying, well, we're making data decisions, but if I don't like the decision, I'm going to attack the quality of the data. (laughing) So who adjudicates all that, and how have you resolved those challenges? >> Well, I like to think of... I'm an engineer by study, and I just like to think of simple formulas. Garbage in, garbage out. It applies to everything, and it definitely applies to data. >> (laughs) >> Your insights, the models, anything that you build is only going to be as good as the data foundation you have. So one of the key things that we've embarked on a journey on is, how do we standardize all aspects of data across the company? Now, you might say, hey, that's not a hard challenge, but it's really easy to do standards in a silo. For this organization, this is how we're going to define terms like geography, and this is how we'll represent these other terms. But when you do that across functions, it becomes a conflict, right? Because people want to do it their own way. So we're on the path of standardizing data across the enterprise. That's going to allow us to have good definitions. And then, as you mentioned earlier, we are trying to use AI to be able to improve our data quality. One of the most important things about data is the metadata, the data that describes the data. >> Mm-hm. >> And we're trying to use AI to enhance our metadata. I'd love for Steven to talk a little bit about this, 'cause this is sort of his brainchild. But it's fascinating to me that we can be on an AI transformation, data can be at the heart of it, and we can use AI (laughs) to help improve the quality of our data. >> Right. >> It's fascinating. >> So the metadata problem is (mumbling) because you've talked about data lakes before. Then in this day and age, you're talking schemaless. Throw it into a data lake and figure it out later, because you have to be agile for your business. So you can't do that with just human categorization, and you know, it's got to-- >> It could take hours, maybe years. >> For a company the size of IBM, the market would shift so fast, right? So how do you deal with that problem? >> That's exactly it. We're not patient enough for the normative kind of mentality where you just throw a whole bunch of bodies at it. 
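As a minimal, hypothetical sketch of the cross-functional standardization Tim describes, the snippet below maps local field names from different silos onto enterprise-standard terms and surfaces unmapped fields early; the systems, fields, and mappings are invented for illustration.

```python
# Hypothetical sketch of term standardization across silos; the source
# systems, fields, and mappings are invented for illustration.
STANDARD_TERMS = {"geography", "customer_id", "revenue_usd"}

SOURCE_MAPPINGS = {
    "finance_ledger": {"geo_cd": "geography", "cust": "customer_id", "rev": "revenue_usd"},
    "sales_crm": {"region": "geography", "account_no": "customer_id"},
}

# Every local term must map onto an agreed enterprise-standard term.
for mapping in SOURCE_MAPPINGS.values():
    assert set(mapping.values()) <= STANDARD_TERMS

def standardize(system: str, record: dict) -> dict:
    """Rename a source record's fields to enterprise-standard terms."""
    mapping = SOURCE_MAPPINGS[system]
    unknown = set(record) - set(mapping)
    if unknown:
        raise ValueError(f"{system}: unmapped fields {unknown}")  # surface gaps early
    return {mapping[k]: v for k, v in record.items()}

print(standardize("sales_crm", {"region": "EMEA", "account_no": "A-1042"}))
# -> {'geography': 'EMEA', 'customer_id': 'A-1042'}
```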
We're definitely moving from that non-extensible headcount, full-time-employee type situation, to looking for ways that we can utilize automation. So around the metadata, quality and understanding of that data was incredibly problematic, and we were just hiring people left, right, and center. And then it's a really tough job that they have, dealing with so many different business islands, et cetera. So we looked for ways that we could automate that process, and we finally found a way to do it. So there's a lot of curated data. Now we're looking at data quality, in addition to looking at regulatory and governance issues, in addition to automating the labeling of business metadata. And the business metadata is the taxonomy that links everything together. We understand it under the same normative umbrella. So then when one of the enterprise use cases says, "Hey, we're looking for additional data assets," oh, it's (snaps) in the cloud here, or it's in a private instance here. But we know it's there, and you can grab it, right? So we're definitely at probably the tail end of that curve now, and it started off really hard, but it's getting easier. So that's-- >> Guys, we got to leave it there. Awesome discussion. I hope we can pick it up in the future when maybe we have more metadata than data. >> (laughs) >> And metadata's going to become more and more valuable. But thank you so much for sharing a little bit about IBM's transformation. It was great having you guys on. >> Thank you. >> Alright, keep it right there, everybody. We'll be back with our next guest right after this short break. You're watching the Cube at IBM CDO in San Francisco. Right back. (electronic music) >> Alright, long clear. Alright, thank you guys. Appreciate it, I wish we had more time.
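As a hedged sketch of the automated business-metadata labeling Steven describes, the snippet below trains a tiny text classifier to predict a glossary term from a column's name and description; the five training examples are invented, and a real system of the kind discussed would learn from a curated catalog at far larger scale.

```python
# Minimal, hypothetical sketch of automated business-metadata labeling:
# predict a glossary term from a column's name and description.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

examples = [
    ("cust_dob date of birth of the customer", "birth_date"),
    ("acct_open_dt date the account was opened", "account_open_date"),
    ("int_rate annual interest rate on the loan", "interest_rate"),
    ("mat_dt maturity date of the bond", "maturity_date"),
    ("cust_nm full legal name of the customer", "customer_name"),
]
texts, labels = zip(*examples)

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # robust to abbreviations
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

probs = model.predict_proba(["loan_rate yearly rate charged on loans"])[0]
best = model.classes_[probs.argmax()]
print(best, round(float(probs.max()), 2))  # predicted glossary term + confidence
```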
Chris Bannocks, ING & Steven Eliuk, IBM | IBM CDO Fall Summit 2018
(light music) >> Live from Boston. It's theCUBE. Covering IBM Chief Data Officer Summit. Brought to you by IBM. >> Welcome back everyone, to theCUBE's live coverage of the IBM CDO Summit here in Boston, Massachusetts. I'm your host, Rebecca Knight. And I'm joined by my co-host, Paul Gillin. We have two guests for this segment. We have Steven Eliuk, who is the Vice President of Deep Learning in the Global Chief Data Office at IBM. And Christopher Bannocks, Group Chief Data Officer at ING. Thanks so much for coming on theCUBE. >> My pleasure. >> Before we get started, Steve, I know you have some very important CUBE fans that you need-- >> I do. >> To give a shout out to. Please. >> For sure. So I missed them on the last three runs of theCUBE, so I'd like to just shout out to Santiago, my son. Five years old. And the shortest one, which is Elana. Miss you guys tons, and now you're on the air. (all laughing) >> Excellent. To get that important piece of business out. >> Absolutely. >> So, let's talk about metadata. What's the problem with metadata? >> The one problem, or the many (chuckles)? >> (laughing) There are a multitude of problems. >> How long ya got? The problem is, it's everywhere. And there's lots of it. And bringing context to that and understanding it from an enterprise-wide perspective is a huge challenge. Just connecting to it, finding it, or collecting it centrally, and then understanding the context and what it means. So, the standardization of it, or the lack of standardization of it, across the board. >> Yeah, it's incredibly challenging. Just the immense scale of metadata, at the same time dealing with metadata as Chris mentioned. Just coming up with your own company's glossary of terms to describe your own data. It's kind of step one in the journey of making your data discoverable and governed. Alright, so it's challenging and it's not well understood, and I think we're very early on in these stages of describing our data. >> Yeah. >> But we're getting there. Slowly but surely. >> And perhaps in that context it's not only the fact that it's everywhere, but actually we've not created structural solutions in a consistent way across industries to be able to structure it and manage it in an appropriate way. >> So, help people do it better. What are some of the best practices for creating, managing metadata? >> Well you can look at, I mean, it's such a broad space you can look at different ones. Let's just take the work we do around describing our data, and we do that for the purposes of regulation. For the purposes of GDPR, et cetera, et cetera. It's really about discovering and providing context to the data that we have in the organization today. So, in that respect it's creating a catalog and making sure that we have the descriptions and the structures of the data that we manage and use in the organization. And to give you perhaps a practical example, when you have a data quality problem, you need to know how to fix it. So, you create and structure metadata around, well, where does it come from, first of all. So what's the journey it's taken to get to the point where you've identified that there's a problem. But also then, who do we go to to fix it? Where did it go wrong in the chain? And who's responsible for it? Those are very simple examples of the metadata around the transformations the data might have come through to get to its ending point. The quality metrics associated with it. And then, the owner or the data steward that it has to be routed back to, to get fixed. >> Now all of those are metadata elements >> All of those, yeah. >> Right? >> 'Cause we're not really talking about the data. The data might be a debit or a credit. Something very simple like that in banking terms. But actually it's got lots of other attributes associated with it which essentially describe that data. So, what is it? Who owns it? What are the data quality metrics? How do I know what its quality is? >> So where do organizations make mistakes? Do they create too much metadata? Do they create poor, is it poorly labeled? Is it not federated? >> Yes. (all laughing) >> I think it's a mix of all of them. One of the things that, you know, Chris alluded to, and you might have understood, is that it's an incredibly labor-intensive task. There's a lot of people involved. And when you get a lot of people involved in, sadly, a quite time-consuming, slightly boring job, there are errors and there are problems. And that's data quality, that's GDPR, that's governance and regulatory issues. Likewise, if you can't discover the data 'cause it's labeled wrong, that's potential insight that you've now lost. Because that data's not discoverable to a potential project that's looking for similar types of data. Alright, so, kind of step one is trying to describe your metadata to the organization. Creating a taxonomy of metadata. And getting everybody on board to label that data, whether it be short and long descriptions, having good tools, et cetera. >> I mean look, the simple thing is... we struggle as... As a capability in any organization we struggle with these terms, right? Metadata, well, ya know, if you're talking to the business they have no idea what you're talking about. You've already confused them the minute you mentioned meta. >> Hashtag. >> Yeah (laughs) >> It's a hashtag. >> That's basically what it is. >> Essentially, it's just data about data. It's the descriptive components that tell you what it is you're dealing with. If you just take a simple example from finance: an interest rate on its own tells you nothing. It could be the interest rate on a savings account. It can be the interest rate on a bond. But on its own you have no clue what you're talking about. A maturity date, or a date in general. You have to provide the context. And that is its relationships to other data, and the contexts that it's in. But also the description of what it is you're looking at. And if that comes from two different systems in an organization, let's say one in Spain and one in France, and you just receive a date, you don't know what you're looking at. You have no context for what you're looking at. And simply, you have to have that context. So, you have to be able to label it there and then map it to a generic standard that you implement across the organization, in order to create that control that you need in order to govern your data. >> Are there standards? I'm sorry Rebecca. >> Yes. >> Are there standards efforts underway, industry-wide? >> There are open metadata standards that are underway and gaining a great deal of traction. There are also ones in internal use that you have to standardize anyway. Irrespective of what's happening across the industry, you don't have the time to wait for external standards to exist in order to make sure you standardize internally. >> Another difficult point is it can be region or country specific. >> Yeah. >> Right, so, it makes it incredibly challenging, 'cause every region you might work in, you might have to have your own sub-glossary of terms for that specific region. And you might have to control the export of certain data with certain terms between regions and between countries. It gets very, very challenging. >> Yeah. And then somehow you have to connect to it all, to be able to see what it all is. Because if one system maps its field to, let's say, date, and its local definition of that is maturity date, whereas someone else maps date to birth date, you know you've got a problem. You just know you've got a problem. And exposing the problem is part of the process. Understanding, hey, that mapping's wrong, guys. >> So, where do you begin? If your mission is to transform your organization to be one that is data-centric, and the business side is sort of eyes glazing over at the mention of metadata, what kind of communication needs to happen? What kind of teamwork, collaboration? >> So, I mean, teamwork and collaboration are absolutely key. The communication takes time. Don't expect one blast of communication to solve the problem. It is going to take education and working with people to actually get 'em to realize the importance of things. And to do that you need to start something. Just the communication of the theory doesn't work. No one can ever connect to it. You have to have people who are working on the data for a reason that is business critical. And you need to have them experience the problem to recognize that metadata is important. Until they experience the problem you don't get the right amount of traction. So you have to start small and grow. >> And you can use potentially the whip as well. Governance, the regulatory requirements, that's a nice one to push things along. That's often helpful. >> It's helpful, but not necessarily popular. >> No, no. >> So you have to give-- >> Balance. >> We're always struggling with that balance. There's a lot of regulation that drives the need for this. But equally, that same regulation essentially drives all of the same needs that you have for analytics. For good measurement of the data. For growth of customers. For delivering better services to customers. All of these things are important. Just the web-click information you have, that's all essentially metadata. The way we interact with our clients online and through mobile. That's all metadata. So it's not all whip or stick. There's some real value that is in there as well. >> This would seem to be a domain that is ideal for automation. Through machine learning contextualization, machines should be able to figure a lot of this stuff out. Am I wrong? >> No, absolutely right. And I think, we're working on proofs of concept to prove that case. And we have IBM AMG as well, the automatic metadata generation capability, using machine learning and AI to be able to start to auto-generate some of this insight by using existing catalogs, et cetera, et cetera. And we're starting to see real value through that. It's still very early days, but I think we're really starting to see that one of the solutions can be machine learning and AI. For sure. >> I think there's various degrees of automation that will come in waves. Immediately, right now, we have certain degrees where we have a very small term set with very high confidence predictions. But then you want to get to the specificity of a company, which might have 30,000 terms sometimes. Internally, we have 6,000 terms at IBM. And at that level of specificity, for complete automation, we're not there yet. But it's coming. It's a trial. >> It takes time because the machine is learning. And you have to give the machine enough inputs, and that gradually takes time. Humans are involved as well. It's not about just throwing the machine at something and letting it churn. You have to have that human involvement. It takes time to have the machine continue to learn and grow, and give it more terms. And give it more context. But over time I think we're going to see good results. >> I want to ask about that human-in-the-loop, as IBM so often calls it. One of the things that Nander Paul Bendery was talking about is how the CDO needs to be a change engine in chief. So how are the rank and file interpreting this move to automation and increase in machine learning in their organizations? Is it accepted? Is it (chuckles) is it a source of paranoia and worry? >> I think it's a mix. I think we're kind of blessed, at least in the CDO at IBM, the global CDO, in that everyone's kind of on board with that mission. That's what we're doing >> Right, right. >> There are team members 25, 30 years on IBM's roster, and they're just as excited as I am, and I've only been there for 16 months. But it kind of depends on the project too. Ones that have a high impact, everyone's really gung ho, because we've seen process times go from 90 days down to a couple of days. That's a huge reduction. And that's the governance and regulatory aspects, but more for us it's a little bit about the linkage and availability of data. So that we can get more insights from that data, and better outcomes for different types of enterprise use cases. >> And a more satisfying work day. >> Yeah it's fun. >> That's a key point. Much better to be involved in this than doing the job itself. The job of tagging and creating metadata associated with the vast number of data elements is very hard work. >> Yeah. >> It's very difficult. And it's much better to be working with machine learning to do it, and dealing with the outliers or the exceptions, than it is chugging through. Realistically it just doesn't scale. You can't do this across 30,000 elements in any meaningful way, or a way that really makes sense from a financial perspective. So you really do need to be able to scale this quickly, and machine learning is the way to do it. >> Have you found a way to make data governance fun? Can you gamify it? >> Are you suggesting that data governance isn't fun? (all laughing) Yes. >> But can you gamify it? Can you compete? >> We're using gamification in many ways. We haven't been using it in terms of data governance yet. Governance is just a horrible word, right? People have really negative connotations associated with it. But actually if you just step one degree away, we're talking about quality. Quality means better decisions. And that's actually all governance is. Governance is knowing where your data is. Knowing who's responsible for fixing it if it goes wrong. And being able to measure whether it's right or wrong in the first place. And it being better means we make better decisions. Our customers have better engagement with us. We please our customers more, and therefore they hopefully engage with us more and buy more services. I think governance is something we invented through the need for regulation. And the need for control. And from that background. 
But realistically it's just, we should be proud about the data that we use in the organization. And we should want the best results from it. And it's not about governance. It's about us being proud about what we do. >> Yeah, a great note to end on. Thank you so much Christopher and Steven. >> Thank you. >> Cheers. >> I'm Rebecca Knight for Paul Gillin. We will have more from the IBM CDO Summit here in Boston coming up just after this. (electronic music)
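As a hedged sketch of the human-in-the-loop pattern discussed above, the snippet below auto-accepts high-confidence term predictions and routes the rest to a data steward for review; the threshold and the predictions are invented for illustration.

```python
# Hypothetical human-in-the-loop routing: auto-accept confident predictions,
# send the rest to a data steward. Threshold and predictions are invented.
AUTO_ACCEPT_THRESHOLD = 0.90

predictions = [
    {"column": "mat_dt", "term": "maturity_date", "confidence": 0.97},
    {"column": "cust_dob", "term": "birth_date", "confidence": 0.95},
    {"column": "rate_x", "term": "interest_rate", "confidence": 0.62},
]

auto_accepted, needs_review = [], []
for p in predictions:
    bucket = auto_accepted if p["confidence"] >= AUTO_ACCEPT_THRESHOLD else needs_review
    bucket.append(p)

print("auto-labeled:", [p["column"] for p in auto_accepted])
print("routed to steward:", [p["column"] for p in needs_review])
# Steward corrections would flow back in as new training data, which is
# how the labeling model keeps improving over time.
```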
Sumit Gupta & Steven Eliuk, IBM | IBM CDO Summit Spring 2018
(music playing) >> Narrator: Live, from downtown San Francisco, it's the Cube. Covering IBM Chief Data Officer Strategy Summit 2018. Brought to you by: IBM >> Welcome back to San Francisco everybody, we're at the Parc 55 in Union Square. My name is Dave Vellante, and you're watching the Cube, the leader in live tech coverage, and this is our exclusive coverage of IBM's Chief Data Officer Strategy Summit. They hold these both in San Francisco and in Boston. It's an intimate event, about 150 Chief Data Officers really absorbing what IBM has done internally and IBM transferring knowledge to its clients. Steven Eliuk is here. He is one of those internal practitioners at IBM. He's the Vice President of Deep Learning in the Global Chief Data Office at IBM. We just heard from him and some of his strategies and use cases. He's joined by Sumit Gupta, a Cube alum, who is the Vice President of Machine Learning and Deep Learning within IBM's cognitive systems group. Sumit. >> Thank you. >> Good to see you, welcome back Steven, let's get into it. So, I was, um, paying close attention when Bob Picciano took over the cognitive systems group. I said, "Hmm, that's interesting". Recently a software guy, of course I know he's got some hardware expertise. But bringing in someone who's deep into software and machine learning, and deep learning, and AI, and cognitive systems into a systems organization. So you guys specifically set out to develop solutions to solve problems like Steven's trying to solve. Right, explain that. >> Yeah, so I think, uh, there's a revolution going on in the market, the computing market, where we have all these new machine learning and deep learning technologies that are having meaningful impact, or promise of having meaningful impact. But these new technologies are actually, I would say, significantly complex, and they require very complex and high performance computing systems. You know I think Bob, and I think in particular IBM, saw the opportunity and realized that we really need to architect a new class of infrastructure. Both software and hardware to address what data scientists like Steve are trying to do in the space, right? The open source software that's out there: TensorFlow, Caffe, Torch - these things are truly game changing. But they also require GPU accelerators. They also require multiple systems like... In fact interestingly enough you know some of the supercomputers that we've been building for the scientific computing world, those same technologies are now coming into the AI world and the enterprise. >> So, the infrastructure for AI, if I can use that term? It's got to be flexible, Steven we were sort of talking about that, elastic versus, I'm even extending it to plastic. As Sumit you just said, it's got to have that tooling, got to have that modern tooling, you've got to accommodate alternative processor capabilities, um, and so, that forms what you've used, Steven, to sort of create new capabilities, new business capabilities within IBM. I wanted to, we didn't touch upon this before, but we touched upon your data strategy before, but tie it back to the line of business. You essentially are, I presume, a liaison between the line of business and the chief data office >> Steven: Yeah. >> Officer office. How did that all work out, and shake out? Did you define the business outcomes, the requirements, how did you go about that? >> Well, actually, surprisingly, we have very little new use cases that we're generating internally from my organization. 
Because there's so many to pick from already throughout the organization, right? There's all these business units coming to us and saying, "Hey, now the data is in the data lake, and now we know there's more data, now we want to do this. How do we do it?" You know, so that's where we come in, that's where we start touching and massaging and enabling them. And that's the main effort that we have. We do have some derivative works that have come out, that have been like new offerings that you'll see here. But mostly we already have so many use cases from those business units that we're really trying to heighten and bring extra value to those domains first. >> So, a lot of organizations, sounds like IBM was similar, you created the data lake, you know, things like Hadoop made it lower cost to just put stuff in the data lake. But then, it's like "okay, now what?" >> Steven: Yeah. >> So is that right? So you've got the data and this bog of data and you're trying to make more sense out of it but get more value out of it? >> Steven: Absolutely. >> That's what they were pushing you to do? >> Yeah, absolutely. And with that, with more data you need more computational power. And actually Sumit and I go pretty far back, and I can tell you from my previous roles I highlighted to him many years ago some of the deficiencies in the current architecture in x86, etc., and I said, "If you hit these points, I will buy these products." And what they went back and they did is they, they addressed all of the issues that I had. Like there's certain issues... >> That's when you were, sorry to interrupt, that's when you were a customer, right? >> Steven: That's when I was... >> An external customer >> Outside. I'm still an internal customer, so I've always been a customer I guess in that role right? >> Yep, yep. >> But, I need to get data to the computational device as quickly as possible. And with certain older-gen technologies, like PCIe Gen3, and certain issues around, um, x86, I couldn't get that data there for like high fidelity imaging for autonomous vehicles, for, ya know, high fidelity image analysis. But, with certain technologies in Power we have, like, NVLink directly to the CPU. And we also have PCIe Gen4, right? So, so these are big enablers for me, so that I can really keep the utilization of those very expensive compute devices higher. Because they're not starved for data. >> And you've also put a lot of emphasis on IO, right? I mean that's... 
So as the name suggests, it uses Apache Spark. But again the data scientist doesn't know that. They say, "launch job". And the software actually goes and scales that job across tens of servers or hundreds of servers. The IT team can determine how many servers their going to allocate for data scientist. They can have all kinds of user management, data management, model management software. We take the open source software, we package it. You know surprisingly ugh most people don't realize this, the open source software like TensorFlow has primarily been built on a (mumbles). And most of our enterprise clients, including Steven, are on Redhat. So we, we engineered Redhat to be able to manage TensorFlow. And you know I chose those words carefully, there was a little bit of engineering both on Redhat and on TensorFlow to make that whole thing work together. Sounds trivial, took several months and huge value proposition to the enterprise clients. And then the last piece I think that Steven was referencing too, is we also trying to go and make the eye more accessible for non data scientist or I would say even data engineers. So we for example, have a software called Powder Vision. This takes images and videos, and automatically creates a trained deep learning model for them, right. So we analyze the images, you of course have to tell us in these images, for these hundred images here are the most important things. For example, you've identified: here are people, here are cars, here are traffic signs. But if you give us some of that labeled data, we automatically do the work that a data scientist would have done, and create this pre trained AI model for you. This really enables many rapid prototyping for a lot of clients who either kind of fought to have data scientists or don't want to have data scientists. >> So just to summarize that, the three pieces: It's making it simpler for the data scientists, just run the job - Um, the backend piece which is the schedulers, the hardware, the software doing its thing - and then its making that data science capability more accessible. >> Right, right, right. >> Those are the three layers. >> So you know, I'll resay it in my words maybe >> Yeah please. >> Ease of use right, hardware software optimized for performance and capability, and point and click AI, right. AI for non data scientists, right. It's like the three levels that I think of when I'm engaging with data scientists and clients. >> And essentially it's embedded AI right? I've been making the point today that a lot of the AI is going to be purchased from companies like IBM, and I'm just going to apply it. I'm not going to try to go build my own, own AI right? I mean, is that... >> No absolutely. >> Is that the right way to think about it as a practitioner >> I think, I think we talked about it a little bit about it on the panel earlier but if we can, if we can leverage these pre built models and just apply a little bit of training data it makes it so much easier for the organizations and so much cheaper. They don't have to invest in a crazy amount of infrastructure, all the labeling of data, they don't have to do that. So, I think it's definitely steering that way. It's going to take a little bit of time, we have some of them there. But as we as we iterate, we are going to get more and more of these types of you know, commodity type models that people could utilize. >> I'll give you an example, so we have a software called Intelligent Analytics at IBM. 
It's very good at taking any surveillance data and, for example, recognizing anomalies, or, you know, if people aren't supposed to be in a zone. And we had a client who wanted to do worker safety compliance. So they want to make sure workers are wearing their safety jackets and their helmets when they're on a construction site. So we used surveillance data and created a new AI model using PowerAI Vision. We were then able to plug it into IVA, the Intelligent Video Analytics software. So they have the nice GUI-based software for the dashboards and the alerts, yet we were able to do incremental training on their specific use case, with, by the way, their specific, you know, equipment and jackets and stuff like that. And create a new AI model, very quickly, for them to be able to apply and make sure their workers are actually compliant with all of the safety requirements they have on the construction site. >> Hmm interesting. So, sometimes it's like a new form of CAPTCHA: identify "all the pictures with bridges," right? That's the kind of thing you're able to do with these video analytics. >> That's exactly right. Clients will have all kinds of uses. I was talking to a client who's a major car manufacturer in the world, and he was saying it would be great if I could identify the make and model of what cars people are driving into my dealership. Because I bet I can draw a correlation between what they drive in with and what they're going to drive out with, right. Marketing insights, right. And so there's a lot of things that people want to do which would really be bespoke to their use cases, and build on top of existing AI models that we have already. >> And you mentioned x86 before. And not to start a food fight, but, um >> Steven: And we use both internally too, right. >> So let's talk about that a little bit. I mean, where do you use x86, and where do you use IBM Cognitive and Power Systems? >> I have a mix of both, >> Why, how do you decide? >> There are certain workloads I will delegate over to Power, just because, you know, they're data-starved and we're noticing the computation is being impacted by it. But because we deal with so many different organizations, certain organizations optimize for x86 and some of them optimize for Power, and I can't pick, I have to have everything. Just like I mentioned earlier, I also have to support cloud and on-prem; I can't pick just to be on-prem, right. >> I imagine the big cloud providers are in the same boat, which I know some are your customers. You're betting on data, you're betting on digital, and it's a good bet. >> Steven: Yeah, 100 percent. >> We're betting on data and AI, right. So I think data, you've got to do something with the data, right? And analytics and AI is what people are doing with that data. We have an advantage both at the hardware level and at the software level in these two, I would say, workloads or segments, which is data and AI, right. And we fundamentally have invested in the processor architecture to improve the performance and capabilities, right. You can run much larger AI models on a Power system than you can on an x86 system. Right, that's one advantage. You can train an AI model four times faster on a Power system than you can on an Intel-based system. So the clients who have a lot of data, who care about how fast their training runs, are the ones who are committing to Power systems today. >> Mmm.Hmm. >> Latency requirements, things like that, really really big deal.
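To put a rough number on what "four times faster" means for experiment throughput, here is a back-of-the-envelope sketch in Python. Only the 4x ratio comes from the conversation above; the baseline training time is an invented, hypothetical figure, not a benchmark.

```python
# Illustrative arithmetic only: the 4x ratio is the claim quoted above;
# the baseline training time is an assumed, hypothetical figure.

def runs_per_month(days_per_run: float) -> float:
    """How many sequential training runs fit in a 30-day month."""
    return 30.0 / days_per_run

baseline_days = 8.0        # assumed time per training run on commodity x86
speedup = 4.0              # "train an AI model four times faster"
accelerated_days = baseline_days / speedup

print(f"Per-run time: {baseline_days:.0f} days -> {accelerated_days:.0f} days")
print(f"Experiments per month: {runs_per_month(baseline_days):.1f} -> "
      f"{runs_per_month(accelerated_days):.1f}")
```

The point Steven makes next is exactly this: the win is not raw capacity, it's how quickly a single experiment comes back.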
>> So what that means for you as a practitioner is you can do more with less, or is it... I mean >> I can definitely do more with less, but the real value is that I'm able to get an outcome quicker. Everyone says, "Okay, you can just roll out more GPUs, more GPUs, run more experiments, run more experiments." No, no, that's not actually it. I want to reduce the time for an experiment, get it done as quickly as possible so I get that insight. 'Cause then what I can do is possibly cancel a bunch of those jobs that are already running, 'cause I already have the insight, knowing that that model is not doing anything. Alright, so it's very important to get the time down. Jeff Dean said it a few years ago; he uses the same slide often. But, you know, when things were taking months, you know, that's what happened basically from the 80's up until, you know, 2010. >> Right >> We didn't have the computation, we didn't have the data. Once we were able to get that experimentation time down, we were able to iterate very, very quickly on this. >> And throwing GPUs at the problem doesn't solve it because it's too much complexity, or? >> It helps the problem, there's no question. But when my GPU utilization goes from 95% down to 60%, you know, I'm getting only a two-thirds return on investment there. It's a really, really big deal, yeah. >> Sumit: I mean, the key here, I think, Steven, and I'll draw it out again, is this time to insight. Because time to insight actually is time to dollars, right. People are using AI either to make more money, right, by providing better products to the customers, giving better recommendations. Or they're saving on their operational costs, right; they're improving their efficiencies. Maybe they're routing their trucks in the right way, they're routing their inventory to the right place, they're reducing the amount of inventory that they need. So in all cases you can actually correlate AI to a revenue outcome or a dollar outcome. So the faster you can do that, you know... I tell most people that I engage with that the hardware and software they get from us pays for itself very quickly, because they make that much more money, or they save that much more money, using Power systems. >> We even see this internally. I've heard stories and all that, Sumit kind of commented on this, but there are actually salespeople that take this software and hardware out, and they're able to get an outcome sometimes in certain situations where they just take the client's data, and they're salespeople, they're not data scientists, they train it, it's so simple to use, and then they present the client with the outcomes the next day, and the client is just blown away. This isn't just a one-time occurrence; salespeople are actually using this, right. So it's getting to the point that it's so simple to use, and you're able to get those outcomes, that we're even seeing, you know, deals close quicker. >> Yeah, that's powerful. And Sumit, to your point, the business case is actually really easy to make. You can say, "Okay, this initiative that you're driving, what's your forecast for how much revenue?" Now let's make an assumption for how much faster we're going to be able to deliver it. And if I can show them a one-day turnaround on a corpus of data versus, okay, let's say two months or whatever my time to breakeven is, I can run the business case very easily and communicate it to the CFO or whomever, the line-of-business head, so. >> That's right.
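That business case reduces to a few lines of arithmetic. A hedged sketch; every figure below is an assumption made up for illustration, not anything quoted by the participants:

```python
# Toy business-case model: value of delivering an AI initiative sooner.
# All numbers are hypothetical placeholders.

monthly_value = 500_000        # assumed revenue/savings once the model ships
baseline_months = 2.0          # "let's say two months" turnaround
accelerated_months = 1 / 30    # "a one-day turnaround"

months_gained = baseline_months - accelerated_months
value_of_speed = months_gained * monthly_value

print(f"Months gained: {months_gained:.2f}")
print(f"Value of the faster turnaround: ${value_of_speed:,.0f}")
```

Against a number like that, the infrastructure cost question largely answers itself, which is the point being made here.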
I mean, I was just at a retailer, a local grocery store in the Bay Area recently, and he was telling me how in California we've passed legislation that doesn't allow free plastic bags anymore; you have to pay for them. So people are bringing their own bags. But that's actually increased theft for them, because people bring their own bag, put stuff in it, and walk out. And he wanted to have an analytic system that can detect if someone puts something in a bag and then doesn't pay for it at checkout. So in many ways they want to use the existing camera systems they have, but automatically be able to detect fraudulent behavior or, you know, anomalies. And it's actually quite easy to do with a lot of the software we have around PowerAI Vision, around video analytics from IBM, right. And that's what we were talking about, right? Take existing trained AI models on vision and enhance them for your specific use case and the scenarios you're looking for. >> Excellent. Guys, we've got to go. Thanks Steven, thanks Sumit for coming back on, and appreciate the insights. >> Thank you >> Glad to be here >> You're welcome. Alright, keep it right there, buddy, we'll be back with our next guest. You're watching theCUBE at IBM's CDO Strategy Summit from San Francisco. We'll be right back. (music playing)
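The "take an existing vision model and enhance it" pattern that runs through this segment is, in generic framework terms, transfer learning. Below is a minimal sketch using stock TensorFlow/Keras, not PowerAI Vision itself; the directory path, class count, and labels are hypothetical stand-ins, and real use would also apply the model's preprocessing.

```python
import tensorflow as tf

# Minimal transfer-learning sketch of the pattern discussed above:
# start from a model pre-trained on generic images, retrain a small
# head on a few hundred labeled, site-specific images.
NUM_CLASSES = 3  # hypothetical labels, e.g. helmet / safety jacket / neither

base = tf.keras.applications.ResNet50(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(224, 224, 3))
base.trainable = False  # keep pre-trained features; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical directory of labeled images, one subfolder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "site_images/", image_size=(224, 224), batch_size=32)
model.fit(train_ds, epochs=5)
```

This is why "a little bit of training data" is enough: the expensive general-purpose feature learning has already been paid for.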
Ed Walsh & Steven Eliuk, IBM | IBM CDO Summit Spring 2018
>> Announcer: Live from downtown San Francisco, it's theCUBE covering IBM Chief Data Officer Strategy Summit 2018, brought to you by IBM. (upbeat music) >> Welcome back to San Francisco, everybody. You're watching theCUBE, the leader in live tech coverage. We're covering the IBM Chief Data Officer Strategy Summit #ibmcdo. Ed Walsh is here. He's the General Manager of IBM Storage, and Steven Eliuk who's the Vice President of Deep Learning in the Global Chief Data Office at IBM, Steven. >> Yes, sir. >> Good to see you again. Welcome to theCUBE. >> Pleasure to be here. So there's a great story. We heard Inderpal Bhandari this morning talk about the enterprise data blueprint and laying out to the practitioners how to get started, how to implement, and we're going to have a little case study as to actually how you're doing this. But Ed, set it up for us. >> Okay, so we're at this Chief Data Officer Summit in the Spring; we do it twice a year and really get just Chief Data Officers together to think through their different challenges and actually share. So that's where we're at with the Summit. And we, as IBM, are kind of trying to put our best foot forward, be that cognitive enterprise, and show very transparently what we're doing at our own organization to be more data-driven. And we've talked a bunch of different times: everyone needs to be data-driven, everyone wants to be data-driven, but it's really challenging for organizations. So what we're doing with this blueprint, we're showing it as a showcase; in fact you can actually physically come in and see our environment. But more importantly we're being very transparent on all the different components, high-level processes, what we did in governance, but also down to the detailed technology level, and sharing that with our... Not because they want to do all of it, but maybe they want to do some of it or half of it, but it would be a blueprint that's worked. And then we're being transparent about what we're getting internally from our own transformation as IBM. Because really, if we looked at this as a platform, it's really an enterprise cognitive data platform that all of IBM uses on all our transformation work. So our client, in fact, is Steven, and I think you can share what we're doing. By the way, the same type of infrastructure allows you to do what we did in the national labs, the largest supercomputers in the world; same infrastructure, and the same thing we're trying to do, is make it easier for people to get insights from the data at scale in the enterprise. So that's why I want to bring Steven on. >> I joked with Inderpal. I said, "Well, if you can do it at IBM, if you can do it there you can do it anywhere," (Ed laughing) because we're a highly complex organization. So Steven, take us through how you got started and what you're doing. >> For sure, so I'm what's referred to, probably, as a difficult customer. Because we're so multifaceted, we have so many different use cases internally, in the order of hundreds, I can't just say, "Hey, this is a specific pattern that I need, Ed. You need to make sure your hardware is sufficient in this area," because the next day I'm going to be hitting him and saying, "Hey Ed, I need you to make sure that it's also efficient in terms of bandwidth as well."
And that's the beauty of working in this domain: I have those hundreds of use cases, and it means that I'm hitting low-latency requirements, bandwidth requirements, extensibility requirements, because I have a huge amount of headcount that I'm bringing on as well. And if I'm good now, I don't have to worry in six months about stating, "Hey, I need to roll out new infrastructure so I can support these new data scientists, and effectively, so that they can get outcomes quicker." And I need to make sure that all the infrastructure behind the scenes is extensible and supports my users. And what I don't want them to have to worry about, specifically, is how that infrastructure works. I want them to focus on those use cases, those enterprise use cases, and I want them to touch as many of those use cases as possible. >> So Inderpal laid out sort of his five things that a CDO should do. He starts with develop a clear data strategy. So as the doer in the organization, how'd you go about doing that? Presumably you participated in that data strategy, but you're representing the lines of business, presumably, to make sure that it's of value to them. You can accelerate business value, but how did you start? I mean, that's a big challenge, chewy. >> For sure, yeah, it's a huge challenge. And I think effectively curating, locating, governing, and the quality aspects of that data is one of the first aspects. And where does that data reside, though, and how do we access it quickly? How does it support structured and unstructured data effectively? Those are all really important questions that had to come to light. And that's some of the approaches that we took. We look at the various business units and we look at: are they curating the data correctly? Is it the data that we need? Maybe we have to augment that curation process before we actually are able to kind of apply new techniques, new machine-learning techniques, to that use case. There are a number of different aspects that kind of get rolled into that, and bringing effective storage and effective compute to the table really accelerates us in that journey. >> So Ed, what are the fundamental aspects of the infrastructure that supports this sort of emerging workload? >> Yeah, no, good question. And some of it is what we're going to talk about, what's a storage layer and what's a compute layer, but also what are the tools we're putting in place to use a lot of these open-source toolsets and make it easier for people to use but also use that underlying infrastructure better. So if you look at the high level, we use a storage infrastructure that is built for these AI workloads, which are closer to an HPC workload. So it's the same infrastructure we use, we use the term ESS or elastic storage server. It's a combination. It's a turnkey solution, half rack, full rack. But it can start very small and grow to the biggest supercomputers in the world, like what we're doing in the national labs, like the largest top five supercomputers in the world. But what that is is a file system called Spectrum Scale. It allows you to scale up the performance, with low-latency access to the metadata but also high throughput. So we can do layers on that, with all the hot tiers on flash, because it's not just the throughput you need, which is high. So our lowest-end box is close to, like, what, 26 gigabytes a second. Our highest one, like the national labs, is 4.9 terabytes a second of throughput. But it's also the low-latency quick access.
So we have a storage infrastructure, but then we also have high-performance compute. So what we have is our Power Systems, our POWER9 systems with GPUs, and the idea is, how do you, we use the term, feed the beast? How do you have the right throughput or IOPS to get the data close to that CPU or the GPU? The Power Systems have a unique bandwidth, so it's not like what you'd just find in a commodity Intel server. It's much faster throughput, so it allows us to actually move data between the GPU, CPU, and storage or memory very fast. So you can get these deep learning times, and maybe you can share some of that; the learning times improve dramatically, so you get the insight. And then we're also putting layers on top, which is IBM Cloud Private: basically, how do you have a hybrid cloud, container-based service that allows you to move things seamlessly across, and not have to wrestle with how to put all these things together, so it works seamlessly between a public cloud and private cloud? Then we have these toolsets, and I talked about this last time. It might not seem like storage or compute, but we use the term PowerAI: it's taking all these machine-learning tools, because everyone always uses open source, but we make them scale and also make them easier to use. So how do you use a bunch of great GPUs and CPUs, great throughput, and how do you scale that? A lot of these tools were basically built to run on one CPU. So to be distributed, key research from IBM allows you, with PowerAI, to take the same TensorFlow workflows, and so on, and run them across a grid, dramatically changing your learning times. But anyway, you can probably give more, I think, but it's multiple layers. It's not one thing, and it's not what you'd use for typical storage infrastructure or compute infrastructure for normal workloads. It is custom, so you can't... A lot of people try to deploy maybe their NAS storage box, and maybe it's flash, and try to deploy it. And you can get going that way, but then you hit a wall real quick. This is purpose-built for AI. >> So Beth Smith was on earlier. She threw out a stat; she said that, based on some research, I'm not sure if it was IBM or Forrester or Gartner, 85% of customers they talked to said AI will be a competitive advantage, but only 20% can use it today at scale. So obviously scale is a big challenge, and I want to ask you to comment on another potential challenge. We always talk about elastic infrastructure: you scale up, scale down, or end of month, okay. We sometimes use this concept of plastic infrastructure. Basically plastic maintains its shape; because these workloads are so diverse, I don't want to have to rip down my infrastructure and bring in a new one every time my workload changes. So I wonder if you can talk about the sort of requirements from your perspective, both in terms of scale and in terms of adaptability to changing workloads. >> Well, I think one of the things that Ed brought up that's really, really important is these open-source frameworks assume that it's running on a single system. They assume that storage is actually local, and that's really the only way that you get really effective throughput from it, is if it's local. So extending it via PowerAI, via these appliances and so forth, means that you can use petabytes of storage at a distance and still have good throughput, and not have your GPU utilization coming down, because these are very expensive devices.
So if the storage is the blocker, if the controller is limiting that flow of data, then ultimately you're not making the most effective use of those very expensive computational mediums. But more importantly, it means that your time from ideation to product is slowed down, so you're not able to get those business outcomes. That means your competitor could get those business outcomes first if they don't have that bottleneck. And for me, what's really important, and I mentioned this briefly earlier, is that I need those specialists to touch as much of the data, or as many of those enterprise use cases, as possible. At the end of the year it's not about touching three use cases. It's touching three this year, then five, ten, more and more and more. And with the infrastructure, being storage and computation, all of that is a key attribute to reaching that goal. >> Without having to rip that down and then repurpose or rebuild it every time. >> Steven: Yeah. >> And just being able to deal with the grid as a grid, and you can place workloads across a grid. >> 100%. >> That's our Spectrum compute products; what we've been doing for all the major banks in the world, taking these workloads and placing them across a grid, is also a key piece of this. So we always talk about the infrastructure; people say, hey, Ed, that's not storage or infrastructure. No, you need that. And that's why it's part of my portfolio to actually build out the overall infrastructure for people to build on-prem, but also, everything we did with you on-prem is hybrid. It goes to the cloud natively, because some workloads, we believe, will be on the cloud for good reasons, and you need to have that part of it. So everything we're doing with you is hybrid cloud today, not in the future, today. >> No, 100%, and that's one of the requirements in our organization that we call A-1 architecture. If we write it for on-prem, we have to be able to run it on the cloud, and it has to have the same look and feel, the same pane of glass, and things like that as well. So it means we only have to write it once, so we're incredibly efficient, because we don't have to write it multiple times for different types of infrastructure. Likewise, we have expectations from the data scientists that the performance still has to be up to par as well. We want to really be moving the computation directly to where the data resides, and we know that it's not just on-prem, it's not just in the cloud, it's a hybrid scenario. >> So don't hate me for asking you this, Ed, but you've only been here for a couple years. Did you just stumble into this? You've got this vast portfolio, you've got this tooling, you've got cloud. You've got a part of your organization saying we've got to do on-prem. The other part's saying we've got to do public. Or was this designed for the workload? Was it kind of a little bit of both? >> Well, I think luck is good, but it's an embarrassment of riches inside IBM, between our primary research and some of the things we were just talking about. How do you run these frameworks, which weren't designed that way, in a distributed fashion, and do it performing at scale? That's our primary, that's research. That's not even in my group. What we're doing is for workload management. That's in storage, but we have these toolsets. The key thing is to work with the clients to figure out what they're trying to do. Everyone's trying to be data-driven, so as we looked at what you need to do to be truly data-driven, it's not just having faster storage, although that's important.
It's not about the throughput or having to scale up. It's not about having just the CPUs. It's not just about having the open frameworks, but it's how to put that all together so that it's invisible. In fact, you said it earlier: he doesn't want his users to know at all what's underneath. He just wants to run their workload. You have people from my organization, because I'm one of your customers. You're my customer, but we go to you and say, "We're trying to use your platform for a 360 view of the client," and our ops team, not data scientists, not data engineers, can use his platform. So anyway, I actually think it's because IBM has this broad portfolio that we can bring together. And when IBM shows up, which we're showing up in AI together in the cloud, that's when you see something that we can truly do that you can't get from other organizations. And it's because of the technology differentiation we have from the different groups, but also the industry contacts that we bring. >> 100%. >> And also, when you're dealing with data, it is the trust. We can engage the clients at a high level and help them, because we're not a single-product company. We might be more complex, but when we show up and bring the solution set we can really differentiate. And I think that's when IBM shows up. It's pretty powerful. >> And I think it's moved from "trust me" to "show me," and we're able to show it now because we're eating what we're producing. So we're showing. They called it a blueprint. We're using that effectively inside the organization. >> So now that you've sort of built this out internally, you spend a lot of time with clients kind of showing them, or...? >> Probably 15% of my time. >> So not that much. >> No, no, because I'm in charge of internal transformation operations. They're expecting outcomes from us. But at the same time, there are clients that are in the exact same boat. The realization is that this is really interesting. There's a lot of noise, a lot of interesting stuff in AI out there from Google, from Facebook, from Amazon, from Microsoft, but image recognition isn't important to me. How do I do it for my own organization? I have legacy data from 50 years. This is totally different, and there's no Git repo that I can go to and download it all and use it. It's totally custom, and how do I handle that? So it's different for these guys. >> What's on your wishlist? What's on Ed's to-do list? >> Oh geez, uh... I want it so simple for my data scientists that they don't have to worry about where the data's coming from. Whether it be a traditional relational database or an object store, I want it to feed that data effectively, and I don't want to have to have them looking into where the data is to make sure the computation's there. I want it just to flow effortlessly. That's really the wishlist. Likewise, I think if we had new accelerators in general, outside the box, not something from the traditional GPU viewpoint, maybe dataflow or some new avant-garde-type stuff, that would be interesting, because I think it might open up a new train of thought in the area just like GPUs did for us. >> Great story. >> Yeah I know, I think it's... So we're talking about AI for business, and I think what you're seeing is we're trying to showcase what IBM's doing to be really an AI business. And what we've done in this platform is really a showcase.
So we're trying to be as transparent as possible not because it's the only way to do it but it's a good example of how a very complex business is using AI to get dramatically better and everyone's using the same kind of platform. >> Well, we learned, we effectively learned being open is much better than being closed. Look at the AI community. Because of its openness that's where we're at right now. And following the same lead we're doing the same thing, and that's why we're making everything available. You can see it and we're doing it, and we're happy to talk to you about it. >> Awesome, all right, so Steven, you stay here. >> Yeah. >> We're going to bring Sumit on and we're going to drill down into the cognitive platform. >> That's good. This guy, thanks for setting it up. I really, really appreciate it. >> Thank you very much. >> All right, good having you guys. All right, keep it right there, everybody. We'll be back at the IBM CDO Strategy Summit. You're watching theCUBE. (upbeat music) (telephone dialing) (modem connecting)
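Steven's wishlist, data that simply "flows" to the accelerators whether it lives in a relational database, an object store, or a parallel file system, is usually approximated at the framework level with an overlapped input pipeline. A minimal sketch in stock TensorFlow; the mount point and record format are hypothetical (a deployment like the one described would presumably sit on Spectrum Scale/ESS mounts):

```python
import tensorflow as tf

# Minimal sketch of an input pipeline that overlaps I/O with compute,
# so GPUs are not starved for data. Paths and formats are hypothetical.
files = tf.data.Dataset.list_files("/gpfs/project/train/*.tfrecord")

dataset = (
    files.interleave(tf.data.TFRecordDataset,        # parallel file reads
                     num_parallel_calls=tf.data.AUTOTUNE)
         .shuffle(buffer_size=10_000)
         .batch(256)
         .prefetch(tf.data.AUTOTUNE)                 # hide storage latency
)

# While the GPU trains on one batch, the pipeline is already fetching
# and preparing the next, which is what keeps utilization high.
for batch in dataset.take(1):
    print(batch.shape)
```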
Bina Hallman & Steven Eliuk, IBM | IBM Think 2018
>> Announcer: Live, from Las Vegas, it's theCUBE. Covering IBM Think 2018. Brought to you by IBM. >> Welcome back to IBM Think 2018. This is theCUBE, the leader in live tech coverage. My name is Dave Vellante and I'm here with Peter Burris. Our wall-to-wall coverage, this is day two. Everything AI, Blockchain, cognitive, quantum computing, smart ledger, storage, data. Bina Hallman is here, she's the Vice President of Offering Management for Storage and Software Defined. Welcome back to theCUBE, Bina. >> Bina: Thanks for having me back. >> Steven Eliuk is here. He's the Vice President of Deep Learning in the Global Chief Data Office at IBM, Steven. >> Yes, sir. >> Dave: Welcome to theCUBE, Steve. Thanks, you guys, for coming on. >> Pleasure to be here. >> That was a great introduction, Dave. >> Thank you, appreciate that. Yeah, so this has been quite an event, consolidating all of your events, bringing your customers together. 30,000, 40,000, too many people to count. >> Very large event, yes. >> Standing room only at all the sessions. It's been unbelievable, your thoughts? >> It's been fantastic. Lots of participation, lots of sessions. We brought, as you said, all of our conferences together and it's a great event. >> So, Steve, tell us more about your role. We were talking off camera; we've had Inderpal Bhandari on here before, Chief Data Officer at IBM. You're in that office, but you've got other roles around Deep Learning, so explain that. >> Absolutely. >> Sort of multi-tool star here. >> For sure, so, roles and responsibilities at IBM and the Chief Data Office, kind of two pillars. We focus in the Deep Learning group on foundation platform components. So, how to accelerate the infrastructure and platform behind the scenes, to accelerate the ideation-to-product phase. We want data scientists to be very effective, and for us to execute our projects very, very quickly. That said, I mentioned projects, so on the applied side, we have a number of internal use cases across IBM. And it's not just a handful, it's in the order of hundreds, and those applied use cases are part of the cognitive plan, per se, and each one of those is part of the transformation of IBM into a cognitive enterprise. >> Okay, now, we were talking to Ed Walsh this morning, Bina, about how you collaborate with colleagues in the storage business. We know you guys have been growing, >> Bina: That's right. >> It's the fourth straight quarter, and that doesn't even count some of the stuff that you guys ship on the cloud in storage, >> That's right, that's right. >> Dave: So talk about the collaboration across the company. >> Yeah, we've had some tremendous collaboration, you know, the broader IBM and bringing all of that together, and that's one of the things that, you know, we're talking about here today with Steve and team: really, as they built out their cognitive architecture, being able to then leverage some of our capabilities and the strengths that we bring to the table as part of that overall architecture. And it's been a great story, yeah. >> So what would you add to that, Steve? >> Yeah, absolutely refreshing. You know, I've built supercomputers in the past, specifically for deep learning, and coming on board at IBM about a year ago, seeing the elastic storage solution, or server. >> Bina: Yeah, elastic storage server, yep. >> It handles a number of different aspects of my pipeline very uniquely, so for starters, I don't want to worry about rolling out new infrastructure all the time.
I want to be able to grow my team, to grow my projects, and that's what's nice about ESS: it's extensible. I'm able to roll out more projects, more people, multi-tenancy, et cetera, and it supports us effectively. Especially, you know, it has very unique attributes, like the read-only performance and random access of data; that's very unique to the offering. >> Okay, so, if you're a customer of Bina's, right? >> I am, 100%. >> What do you need for infrastructure for deep learning, AI, what is it? You mentioned some attributes before, but take it down a little bit. >> Well, the reality is, there are many different aspects, and if anything kind of breaks down, then the data science experience breaks down. So, we want to make sure that everything from the interconnect of the pipelines is effective; you heard Jensen earlier today from Nvidia, we've got to make sure that we have compute devices that, you know, are effective for the computation that we're rolling out on them. But that said, if those GPUs are starved of data, if we don't have the data available, which we're drawing from ESS, then we're not making effective use of those GPUs. It means we have to roll out more of them, et cetera, et cetera. And more importantly, the time for experimentation is elongated, so that whole ideation-to-product timeline that I talked about is elongated. So we've got to make sure that the storage doesn't break down, and that's why this is awesome for us. >> So let me, especially from a deep learning standpoint, let me throw out a little bit of history, and tell me what you think, let me hear your thoughts. So, years ago, the data was put as close to the application as possible; about 10, 15 years ago, we started breaking the data from the application, the storage from the application; and now we're moving the algorithm down as close to the data as possible. >> Steve: Yeah. >> At what point in time do we stop calling this storage, and start acknowledging that we're talking about a fabric that's actually quite different, because we put a lot more processing power as close to the data as possible? We're not just storing. We're really doing truly, deeply distributed computing. What do you think? >> There's a number of different areas where that's coming from. Everything from switches, to storage, to memory that's doing computing very close to where the data actually resides. Still, I think that, you know, you can look all the way back to the Google File System: moving computation to where the data is, as close as possible, so you don't have to transfer that data. I think that as time goes on, we're going to get closer and closer to that, but still, we're limited by the capacity of very fast storage. NVMe, very interesting technology, still limited. You know, how much memory do we have on the GPUs? 16 gigs; 24 is interesting, 48 is interesting; the models that I want to train are in the 100s of gigabytes. >> Peter: But you can still parallelize that. >> You can parallelize it, but there's not really anything that's true model parallelism out there right now. There are some hacks and things that people are doing, but I think we're getting there. It's still some time out, but moving it closer and closer means we don't have to spend the power, the latency, et cetera, to move the data.
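The data-movement cost Steven keeps returning to is easy to make concrete. A hedged back-of-the-envelope sketch: the 26 GB/s figure is the low-end ESS throughput Ed quotes in the previous interview, and everything else is a round-number assumption.

```python
# Rough transfer-time arithmetic for streaming a full dataset once.
# Bandwidth figures are illustrative, not benchmarks.

def hours_per_pass(dataset_tb: float, bandwidth_gb_s: float) -> float:
    """Hours to stream the whole dataset at a sustained bandwidth."""
    return dataset_tb * 1000 / bandwidth_gb_s / 3600

for tb in (30, 60, 120):  # the dataset sizes mentioned in this exchange
    print(f"{tb:>4} TB @ 26 GB/s : {hours_per_pass(tb, 26):5.2f} h per pass")
```

Transfer cost scales linearly with data size, which is why 40,000 experiments against 120 TB is a qualitatively different problem than the same experiments against 30 TB, the point made in the next exchange.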
>> So, does that mean that the rate of increase of data, and the size of the objects we're going to be looking at, is still going to exceed the rate of our ability to bring algorithms and storage, or algorithms and data, together? What do you think? >> I think it's getting closer, but I can always just look at the bigger problem. I'm dealing with 30 terabytes of data for one of the problems that I'm solving. I would like to be using 60 terabytes of data, if I could, if I could do it in the same amount of time and I wasn't having to transfer it. With that said, if you gave me 60, I'd say, "I really wanted 120." So, it doesn't stop. >> Dave: (laughing) You're one of those kind of guys. >> I'm definitely one of those guys. I'm curious, what would it look like? Because what I see right now is it would be advantageous, and I would like to do it, but I ran 40,000 experiments with 30 terabytes of data. It would be four times the amount of transfer if I had to run that many experiments with 120. >> Bina, what do you think? What is the fundamental, especially from a software-defined side, what does the fundamental value proposition of storage become, as we start pushing more of the intelligence close to the data? >> Yeah, but you know, the storage layer fundamentally is software defined; you still need that setup, the protocols, and the file system, the NFS, right? And so some of that still becomes relevant, even as you kind of separate some of the physical storage or flash from the actual compute. I think there's still a relevance when you talk about software-defined storage there, yeah. >> So you don't expect that there's going to be any particular architectural change? I mean, NVMe is going to have a real impact. >> NVMe will have a real impact, and there will be this notion of composable systems, and we will see some level of advancement there, of course, and that's around the corner, actually, right? So I do see it progressing from that perspective. >> So what's underneath it all, what actually, what products? >> Yeah, let me share a little bit about the product. So, what Steve and team are using is our elastic storage server. So, I talked about software-defined storage. As you know, we have a very complete set of software-defined storage offerings, and within that, our strategy has always been to allow the clients to consume the capabilities the way they want: as software only on their own hardware, or as a service, or as an integrated solution. And so what Steve and team are using is an integrated solution with our Spectrum Scale software, along with our flash systems and POWER9 servers. And on the software side, Spectrum Scale is a very rich offering that we've had in our portfolio: a highly scalable file system, it's one of the solutions that powers a lot of our supercomputers, a project that we are still in the process of delivering on around CORAL, for the national labs. So it's the same file system, combined with a set of servers and flash systems, right? Highly scalable, erasure coding, high availability as well as throughput, right? 40 gigabytes per second. So that's the solution, that's the storage and system underneath what Steve and team are leveraging. >> Steve, you talk about "you want more"; what else is on Bina's to-do list from your standpoint? >> Specifically targeted at storage, or? >> Dave: Yeah, what do you want from the products?
>> Well, I think long stretch goals are multi-tenancy and the wide array of dimensions that, especially in the Chief Data Office, we're dealing with. We have so many different business units, so many of those different enterprise problems, in the order of hundreds; how do you effectively use that storage medium while driving so many different users? I think it's still hard. I think we're doing it a hell of a lot better than we ever have, but it's still, it's an open research area. How do you do that? And especially, there are unique attributes to deep learning, like, most of the data is read-only, to a certain degree. When data changes there are some consistency checks that could be done, but really, for my experiment that's running right now, it doesn't really matter that it's changed. So there are a lot of nuances specific to deep learning that I would like exploited if I could, and that's some of the interactions that we're working on to kind of alleviate those pains. >> I was at a CDO conference in Boston last October, and Inderpal Bhandari was there and he presented this enterprise data architecture, and there were probably about three or four hundred CDOs, chief data officers, in the room, to sort of explain that. Can you sort of summarize what that is, and how it relates to sort of what you do on a day-to-day basis, and how customers are using it? >> Yeah, for sure, so the architecture is kind of like the backbone and rules that govern how we work with the data, right? So, the realities are, there's no sort of blueprint out there. What works at Google, or works at Microsoft, what works at Amazon, that's very unique to what they're doing. Now, IBM has a very unique offering as well. We're a composition of many, many different businesses put together. And now, with the Chief Data Office, that's come to light across many organizations; like you said, at the conference, three to four hundred people, and the requirements are different across organizations. So, bringing the data together is kind of one of the big attributes of it: decreasing the number of silos, making a monolithic, kind of reliable, accessible entity that various business units can trust, and that's governed behind the scenes to make sure that it's adhering to everyone's policies, whatever their own specific business unit has deemed to be their policy. We have to adhere to that, or the data won't come. And the beauty of the data is, we've moved into this cognitive era: data is valuable, but only if we can link it. If the data is there, but there are no linkages there, what do I do with it? I can't really draw new insights. I can't, for all those hundreds of enterprise use cases, build new value in them, because I don't have any more data. It's all about linking the data, and then looking for alternative data sources, or additional data sources, and bringing that data together, and then looking at the new insights that come from it. So, in a nutshell, we're doing that internally at IBM to help our transformation. But at the same time we're creating a blueprint that we're making accessible to CDOs around the world, and our enterprise customers around the world, so they can follow us on this new adventure. New adventure being, you know, two years old, but. >> Yeah, sure, but it seems like, if you're going to apply AI, you've got to have your data house in order to do that. So this sounds like a logical first step, is that right? >> Absolutely, 100%.
And the realities are, there are a lot of people that are kicking the tires and trying to figure out the right way to do that, and it's a big investment. Laying out large sums of money to kind of build this hypothetical better area for data, you need to have a reference design, and once you have that you can actually approach the C-level suite and say, "Hey, this is what we've seen, this is the potential, and we have an architecture now, and they've already gone down all the hard paths, so now we don't have to go down as many hard paths." So, it's incredibly empowering for them to have that reference design and learn from our mistakes. >> Already proven internally, now bringing it to our enterprise clients. >> Well, and so we heard Ginni this morning talk about incumbent disruptors, so I'm kind of curious as to what, any learnings you have there? It's early days, I realize that, but when you think about the discussions: are banks going to lose control of the payment systems? Are retail stores going to go away? Is owning and driving your own vehicle going to be the exception, not the norm? Et cetera, et cetera, et cetera, you know, big questions: how far can we take machine intelligence? Have you seen your clients begin to apply this in their businesses, incumbents? We saw three examples today, good examples, I thought. I don't think it's widespread yet, but what are you guys seeing? What are you learning, and how are you applying that to clients? >> Yeah, so, I mean certainly for us, for these new AI workloads, we have a number of clients and a number of different types of solutions. Whether it's in genomics, or it's AI deep learning in analyzing financial data, you know, a variety of different types of use cases where we do see clients leveraging the capabilities, like Spectrum Scale, ESS, and other flash system solutions, to address some of those problems. We're seeing it now. Autonomous driving as well, right, to analyze data. >> How about a little roadmap, to end this segment? Where do you want to take this initiative? What should we be looking for as observers from the outside looking in? >> Well, I think, drawing from the endeavors that we have within the CDO, what we want to do is take some of those ideas and look at some of the derivative products that we can take out of there, and how do we kind of move those into products? Because we want to make it as simple as possible for the enterprise customer. Because although you see these big-scale companies and all the wonderful things that they're doing, the feedback we've had, which is similar to our own experiences, is that those use cases aren't directly applicable for most of the enterprise customers. Some of them are, right; some of the stuff in vision and brand targeting and speech recognition and all that type of stuff is, but at the same time the majority, the 90% area, are not. So we have to be able to bring down... sorry, just the echoes, very distracting. >> It gets loud here sometimes, big party going on. >> Exactly, so, we have to be able to bring that technology to them in a simpler form so they can make it more accessible to their internal data scientists, and get better outcomes for themselves. And we find that they're on a wide spectrum. Some of them are quite advanced. It doesn't mean just because you have a big name you're quite advanced; some of the smaller players have a smaller name, but are quite advanced, right?
So, there's a wide array, so we want to make that accessible to these various enterprises. So I think that's what you can expect: you know, the reference architecture for the cognitive enterprise data architecture, and you can expect to see some of the products from those internal use cases come out in some of our offerings, like maybe IGC or Information Analyzer, things like that, or maybe Watson Studio, things like that. You'll see it trickle out there. >> Okay, alright Bina, we'll give you the final word. You guys, business is good, four straight quarters of growth, you've got some tailwinds, currency is actually a tailwind for a change. Customers seem to be happy here, final word. >> Yeah, no, we've got great momentum, and I think in 2018 we've got a great set of roadmap items and new capabilities coming out, so we feel like we've got a real strong future for IBM storage here. >> Great, well, Bina, Steve, thanks for coming on theCUBE. We appreciate your time. >> Thank you. >> Nice meeting you. >> Alright, keep it right there everybody. We'll be back with our next guest right after this. This is day two, IBM Think 2018. You're watching theCUBE. (techno jingle)
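A closing footnote on the "linking the data" theme from this segment: the value appears at the join, when records from separate silos can be matched on a shared key. A toy sketch; every table, key, and column below is invented purely for illustration:

```python
import pandas as pd

# Toy illustration of cross-silo data linkage. All data is invented.
sales = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "last_purchase": ["laptop", "server", "storage"],
})
support = pd.DataFrame({
    "customer_id": [2, 3, 4],
    "open_tickets": [0, 5, 2],
})

# The join is where new insight appears: purchase history next to support
# load, a view neither silo exposes on its own.
linked = sales.merge(support, on="customer_id", how="inner")
print(linked)
```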