Tim Vincent & Steve Roberts, IBM | DataWorks Summit 2018
>> Live from San Jose, in the heart of Silicon Valley, it's theCUBE, covering DataWorks Summit 2018. Brought to you by Hortonworks. >> Welcome back everyone to day two of theCUBE's live coverage of DataWorks, here in San Jose, California. I'm your host, Rebecca Knight, along with my co-host James Kobielus. We have two guests on this panel today, we have Tim Vincent, he is the VP of Cognitive Systems Software at IBM, and Steve Roberts, who is the Offering Manager for Big Data on IBM Power Systems. Thanks so much for coming on theCUBE. >> Oh thank you very much. >> Thanks for having us. >> So we're now in this new era, this Cognitive Systems era. Can you set the scene for our viewers, and tell our viewers a little bit about what you do and why it's so important. >> Okay, I'll give a bit of background first, because James knows me from my previous role, and you know, I spent a lot of time in the data and analytics space. I was the CTO for Bob, running the analytics group, up until about a year and a half ago, and we spent a lot of time looking at what we needed to do from a data perspective and an AI perspective. And Bob, when he moved over to Cognitive Systems, Bob Picciano, who's my current boss, asked me to move over and really help build out more of a software focus, and more of an AI focus, and a workload focus on how we think of the Power brand. So we spent a lot of time on that. So when you talk about cognitive systems or AI, what we're really trying to do is think about how you actually couple a combination of software and hardware, so co-optimize the software space and the hardware space, specifically for what's needed for AI systems. Because the data processing, the algorithmic processing for AI is very, very different than what you would have for traditional data workloads.
So we're spending a lot of time thinking about how you actually co-optimize those systems so you can actually build a system that's really optimized for the demands of AI. >> And is this driven by customers, is this driven by just a trend that IBM is seeing? I mean how are you, >> It's a combination of both. >> So a lot of this is, you know, there was a lot of thought put into this before I joined the team. So there was a lot of good thinking from the Power brand, but it was really foresight on things like Moore's Law coming to the end of its lifecycle, right, and the ramifications of that. And at the same time, as you start getting into things like neural nets and the floating point operations that you need to drive a neural net, it was clear that we were hitting the boundaries. And then there's new technologies, such as what Nvidia produces with their GPUs, that are clearly advantageous. So there's a lot of trends that were coming together that the technical team saw, and at the same time we were seeing customers struggling with specific things. You know, how do you actually build a model if the training time is going to be weeks and months, let alone hours? And one of the scenarios I like to think about, I'm probably showing my age a bit, but I went to a school called the University of Waterloo, and when I went to school, in my early years, they had a batch-based system for compilation and system runs. You'd sit in the lab at night and you'd submit a compile job, and the compile job would say, okay, it's going to take three hours to compile the application, and you think of the productivity hit that has for you. And now you start thinking about, okay, you've got this new skill in data scientists, which is really, really hard to find, they're very, very valuable. And you're giving them systems that take hours and weeks to do what they need to do.
And you know, so they're trying to drive these models and get a high degree of accuracy in their predictions, and they just can't do it. So there's foresight on the technology side and there's clear demand on the customer side as well. >> Before the cameras were rolling you were talking about how the terms data scientist and app developer are used interchangeably, and that's just wrong. >> And actually, let's hear it, 'cause I hold this whole position, and I agree with it. I think it's the right framework. Data science is a team sport, but application development is an even larger team sport in which data scientists and data engineers play a role. So, yeah, we want to hear your ideas on the broader application development ecosystem, and where data scientists, and data engineers, and so forth, fall into that broader spectrum. And then how IBM is supporting that entire new paradigm of application development, with your solution portfolio including, you know, PowerAI on Power? >> So I think you used the words collaboration and team sport, and data science is a collaborative team sport. But you're 100% correct, there's also a piece that I think is missing to a great degree today, and it's probably limiting the actual value of AI in the industry, and that's how data scientists and application developers interact with each other. Because if you think about it, one of the models I like to think about is a consumer-producer model. Who consumes things and who produces things? And basically the data scientists are producing a specific thing, which is, you know, simply an AI model, >> Machine learning models, deep-learning models. >> Machine learning and deep learning, and the application developers are consuming those things and then producing something else, which is the application logic which is driving your business processes, in this view. So they've got to work together. But there's a lot of confusion about who does what.
You know, you see people who say data scientists should build application logic, and you know, the number of data scientists who can do that, it exists, but it's not where the value is, the value they bring to the equation. And application developers developing AI models, you know, they exist, but it's not the most prevalent pattern. >> But you know, it's kind of unbalanced, Tim, in the industry discussion of these role definitions. Quite often the traditional, you know, definition, or sculpting, of a data scientist is that they know statistical modeling, plus data management, plus coding, right? But you never hear the opposite, that coders somehow need to understand how to build statistical models and so forth. Do you think that the coders of the future will, at least on some level, need to be conversant with the practices of building, and tuning, or training the machine learning models, or no? >> I think that will absolutely happen. And I will actually take it a step further, because again the data scientist skill is hard for a lot of people to find. >> Yeah. >> And as such it's a very valuable skill. And one of the offerings that we're putting out is something called PowerAI Vision, and it takes it up another level above the application developer, which is how do you actually really unlock the capabilities of AI to the business persona, the subject matter expert. So in the case of vision, how do you actually allow somebody to build a model without really knowing what a deep learning algorithm is, what kind of neural nets to use, how to do data preparation. So we built a tool set which is, you know, effectively an SME tool set, which allows you to tag and label images, and then as you're tagging and labeling images, it learns from that and actually helps automate the labeling of the images.
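The assisted-labeling loop Tim describes, tag a few images, let the tool learn from them and propose labels for the rest, can be sketched roughly as follows. This is a hedged illustration only: the nearest-centroid model, the function name, and the confidence threshold are assumptions made for the sketch, not PowerAI Vision's actual implementation, which uses deep networks.

```python
import numpy as np

def propose_labels(X_labeled, y_labeled, X_unlabeled, threshold=0.8):
    """Illustrative assisted labeling: fit a trivial nearest-centroid
    model on the hand-labeled examples (feature vectors standing in
    for images) and propose labels for the rest, flagging
    low-confidence proposals for human review."""
    classes = np.unique(y_labeled)
    centroids = np.stack([X_labeled[y_labeled == c].mean(axis=0) for c in classes])
    # Distance of every unlabeled point to every class centroid.
    d = np.linalg.norm(X_unlabeled[:, None, :] - centroids[None, :, :], axis=2)
    # Softmax over negative distances as a crude confidence score.
    p = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
    proposed = classes[p.argmax(axis=1)]
    needs_review = p.max(axis=1) < threshold  # send these back to the SME
    return proposed, needs_review
```

Each round of SME corrections would be folded back into the labeled set, so the proposals improve as more images are tagged.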
>> Is this distinct from Data Science Experience on the one hand, which is geared towards the data scientists, and I think Watson Analytics among your tools, which is geared towards the SME? Is this a third tool, or an overlap? >> Yeah, this is a third tool, which is really, again, one of the co-optimized capabilities that I talked about. It's a tool that we built out that really is leveraging the combination of what we do in Power, the interconnect which we have with the GPUs, which is the NVLink interconnect, which gives us basically a 10X improvement in bandwidth between the CPU and GPU. That allows you to actually train your models much more quickly, so we're seeing about a 4X improvement over competitive technologies that are also using GPUs. And if we're looking at machine learning algorithms, we've recently come out with some technology we call Snap ML, which allows you to push machine learning, >> Snap ML, >> Yeah, it allows you to push machine learning algorithms down into the GPUs, and with this we're seeing about a 40 to 50X improvement over traditional processing. So it's coupling all these capabilities, but really allowing a business persona to do something specific, which is allowing them to build out AI models to do recognition on either images or videos. >> Is there a pre-existing library of models in the solution that they can tap into? >> Basically it allows, it has a, >> Are they pre-trained? >> No, they're not pre-trained models, that's one of the differences in it. It actually has a set of models, and it picks for you. >> Oh yes, okay. >> So this is why it helps the business persona, because it's helping them with labeling the data. It's also helping select the best model. It's doing things under the covers to optimize things like hyper-parameter tuning, but you know, the end user doesn't have to know about all these things, right?
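The hyper-parameter tuning Tim says happens under the covers can take many forms; the simplest is an exhaustive grid search over candidate settings, scoring each on held-out data. A minimal sketch, with a hypothetical scoring function standing in for training and validating a real model (this is a generic illustration, not PowerAI Vision's actual tuner):

```python
from itertools import product

def grid_search(train_and_score, grid):
    """Toy grid search: try every combination of hyper-parameters in
    `grid` (a dict of name -> candidate values) and keep the best.
    `train_and_score` stands in for fitting a model and evaluating it
    on a validation split."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical validation score that peaks at lr=0.1, depth=3.
score = lambda lr, depth: -(lr - 0.1) ** 2 - (depth - 3) ** 2
best, best_score = grid_search(score, {"lr": [0.01, 0.1, 1.0], "depth": [2, 3, 4]})
```

Real tuners are usually smarter than this (random search, Bayesian optimization), but the end user's experience is the same: the tool picks the settings.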
So you're trying to lift, and it comes back to your point on application developers, it allows you to lift the barrier for people to do these tasks. >> Even for professional data scientists, there may be a vast library of models, and they don't necessarily know what is the best fit for the particular task. Ideally, the infrastructure should recommend and choose, under various circumstances, the models, and the algorithms, the libraries, whatever, for you, for the task. Great. >> One extra feature of PowerAI Enterprise is that it does include a way to do a quick visual inspection of a model's accuracy with a small data sample, before you invest in scaling over a cluster or a large data set. So you can get a visual indicator as to whether the model is moving towards accuracy, or you need to go and test an alternate model. >> So it's like a dashboard, of like Gini coefficients and all that stuff, okay. >> Exactly, it gives you a snapshot view. And the other thing I was going to mention, you guys talked about application development, data scientists, and of course a big message here at the conference is, you know, data science meets big data, and the work that Hortonworks is doing involving the notion of container support in YARN, GPU awareness in YARN, bringing Data Science Experience, which can include the PowerAI capability that Tim was talking about, as a workload tightly coupled with Hadoop. And this is where our Power servers are really built, not as just a monolithic building block that always has the same ratio of compute and storage, but fit-for-purpose servers that can address either GPU-optimized workloads, providing the bandwidth enhancements that Tim talked about with the GPU, or data servers that can now support two terabytes of memory, double the overall memory bandwidth on the box, and 44 cores that can support up to 176 threads for parallelization of Spark workloads, SQL workloads, and distributed data science workloads.
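The Gini-coefficient snapshot James mentions is cheap to compute for a binary classifier: Gini = 2·AUC − 1, and AUC falls out of a rank-sum, so a quick read on a small validation sample needs only a few lines. A generic sketch, not IBM's dashboard code:

```python
def auc_and_gini(labels, scores):
    """Compute ROC AUC via the rank-sum (Mann-Whitney U) formulation,
    then Gini = 2*AUC - 1. Assumes binary labels (0/1) and no tied
    scores, to keep the sketch short."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum = sum(r + 1 for r, i in enumerate(order) if labels[i] == 1)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    auc = (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return auc, 2 * auc - 1
```

Watching this number over successive checkpoints on a small sample gives the kind of early "is the model moving towards accuracy" signal Steve describes, before committing a full cluster run.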
So it's really about choosing the combination of servers that can meet this evolving workload need, 'cause Hadoop isn't now just MapReduce, it's a multitude of workloads that you need to be able to mix and match, and bring various capabilities to the table for compute, and that's where Power8, and now Power9, has really been built for these kinds of combination workloads, where you can add acceleration where it makes sense, add big data, smaller core, smaller memory, where it makes sense, pick and choose. >> So Steve, at this show, at DataWorks 2018 here in San Jose, the prime announcement, the partnership announced between IBM and Hortonworks, was IHAH, which I believe is IBM Hosted Analytics with Hortonworks. What I want to know is, that solution runs on top of HDP 3.0 and so forth; is there any tie-in from an offering management standpoint between that and PowerAI, so you can build models in the PowerAI environment and then deploy them out in conjunction with IHAH? Going forward, I mean, I just wanted to get a sense of whether those kinds of integrations are coming. >> Well, it's the same data science capability, Data Science Experience, whether you choose to run it in the public cloud, or run it in a private cloud model, or on-prem; it's the same data science package. You know, PowerAI has a set of optimized deep-learning libraries that can provide advantage on Power, that apply when you choose to run those deployments on our Power systems, alright, so we can provide additional value in terms of these optimized libraries, these memory bandwidth improvements. So really it depends upon the customer requirements, and whether a Power foundation would make sense in some of those deployment models. I mean, for us here with Power9, we've recently announced a whole series of Linux Power9 servers. That's our latest family, including, as I mentioned, storage-dense servers, the one we're showcasing on the floor here today, along with GPU-rich servers.
We're releasing fresh reference architectures, really to support combinations of clustered models that are, as I mentioned, fit for purpose for the workload, to bring data science and big data together in the right combination. And we're working towards cloud models as well, that can support mixing Power in ICP with big data solutions. >> And before we wrap, I just wanted to say, in the reference architecture you describe, I'm excited about the fact that you've commercialized distributed deep learning, for the growing number of instances where you're going to build containerized AI and distribute pieces of it across this multi-cloud; you need the underlying middleware fabric to allow all those pieces to play together into some larger application. So I've been following DDL, because your research lab has been posting information about it, you know, for quite a while. So I'm excited that you guys have finally commercialized it. I think IBM does a really good job of commercializing what comes out of the lab, like with Watson. >> Great, well, a good note to end on. Thanks so much for joining us. >> Oh thank you. Thank you for the, >> Thank you. >> We will have more from theCUBE's live coverage of DataWorks coming up just after this. (bright electronic music)