Vikram Murali, IBM | IBM Data Science For All

>> Narrator: Live from New York City, it's theCUBE. Covering IBM Data Science For All. Brought to you by IBM. >> Welcome back to New York here on theCUBE. Along with Dave Vellante, I'm John Walls. We're Data Science For All, IBM's two day event, and we'll be here all day long wrapping up again with that panel discussion from four to five here Eastern Time, so be sure to stick around all day here on theCUBE. Joining us now is Vikram Murali, who is a program director at IBM, and Vikram thank for joining us here on theCUBE. Good to see you. >> Good to see you too. Thanks for having me. >> You bet. So, among your primary responsibilities, The Data Science Experience. So first off, if you would, share with our viewers a little bit about that. You know, the primary mission. You've had two fairly significant announcements. Updates, if you will, here over the past month or so, so share some information about that too if you would. >> Sure, so my team, we build The Data Science Experience, and our goal is for us to enable data scientist, in their path, to gain insights into data using data science techniques, mission learning, the latest and greatest open source especially, and be able to do collaboration with fellow data scientist, with data engineers, business analyst, and it's all about freedom. Giving freedom to data scientist to pick the tool of their choice, and program and code in the language of their choice. So that's the mission of Data Science Experience, when we started this. The two releases, that you mentioned, that we had in the last 45 days. There was one in September and then there was one on October 30th. Both of these releases are very significant in the mission learning space especially. We now support Scikit-Learn, XGBoost, TensorFlow libraries in Data Science Experience. We have deep integration with Horton Data Platform, which is keymark of our partnership with Hortonworks. Something that we announced back in the summer, and this last release of Data Science Experience, two days back, specifically can do authentication with Technotes with Hadoop. So now our Hadoop customers, our Horton Data Platform customers, can leverage all the goodies that we have in Data Science Experience. It's more deeply integrated with our Hadoop based environments. >> A lot of people ask me, "Okay, when IBM announces a product like Data Science Experience... You know, IBM has a lot of products in its portfolio. Are they just sort of cobbling together? You know? So exulting older products, and putting a skin on them? Or are they developing them from scratch?" How can you help us understand that? >> That's a great question, and I hear that a lot from our customers as well. Data Science Experience started off as a design first methodology. And what I mean by that is we are using IBM design to lead the charge here along with the product and development. And we are actually talking to customers, to data scientist, to data engineers, to enterprises, and we are trying to find out what problems they have in data science today and how we can best address them. So it's not about taking older products and just re-skinning them, but Data Science Experience, for example, it started of as a brand new product: completely new slate with completely new code. Now, IBM has done data science and mission learning for a very long time. We have a lot of assets like SPSS Modeler and Stats, and digital optimization. And we are re-investing in those products, and we are investing in such a way, and doing product research in such a way, not to make the old fit with the new, but in a way where it fits into the realm of collaboration. How can data scientist leverage our existing products with open source, and how we can do collaboration. So it's not just re-skinning, but it's building ground up. >> So this is really important because you say architecturally it's built from the ground up. Because, you know, given enough time and enough money, you know, smart people, you can make anything work. So the reason why this is important is you mentioned, for instance, TensorFlow. You know that down the road there's going to be some other tooling, some other open source project that's going to take hold, and your customers are going to say, "I want that." You've got to then integrate that, or you have to choose whether or not to. If it's a super heavy lift, you might not be able to do it, or do it in time to hit the market. If you architected your system to be able to accommodate that. Future proof is the term everybody uses, so have you done? How have you done that? I'm sure API's are involved, but maybe you could add some color. >> Sure. So we are and our Data Science Experience and mission learning... It is a microservices based architecture, so we are completely dockerized, and we use Kubernetes under the covers for container dockerstration. And all these are tools that are used in The Valley, across different companies, and also in products across IBM as well. So some of these legacy products that you mentioned, we are actually using some of these newer methodologies to re-architect them, and we are dockerizing them, and the microservice architecture actually helps us address issues that we have today as well as be open to development and taking newer methodologies and frameworks into consideration that may not exist today. So the microservices architecture, for example, TensorFlow is something that you brought in. So we can just pin up a docker container just for TensorFlow and attach it to our existing Data Science Experience, and it just works. Same thing with other frameworks like XGBoost, and Kross, and Scikit-Learn, all these are frameworks and libraries that are coming up in open source within the last, I would say, a year, two years, three years timeframe. Previously, integrating them into our product would have been a nightmare. We would have had to re-architect our product every time something came, but now with the microservice architecture it is very easy for us to continue with those. >> We were just talking to Daniel Hernandez a little bit about the Hortonworks relationship at high level. One of the things that I've... I mean, I've been following Hortonworks since day one when Yahoo kind of spun them out. And know those guys pretty well. And they always make a big deal out of when they do partnerships, it's deep engineering integration. And so they're very proud of that, so I want to come on to test that a little bit. Can you share with our audience the kind of integrations you've done? What you've brought to the table? What Hortonworks brought to the table? >> Yes, so Data Science Experience today can work side by side with Horton Data Platform, HDP. And we could have actually made that work about two, three months back, but, as part of our partnership that was announced back in June, we set up drawing engineering teams. We have multiple touch points every day. We call it co-development, and they have put resources in. We have put resources in, and today, especially with the release that came out on October 30th, Data Science Experience can authenticate using secure notes. That I previously mentioned, and that was a direct example of our partnership with Hortonworks. So that is phase one. Phase two and phase three is going to be deeper integration, so we are planning on making Data Science Experience and a body management pact. And so a Hortonworks customer, if you have HDP already installed, you don't have to install DSX separately. It's going to be a management pack. You just spin it up. And the third phase is going to be... We're going to be using YARN for resource management. YARN is very good a resource management. And for infrastructure as a service for data scientist, we can actually delegate that work to YARN. So, Hortonworks, they are putting resources into YARN, doubling down actually. And they are making changes to YARN where it will act as the resource manager not only for the Hadoop and Spark workloads, but also for Data Science Experience workloads. So that is the level of deep engineering that we are engaged with Hortonworks. >> YARN stands for yet another resource negotiator. There you go for... >> John: Thank you. >> The trivia of the day. (laughing) Okay, so... But of course, Hortonworks are big on committers. And obviously a big committer to YARN. Probably wouldn't have YARN without Hortonworks. So you mentioned that's kind of what they're bringing to the table, and you guys primarily are focused on the integration as well as some other IBM IP? >> That is true as well as the notes piece that I mentioned. We have a notes commenter. We have multiple notes commenters on our side, and that helps us as well. So all the notes is part of the HDP package. We need knowledge on our side to work with Hortonworks developers to make sure that we are contributing and making end roads into Data Science Experience. That way the integration becomes a lot more easier. And from an IBM IP perspective... So Data Science Experience already comes with a lot of packages and libraries that are open source, but IBM research has worked on a lot of these libraries. I'll give you a few examples: Brunel and PixieDust is something that our developers love. These are visualization libraries that were actually cooked up by IBM research and the open sourced. And these are prepackaged into Data Science Experience, so there is IBM IP involved and there are a lot of algorithms, mission learning algorithms, that we put in there. So that comes right out of the package. >> And you guys, the development teams, are really both in The Valley? Is that right? Or are you really distributed around the world? >> Yeah, so we are. The Data Science Experience development team is in North America between The Valley and Toronto. The Hortonworks team, they are situated about eight miles from where we are in The Valley, so there's a lot of synergy. We work very closely with them, and that's what we see in the product. >> I mean, what impact does that have? Is it... You know, you hear today, "Oh, yeah. We're a virtual organization. We have people all over the world: Eastern Europe, Brazil." How much of an impact is that? To have people so physically proximate? >> I think it has major impact. I mean IBM is a global organization, so we do have teams around the world, and we work very well. With the invent of IP telephoning, and screen-shares, and so on, yes we work. But it really helps being in the same timezone, especially working with a partner just eight miles or ten miles a way. We have a lot of interaction with them and that really helps. >> Dave: Yeah. Body language? >> Yeah. >> Yeah. You talked about problems. You talked about issues. You know, customers. What are they now? Before it was like, "First off, I want to get more data." Now they've got more data. Is it figuring out what to do with it? Finding it? Having it available? Having it accessible? Making sense of it? I mean what's the barrier right now? >> The barrier, I think for data scientist... The number one barrier continues to be data. There's a lot of data out there. Lot of data being generated, and the data is dirty. It's not clean. So number one problem that data scientist have is how do I get to clean data, and how do I access data. There are so many data repositories, data lakes, and data swamps out there. Data scientist, they don't want to be in the business of finding out how do I access data. They want to have instant access to data, and-- >> Well if you would let me interrupt you. >> Yeah? >> You say it's dirty. Give me an example. >> So it's not structured data, so data scientist-- >> John: So unstructured versus structured? >> Unstructured versus structured. And if you look at all the social media feeds that are being generated, the amount of data that is being generated, it's all unstructured data. So we need to clean up the data, and the algorithms need structured data or data in a particular format. And data scientist don't want to spend too much time in cleaning up that data. And access to data, as I mentioned. And that's where Data Science Experience comes in. Out of the box we have so many connectors available. It's very easy for customers to bring in their own connectors as well, and you have instant access to data. And as part of our partnership with Hortonworks, you don't have to bring data into Data Science Experience. The data is becoming so big. You want to leave it where it is. Instead, push analytics down to where it is. And you can do that. We can connect to remote Spark. We can push analytics down through remote Spark. All of that is possible today with Data Science Experience. The second thing that I hear from data scientist is all the open source libraries. Every day there's a new one. It's a boon and a bane as well, and the problem with that is the open source community is very vibrant, and there a lot of data science competitions, mission learning competitions that are helping move this community forward. And it's a good thing. The bad thing is data scientist like to work in silos on their laptop. How do you, from an enterprise perspective... How do you take that, and how do you move it? Scale it to an enterprise level? And that's where Data Science Experience comes in because now we provide all the tools. The tools of your choice: open source or proprietary. You have it in here, and you can easily collaborate. You can do all the work that you need with open source packages, and libraries, bring your own, and as well as collaborate with other data scientist in the enterprise. >> So, you're talking about dirty data. I mean, with Hadoop and no schema on, right? We kind of knew this problem was coming. So technology sort of got us into this problem. Can technology help us get out of it? I mean, from an architectural standpoint. When you think about dirty data, can you architect things in to help? >> Yes. So, if you look at the mission learning pipeline, the pipeline starts with ingesting data and then cleansing or cleaning that data. And then you go into creating a model, training, picking a classifier, and so on. So we have tools built into Data Science Experience, and we're working on tools, that will be coming up and down our roadmap, which will help data scientist do that themselves. I mean, they don't have to be really in depth coders or developers to do that. Python is very powerful. You can do a lot of data wrangling in Python itself, so we are enabling data scientist to do that within the platform, within Data Science Experience. >> If I look at sort of the demographics of the development teams. We were talking about Hortonworks and you guys collaborating. What are they like? I mean people picture IBM, you know like this 100 plus year old company. What's the persona of the developers in your team? >> The persona? I would say we have a very young, agile development team, and by that I mean... So we've had six releases this year in Data Science Experience. Just for the on premises side of the product, and the cloud side of the product it's got huge delivery. We have releases coming out faster than we can code. And it's not just re-architecting it every time, but it's about adding features, giving features that our customers are asking for, and not making them wait for three months, six months, one year. So our releases are becoming a lot more frequent, and customers are loving it. And that is, in part, because of the team. The team is able to evolve. We are very agile, and we have an awesome team. That's all. It's an amazing team. >> But six releases in... >> Yes. We had immediate release in April, and since then we've had about five revisions of the release where we add lot more features to our existing releases. A lot more packages, libraries, functionality, and so on. >> So you know what monster you're creating now don't you? I mean, you know? (laughing) >> I know, we are setting expectation. >> You still have two months left in 2017. >> We do. >> We do not make frame release cycles. >> They are not, and that's the advantage of the microservices architecture. I mean, when you upgrade, a customer upgrades, right? They don't have to bring that entire system down to upgrade. You can target one particular part, one particular microservice. You componentize it, and just upgrade that particular microservice. It's become very simple, so... >> Well some of those microservices aren't so micro. >> Vikram: Yeah. Not. Yeah, so it's a balance. >> You're growing, but yeah. >> It's a balance you have to keep. Making sure that you componentize it in such a way that when you're doing an upgrade, it effects just one small piece of it, and you don't have to take everything down. >> Dave: Right. >> But, yeah, I agree with you. >> Well, it's been a busy year for you. To say the least, and I'm sure 2017-2018 is not going to slow down. So continue success. >> Vikram: Thank you. >> Wish you well with that. Vikram, thanks for being with us here on theCUBE. >> Thank you. Thanks for having me. >> You bet. >> Back with Data Science For All. Here in New York City, IBM. Coming up here on theCUBE right after this. >> Cameraman: You guys are clear. >> John: All right. That was great.

Published Date : Nov 1 2017

SUMMARY :

Brought to you by IBM. Good to see you. Good to see you too. about that too if you would. and be able to do collaboration How can you help us understand that? and we are investing in such a way, You know that down the and attach it to our existing One of the things that I've... And the third phase is going to be... There you go for... and you guys primarily are So that comes right out of the package. The Valley and Toronto. We have people all over the We have a lot of interaction with them Is it figuring out what to do with it? and the data is dirty. You say it's dirty. You can do all the work that you need with can you architect things in to help? I mean, they don't have to and you guys collaborating. And that is, in part, because of the team. and since then we've had about and that's the advantage of microservices aren't so micro. Yeah, so it's a balance. and you don't have to is not going to slow down. Wish you well with that. Thanks for having me. Back with Data Science For All. That was great.

ENTITIES

Entity	Category	Confidence
Dave Vellante	PERSON	0.99+
IBM	ORGANIZATION	0.99+
Dave	PERSON	0.99+
Vikram	PERSON	0.99+
John	PERSON	0.99+
three months	QUANTITY	0.99+
six months	QUANTITY	0.99+
John Walls	PERSON	0.99+
October 30th	DATE	0.99+
2017	DATE	0.99+
April	DATE	0.99+
June	DATE	0.99+
one year	QUANTITY	0.99+
Daniel Hernandez	PERSON	0.99+
Hortonworks	ORGANIZATION	0.99+
September	DATE	0.99+
one	QUANTITY	0.99+
ten miles	QUANTITY	0.99+
YARN	ORGANIZATION	0.99+
eight miles	QUANTITY	0.99+
Vikram Murali	PERSON	0.99+
New York City	LOCATION	0.99+
North America	LOCATION	0.99+
two day	QUANTITY	0.99+
Python	TITLE	0.99+
two releases	QUANTITY	0.99+
New York	LOCATION	0.99+
two years	QUANTITY	0.99+
three years	QUANTITY	0.99+
six releases	QUANTITY	0.99+
Toronto	LOCATION	0.99+
today	DATE	0.99+
Both	QUANTITY	0.99+
two months	QUANTITY	0.99+
a year	QUANTITY	0.99+
Yahoo	ORGANIZATION	0.99+
third phase	QUANTITY	0.98+
both	QUANTITY	0.98+
this year	DATE	0.98+
first methodology	QUANTITY	0.98+
First	QUANTITY	0.97+
second thing	QUANTITY	0.97+
one small piece	QUANTITY	0.96+
One	QUANTITY	0.96+
XGBoost	TITLE	0.96+
Cameraman	PERSON	0.96+
about eight miles	QUANTITY	0.95+
Horton Data Platform	ORGANIZATION	0.95+
2017-2018	DATE	0.94+
first	QUANTITY	0.94+
The Valley	LOCATION	0.94+
TensorFlow	TITLE	0.94+

Recommend Videos

Sentiment Analysis

AWS Comprehend

Search Results for Vikram Murali: