Pandit Prasad, IBM | DataWorks Summit 2018
>> From San Jose, in the heart of Silicon Valley, it's theCube. Covering DataWorks Summit 2018. Brought to you by Hortonworks. (upbeat music) >> Welcome back to theCUBE's live coverage of Data Works here in sunny San Jose, California. I'm your host Rebecca Knight along with my co-host James Kobielus. We're joined by Pandit Prasad. He is the analytics, projects, strategy, and management at IBM Analytics. Thanks so much for coming on the show. >> Thanks Rebecca, glad to be here. >> So, why don't you just start out by telling our viewers a little bit about what you do in terms of in relationship with the Horton Works relationship and the other parts of your job. >> Sure, as you said I am in Offering Management, which is also known as Product Management for IBM, manage the big data portfolio from an IBM perspective. I was also working with Hortonworks on developing this relationship, nurturing that relationship, so it's been a year since the Northsys partnership. We announced this partnership exactly last year at the same conference. And now it's been a year, so this year has been a journey and aligning the two portfolios together. Right, so Hortonworks had HDP HDF. IBM also had similar products, so we have for example, Big Sequel, Hortonworks has Hive, so how Hive and Big Sequel align together. IBM has a Data Science Experience, where does that come into the picture on top of HDP, so it means before this partnership if you look into the market, it has been you sell Hadoop, you sell a sequel engine, you sell Data Science. So what this year has given us is more of a solution sell. Now with this partnership we go to the customers and say here is NTN experience for you. You start with Hadoop, you put more analytics on top of it, you then bring Big Sequel for complex queries and federation visualization stories and then finally you put Data Science on top of it, so it gives you a complete NTN solution, the NTN experience for getting the value out of the data. >> Now IBM a few years back released a Watson data platform for team data science with DSX, data science experience, as one of the tools for data scientists. Is Watson data platform still the core, I call it dev ops for data science and maybe that's the wrong term, that IBM provides to market or is there sort of a broader dev ops frame work within which IBM goes to market these tools? >> Sure, Watson data platform one year ago was more of a cloud platform and it had many components of it and now we are getting a lot of components on to the (mumbles) and data science experience is one part of it, so data science experience... >> So Watson analytics as well for subject matter experts and so forth. >> Yes. And again Watson has a whole suit of side business based offerings, data science experience is more of a a particular aspect of the focus, specifically on the data science and that's been now available on PRAM and now we are building this arm from stack, so we have HDP, HDF, Big Sequel, Data Science Experience and we are working towards adding more and more to that portfolio. >> Well you have a broader reference architecture and a stack of solutions AI and power and so for more of the deep learning development. In your relationship with Hortonworks, are they reselling more of those tools into their customer base to supplement, extend what they already resell DSX or is that outside of the scope of the relationship? >> No it is all part of the relationship, these three have been the core of what we announced last year and then there are other solutions. We have the whole governance solution right, so again it goes back to the partnership HDP brings with it Atlas. IBM has a whole suite of governance portfolio including the governance catalog. How do you expand the story from being a Hadoop-centric story to an enterprise data-like story, and then now we are taking that to the cloud that's what Truata is all about. Rob Thomas came out with a blog yesterday morning talking about Truata. If you look at it is nothing but a governed data-link hosted offering, if you want to simplify it. That's one way to look at it caters to the GDPR requirements as well. >> For GDPR for the IBM Hortonworks partnership is the lead solution for GDPR compliance, is it Hortonworks Data Steward Studio or is it any number of solutions that IBM already has for data governance and curation, or is it a combination of all of that in terms of what you, as partners, propose to customers for soup to nuts GDPR compliance? Give me a sense for... >> It is a combination of all of those so it has a HDP, its has HDF, it has Big Sequel, it has Data Science Experience, it had IBM governance catalog, it has IBM data quality and it has a bunch of security products, like Gaurdium and it has some new IBM proprietary components that are very specific towards data (cough drowns out speaker) and how do you deal with the personal data and sensitive personal data as classified by GDPR. I'm supposed to query some high level information but I'm not allowed to query deep into the personal information so how do you blog those queries, how do you understand those, these are not necessarily part of Data Steward Studio. These are some of the proprietary components that are thrown into the mix by IBM. >> One of the requirements that is not often talked about under GDPR, Ricky of Formworks got in to it a little bit in his presentation, was the notion that the requirement that if you are using an UE citizen's PII to drive algorithmic outcomes, that they have the right to full transparency. It's the algorithmic decision paths that were taken. I remember IBM had a tool under the Watson brand that wraps up a narrative of that sort. Is that something that IBM still, it was called Watson Curator a few years back, is that a solution that IBM still offers, because I'm getting a sense right now that Hortonworks has a specific solution, not to say that they may not be working on it, that addresses that side of GDPR, do you know what I'm referring to there? >> I'm not aware of something from the Hortonworks side beyond the Data Steward Studio, which offers basically identification of what some of the... >> Data lineage as opposed to model lineage. It's a subtle distinction. >> It can identify some of the personal information and maybe provide a way to tag it and hence, mask it, but the Truata offering is the one that is bringing some new research assets, after GDPR guidelines became clear and then they got into they are full of how do we cater to those requirements. These are relatively new proprietary components, they are not even being productized, that's why I am calling them proprietary components that are going in to this hosting service. >> IBM's got a big portfolio so I'll understand if you guys are still working out what position. Rebecca go ahead. >> I just wanted to ask you about this new era of GDPR. The last Hortonworks conference was sort of before it came into effect and now we're in this new era. How would you say companies are reacting? Are they in the right space for it, in the sense of they're really still understand the ripple effects and how it's all going to play out? How would you describe your interactions with companies in terms of how they're dealing with these new requirements? >> They are still trying to understand the requirements and interpret the requirements coming to terms with what that really means. For example I met with a customer and they are a multi-national company. They have data centers across different geos and they asked me, I have somebody from Asia trying to query the data so that the query should go to Europe, but the query processing should not happen in Asia, the query processing all should happen in Europe, and only the output of the query should be sent back to Asia. You won't be able to think in these terms before the GDPR guidance era. >> Right, exceedingly complicated. >> Decoupling storage from processing enables those kinds of fairly complex scenarios for compliance purposes. >> It's not just about the access to data, now you are getting into where the processing happens were the results are getting displayed, so we are getting... >> Severe penalties for not doing that so your customers need to keep up. There was announcement at this show at Dataworks 2018 of an IBM Hortonwokrs solution. IBM post-analytics with with Hortonworks. I wonder if you could speak a little bit about that, Pandit, in terms of what's provided, it's a subscription service? If you could tell us what subset of IBM's analytics portfolio is hosted for Hortonwork's customers? >> Sure, was you said, it is a a hosted offering. Initially we are starting of as base offering with three products, it will have HDP, Big Sequel, IBM DB2 Big Sequel and DSX, Data Science Experience. Those are the three solutions, again as I said, it is hosted on IBM Cloud, so customers have a choice of different configurations they can choose, whether it be VMs or bare metal. I should say this is probably the only offering, as of today, that offers bare metal configuration in the cloud. >> It's geared to data scientist developers and machine-learning models will build the models and train them in IBM Cloud, but in a hosted HDP in IBM Cloud. Is that correct? >> Yeah, I would rephrase that a little bit. There are several different offerings on the cloud today and we can think about them as you said for ad-hoc or ephemeral workloads, also geared towards low cost. You think about this offering as taking your on PRAM data center experience directly onto the cloud. It is geared towards very high performance. The hardware and the software they are all configured, optimized for providing high performance, not necessarily for ad-hoc workloads, or ephemeral workloads, they are capable of handling massive workloads, on sitcky workloads, not meant for I turned this massive performance computing power for a couple of hours and then switched them off, but rather, I'm going to run these massive workloads as if it is located in my data center, that's number one. It comes with the complete set of HDP. If you think about it there are currently in the cloud you have Hive and Hbase, the sequel engines and the stories separate, security is optional, governance is optional. This comes with the whole enchilada. It has security and governance all baked in. It provides the option to use Big Sequel, because once you get on Hadoop, the next experience is I want to run complex workloads. I want to run federated queries across Hadoop as well as other data storage. How do I handle those, and then it comes with Data Science Experience also configured for best performance and integrated together. As a part of this partnership, I mentioned earlier, that we have progress towards providing this story of an NTN solution. The next steps of that are, yeah I can say that it's an NTN solution but are the product's look and feel as if they are one solution. That's what we are getting into and I have featured some of those integrations. For example Big Sequel, IBM product, we have been working on baking it very closely with HDP. It can be deployed through Morey, it is integrated with Atlas and Granger for security. We are improving the integrations with Atlas for governance. >> Say you're building a Spark machine learning model inside a DSX on HDP within IH (mumbles) IBM hosting with Hortonworks on HDP 3.0, can you then containerize that machine learning Sparks and then deploy into an edge scenario? >> Sure, first was Big Sequel, the next one was DSX. DSX is integrated with HDP as well. We can run DSX workloads on HDP before, but what we have done now is, if you want to run the DSX workloads, I want to run a Python workload, I need to have Python libraries on all the nodes that I want to deploy. Suppose you are running a big cluster, 500 cluster. I need to have Python libraries on all 500 nodes and I need to maintain the versioning of it. If I upgrade the versions then I need to go and upgrade and make sure all of them are perfectly aligned. >> In this first version will you be able build a Spark model and a Tesorflow model and containerize them and deploy them. >> Yes. >> Across a multi-cloud and orchestrate them with Kubernetes to do all that meshing, is that a capability now or planned for the future within this portfolio? >> Yeah, we have that capability demonstrated in the pedestal today, so that is a new one integration. We can run virtual, we call it virtual Python environment. DSX can containerize it and run data that's foreclosed in the HDP cluster. Now we are making use of both the data in the cluster, as well as the infrastructure of the cluster itself for running the workloads. >> In terms of the layers stacked, is also incorporating the IBM distributed deep-learning technology that you've recently announced? Which I think is highly differentiated, because deep learning is increasingly become a set of capabilities that are across a distributed mesh playing together as is they're one unified application. Is that a capability now in this solution, or will it be in the near future? DPL distributed deep learning? >> No, we have not yet. >> I know that's on the AI power platform currently, gotcha. >> It's what we'll be talking about at next year's conference. >> That's definitely on the roadmap. We are starting with the base configuration of bare metals and VM configuration, next one is, depending on how the customers react to it, definitely we're thinking about bare metal with GPUs optimized for Tensorflow workloads. >> Exciting, we'll be tuned in the coming months and years I'm sure you guys will have that. >> Pandit, thank you so much for coming on theCUBE. We appreciate it. I'm Rebecca Knight for James Kobielus. We will have, more from theCUBE's live coverage of Dataworks, just after this.
SUMMARY :
Brought to you by Hortonworks. Thanks so much for coming on the show. and the other parts of your job. and aligning the two portfolios together. and maybe that's the wrong term, getting a lot of components on to the (mumbles) and so forth. a particular aspect of the focus, and so for more of the deep learning development. No it is all part of the relationship, For GDPR for the IBM Hortonworks partnership the personal information so how do you blog One of the requirements that is not often I'm not aware of something from the Hortonworks side Data lineage as opposed to model lineage. It can identify some of the personal information if you guys are still working out what position. in the sense of they're really still understand the and interpret the requirements coming to terms kinds of fairly complex scenarios for compliance purposes. It's not just about the access to data, I wonder if you could speak a little that offers bare metal configuration in the cloud. It's geared to data scientist developers in the cloud you have Hive and Hbase, can you then containerize that machine learning Sparks on all the nodes that I want to deploy. In this first version will you be able build of the cluster itself for running the workloads. is also incorporating the IBM distributed It's what we'll be talking next one is, depending on how the customers react to it, I'm sure you guys will have that. Pandit, thank you so much for coming on theCUBE.
SENTIMENT ANALYSIS :
ENTITIES
Entity | Category | Confidence |
---|---|---|
Rebecca | PERSON | 0.99+ |
James Kobielus | PERSON | 0.99+ |
Rebecca Knight | PERSON | 0.99+ |
Europe | LOCATION | 0.99+ |
IBM | ORGANIZATION | 0.99+ |
Asia | LOCATION | 0.99+ |
Rob Thomas | PERSON | 0.99+ |
San Jose | LOCATION | 0.99+ |
Silicon Valley | LOCATION | 0.99+ |
Pandit | PERSON | 0.99+ |
last year | DATE | 0.99+ |
Python | TITLE | 0.99+ |
yesterday morning | DATE | 0.99+ |
Hortonworks | ORGANIZATION | 0.99+ |
three solutions | QUANTITY | 0.99+ |
Ricky | PERSON | 0.99+ |
Northsys | ORGANIZATION | 0.99+ |
Hadoop | TITLE | 0.99+ |
Pandit Prasad | PERSON | 0.99+ |
GDPR | TITLE | 0.99+ |
IBM Analytics | ORGANIZATION | 0.99+ |
first version | QUANTITY | 0.99+ |
both | QUANTITY | 0.99+ |
one year ago | DATE | 0.98+ |
Hortonwork | ORGANIZATION | 0.98+ |
three | QUANTITY | 0.98+ |
today | DATE | 0.98+ |
DSX | TITLE | 0.98+ |
Formworks | ORGANIZATION | 0.98+ |
this year | DATE | 0.98+ |
Atlas | ORGANIZATION | 0.98+ |
first | QUANTITY | 0.98+ |
Granger | ORGANIZATION | 0.97+ |
Gaurdium | ORGANIZATION | 0.97+ |
one | QUANTITY | 0.97+ |
Data Steward Studio | ORGANIZATION | 0.97+ |
two portfolios | QUANTITY | 0.97+ |
Truata | ORGANIZATION | 0.96+ |
DataWorks Summit 2018 | EVENT | 0.96+ |
one solution | QUANTITY | 0.96+ |
one way | QUANTITY | 0.95+ |
next year | DATE | 0.94+ |
500 nodes | QUANTITY | 0.94+ |
NTN | ORGANIZATION | 0.93+ |
Watson | TITLE | 0.93+ |
Hortonworks | PERSON | 0.93+ |