Nanda Vijaydev, HPE (BlueData) | CUBE Conversation, September 2019
Announcer: From our studios in the heart of Silicon Valley, Palo Alto, California, this is a CUBE Conversation.

Peter Burris: Hi, and welcome to the CUBE Studios for another CUBE Conversation, where we go in-depth with thought leaders driving innovation across the tech industry. I'm your host, Peter Burris. AI is at the forefront of every board in every enterprise on a global basis, along with machine learning, deep learning, and other advanced technologies intended to turn data into business action that differentiates the business, leads to more revenue, leads to more profitability. But the challenge is that all of these new use cases can't be addressed with the traditional workflows we've set up to handle them. So, as a consequence, we're going to need greater operationalization of how we translate business problems into ML and related technology solutions. Big challenge. We've got a great guest today to talk about it: Nanda Vijaydev is a distinguished technologist and lead data scientist at HPE, on the BlueData team. Nanda, welcome to theCUBE.

Nanda Vijaydev: Thank you, happy to be here.

Peter Burris: So Nanda, let's start with this notion of a need for an architected approach to matching AI/ML technology to operations, so that we get more certain results, better outcomes, more understanding of where we're going and how the technology is working within the business.

Nanda Vijaydev: Absolutely. Doing AI in an enterprise is not new; there have been enterprise-grade tools in this space before, but most of them have a very prescribed way of doing things. Sometimes you use custom SQL to work with that particular tool, or the way you present data to the tool requires some level of pre-processing that makes you copy the data into the tool. So you already have data fidelity at risk, and you have data duplication happening. And then there's scale. When you talk about doing AI at the scale that's required now, considering how big data is and the variety of data sets, it can probably be done, but there's a huge cost associated with it, and you may still not cover the variety of use cases you actually want to work on. So the problem now is to make sure you empower the users working in this space: augment them with the right set of technologies, and the ability to bring them data in a timely manner so they can work on these solutions.

Peter Burris: So it sounds as though what we're trying to do is simplify the process of taking great ideas and turning them into great outcomes. But you mentioned users. We've always thought this was going to center on data science, or the data scientist. As these solutions have started to become more popularized and diffused across the industry, a lot more people are engaging. Are all roles being served as well as they need to be?

Nanda Vijaydev: Absolutely, I think that's the biggest challenge. In the past, when we talked about very prescribed solutions, everything happened end-to-end within those tools, so the different user personas were all part of that particular solution. And the way these models came into production — which is really making them available to a consumer — was re-coding or re-developing them in technologies that were production-friendly: you rewrote the model in SQL, or you re-coded it in C. So a lot of detail was lost in translation. And the third big problem was visibility: having a say, from a developer's or a data scientist's point of view, in how these things are performing in production, so you can take that feedback back into deciding whether the model is still good, or when to retrain. When you look at this lifecycle holistically, it's an iterative process. It's no longer a workflow where you hand things off; this is not a waterfall methodology anymore. It's a very continuous, iterative process, especially with the new-age data science tools now developing, where you build the model, the developer decides what the runtime is, and the runtimes can serve those models as-is. You don't have to re-code, and you don't lose things in translation. So, back to your question of how you serve different roles: all those personas have to be part of the same project and the same experiment; they're just serving different parts of the lifecycle. Whatever tooling or architecture you provide has to look at it holistically: there has to be continuous development, there has to be collaboration, and there have to be central repositories that cater to those needs.

Peter Burris: So the architected approach needs to serve each of the roles, but in a way that is collaborative and ultimately put in service of the outcome, driving the use of the technology forward. That leads to another question: should this architected approach be tied to one particular set of algorithms or implementation infrastructure, or does it have to serve a wide array of technology types?

Nanda Vijaydev: Great question. This is a living ecosystem; you can no longer build a plan for just the next two or three years. Technologies are emerging every day, because the types of use cases are evolving, and what you need to solve one use case is completely different from what you need for another. So whatever standards you come up with, the consistency has to be in how a user is onboarded into the system, in data access, in security, and in how you provision these environments. But as far as which tool is used, or how that tool is applied to a specific problem, there's a lot of variability there, and your architecture has to make sure that variability is addressed — and it is growing.

Peter Burris: HPE spends a lot of time with customers, and you're learning from your customer successes and turning that into tooling that leads to this type of operationalization. Give us some visibility into the successes that stand out for you — that have been essential to how HPE has participated in this journey to create better tools for better AI and ML.

Nanda Vijaydev: Absolutely. Traditionally with BlueData — HPE now — we've been exposed to a lot of big data processing technologies. In the current landscape, data is different: it's not always at rest, and it's not always structured. It could be a stream of data, it could be a picture; in use cases like the ones we talked about — image recognition or voice recognition — the type of data is very different. As for how we've learned from our customers: in my role I talk to tens of customers on a daily or weekly basis, and each one of them is at a different level of maturity in their lifecycle. These are some very established customers, but among the various groups adopting these new-age technologies, even within one organization, there's a lot of variability. So whatever we offer has to support all of those user groups. Some are coming from the classic R-language background, some from a Python background, some are doing things in Scala, some in Spark, and some use commercial tools like H2O Driverless AI or Dataiku. So in this lifecycle, we have to make sure all these communities are represented and addressed. If they build a model in a specific technology, how do we consume it, take it in, and deploy it? From an endpoint point of view, it doesn't matter where a model gets built; it does matter how end users access it, how security is applied to it, and how scaling is applied to it. So a lot of consistency is required in the operationalization, and in how you onboard those different tools — making sure consistent methodology and standard practices are applied across the entire lifecycle. And then there's monitoring; that's a huge aspect. Once you've deployed a model into production, monitoring means two different things. One is availability: just as when you go to a website and click on something, you ask whether the site is available, when you score against a model endpoint you ask whether the model is available, whether it has enough resources, and whether it can scale with the volume of requests coming in. That's one aspect of monitoring. The second aspect is how the model is performing: what's the accuracy, what's the drift, when is it time to retrain? You no longer have the luxury of looking at these things in isolation. We want to make sure all of this can be addressed, knowing that an iteration can sometimes be a month, sometimes a day, sometimes a few hours. And from an infrastructure point of view, some of these workloads may need things like GPUs for a very short amount of time. How do you give what's needed for the duration required, then take it back and assign it to something else? Because these are very valuable resources.

Peter Burris: I want to build, if I may, on that notion of onboarding the tools. We're talking about use cases that enterprises are using today to create business value, and about HPE, as an example, delivering tooling that operationalizes how that's done today. But the reality is we're going to see the state of the art evolve pretty dramatically over the next few years. How is HPE ensuring that your approach, and the approach you're working on with your customers, doesn't get balkanized, doesn't get sclerotic — that it's capable of evolving and changing as folks learn new approaches to doing things?

Nanda Vijaydev: Absolutely. This has to start with an open architecture. There have to be standards, without which enterprises can't run, but those standards shouldn't be so constricting that they keep you from expanding into newer use cases. What HPE ML Ops offers is the ability to do what you do today in a best-practice, efficient manner, bringing time to value: instant provisioning, access to data without duplicating it, compute-storage separation, containerization. These are some of the standard best-practice technologies out there, and adopting them is what sets users up to evolve with later use cases. You can never have things frozen in time; you want to make sure you can evolve with different use cases and different tools as they come along, and that's what this sets them up for.

Peter Burris: Nanda, thanks very much. It's been a great conversation; we appreciate you being on theCUBE.

Nanda Vijaydev: Thank you, Peter.

Peter Burris: My guest has been Nanda Vijaydev, distinguished technologist and lead data scientist at HPE BlueData. And for all of you, thanks for joining us again for another CUBE Conversation. I'm Peter Burris; see you next time. [Music]
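The "build anywhere, serve consistently" idea Nanda describes — where it doesn't matter which framework produced a model, only that access, security, and scaling are applied uniformly — can be sketched as a thin wrapper contract plus a central registry. This is an illustrative sketch only; the names (`ServedModel`, `register`, `score`) are assumptions for the example, not an HPE ML Ops API.

```python
# Minimal sketch: models built in different tools are wrapped behind one
# predict() contract, so the serving layer stays consistent regardless
# of origin. All names here are hypothetical, for illustration only.
from typing import Callable, Dict, List


class ServedModel:
    """Uniform wrapper around a model built in any framework."""

    def __init__(self, name: str, predict_fn: Callable[[List[float]], float]):
        self.name = name
        self._predict_fn = predict_fn

    def predict(self, features: List[float]) -> float:
        return self._predict_fn(features)


# Central registry: the consistent part of the lifecycle.
REGISTRY: Dict[str, ServedModel] = {}


def register(model: ServedModel) -> None:
    REGISTRY[model.name] = model


def score(model_name: str, features: List[float]) -> float:
    """One endpoint contract, no matter where the model was built."""
    return REGISTRY[model_name].predict(features)


# Two stand-in "models" from different origins, wrapped identically.
register(ServedModel("sklearn_churn", lambda f: sum(f) / len(f)))
register(ServedModel("spark_fraud", lambda f: max(f)))

print(score("sklearn_churn", [1.0, 3.0]))  # → 2.0
print(score("spark_fraud", [1.0, 3.0]))    # → 3.0
```

In a real deployment the `predict_fn` would delegate to a scikit-learn, Spark, or H2O artifact, but the point stands: the registry and `score` contract stay the same while the model's origin varies.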
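The two aspects of monitoring Nanda distinguishes — endpoint availability on one hand, accuracy and drift on the other — can also be sketched in a few lines. The availability check and the drift measure below (a simple population stability index, with the common 0.2 rule-of-thumb threshold) are illustrative assumptions, not anything HPE-specific.

```python
# Sketch of the two monitoring aspects: (1) is the scoring endpoint up
# and responsive, and (2) has the production score distribution drifted
# from what the model saw at training time? Thresholds are assumptions.
import math
from typing import List


def availability_check(status_code: int, latency_ms: float,
                       max_latency_ms: float = 500.0) -> bool:
    """Aspect 1: endpoint availability, like a website health check."""
    return status_code == 200 and latency_ms <= max_latency_ms


def population_stability_index(expected: List[float],
                               actual: List[float],
                               bins: int = 10) -> float:
    """Aspect 2: PSI between training-time and production scores.
    A common rule of thumb flags drift (time to retrain) above 0.2."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def frac(xs: List[float], b: int) -> float:
        count = sum(1 for x in xs
                    if lo + b * width <= x < lo + (b + 1) * width)
        if b == bins - 1:          # fold the top edge into the last bin
            count += sum(1 for x in xs if x == hi)
        return max(count / len(xs), 1e-6)  # keep log() defined

    return sum((frac(actual, b) - frac(expected, b))
               * math.log(frac(actual, b) / frac(expected, b))
               for b in range(bins))


train_scores = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
prod_scores = [0.1, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print(availability_check(200, 120.0))                          # → True
print(population_stability_index(train_scores, prod_scores) < 0.2)  # → True
```

Identical distributions yield a PSI of zero; as production scores shift, the index grows, turning "when is it time to retrain?" into a number that can be watched continuously rather than inspected in isolation.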