

Nicholas Gerasimatos, Red Hat | Microsoft Ignite 2019


 

>>Live from Orlando, Florida, it's theCUBE, covering Microsoft Ignite. Brought to you by Cohesity. >>Welcome back, everyone, and welcome to theCUBE's live coverage of Microsoft Ignite here in Orlando. I'm your host, Rebecca Knight, along with my co-host, Stu Miniman. We're joined by Nicholas Gerasimatos. He is a cloud computing evangelist at Red Hat. Thank you so much for coming on theCUBE. >>It's a pleasure. Thank you. >>So tell us a little bit about what you do at Red Hat. >>So I work with a lot of Red Hat partners, really trying to foster the ecosystem and build Red Hat products and solutions that can actually be deployable and repeatable for different customers, so different verticals: financial, health care, it doesn't really matter. For the most part, I try to just focus on cloud computing and really just evangelize a lot of the technologies that we have. >>Okay, so what are the kinds of things you're doing here at Ignite? >>So I've been spending a lot of time actually working with some of the partners, like Accenture and IBM. We've been doing a bunch of different webinars, a little bit of hands-on workshops, kind of educating people about distributed computing, edge computing, and some of the technologies that we've been working on along with Microsoft. So, the co-engineering of SQL Server, the managed service offering that we're doing with OpenShift, which is our enterprise-grade Kubernetes platform, along with many other different things. >>So, Nicholas, you know, it's been a couple of years now that we've gotten past the gasps of "Wait, Microsoft?" You know, they're not killing the penguins off on the side. I was in Boston for Red Hat Summit. Satya Nadella was up on stage there at Red Hat, you know; he's not hiding at the show. So bring us inside: where are customer deployments happening, where are the engineering efforts working together?
You know, we've been hearing for years that Red Hat's in all of the clouds and partnering with all of them. So what, you know, is different or special about the Microsoft relationship? >>I mean, honestly, I think the relationship is just evolving and growing because our customers are asking for it, right? They're going towards hybrid and multicloud types of strategies. They want to be able to take advantage of, you know, running RHEL within their own data centers, or running RHEL specifically on top of Microsoft Azure, but they're also looking at other cloud service providers. I think it's going to be mandated eventually, at some point in time, where customers are going to start looking at diversification when it comes to running applications wherever it makes sense, taking advantage of different, you know, cloud-native services from different providers. So we've been spending a lot of time understanding what their needs are, and then trying to build the engineering to actually address those needs. I think a lot of that has really come from the co-engineering that we have going on. So we have Red Hat engineers sitting alongside Microsoft engineers, spending a lot of time building things like the Windows Subsystem for Linux, WSL, things along those lines. >>All right, so I'll be at KubeCon in a couple of weeks, and Kubernetes, still a lot of people don't really understand where it fits. We have been saying that Kubernetes is going to be baked into every platform. Red Hat, of course, is not only a major contributor but also has a lot of customers on OpenShift. We had Microsoft, you know, this week, talking about Azure Arc, which is in preview. But, you know, David Taunton, who does partnership engagement, says, you know, this does not mean that we will not continue to partner with OpenShift, and the best place to run OpenShift is on Azure: it's the most secure, it's the best. So help us understand.
Where does this fit in the overall discussion of the multicloud and hybrid cloud that we were talking about earlier? >>I think everybody wants kind of a single pane of glass for manageability. They want the ability to actually look and see where their infrastructure is being deployed. One of the pitfalls of moving to the cloud is the fact that it's so easy to spin up resources that a lot of times we lose track of where those resources are, or individuals leave companies, and when they leave companies, they leave behind a lot of leftover items and instances, and that becomes really costly over a period of time. Maybe not so bad if you have, you know, 100 or 500 instances, but when you talk to some of these enterprise customers that are running 110,000 instances and spending millions of dollars a month, it can get very costly. And not only that, but it can also be a security risk as well. >>So let's talk about security. What kinds of conversations are you having with regard to security and data protection at this conference? >>So, you know, one of the biggest things that we've had a lot of customers asking about is Red Hat Insights. So Red Hat Insights is a smart management application that actually ties into looking at either workloads or configuration management. It can actually tell you if you have drift. So, for example, let's say you install SQL Server on RHEL and you misconfigure it: you leave the admin account running on it. It can actually alert you and make recommendations for remediation. Or, maybe in general, SELinux is disabled, things along those lines. So Insights can actually look into the operating system or the applications and tell you if there are misconfigurations. >>All right, a lot of discussion about developers here. You know, the day two keynote was all about, you know, app dev, and Satya spent a lot of time talking about the citizen developer.
Seems like that would be an intersection between what Red Hat's doing and Microsoft. >>So I would say, you know, we're obviously very developer-first focused, right? When we built things like OpenShift, we were kind of thinking about developers before we were thinking about operations, and later on we actually had to build more of the operations aspects into it. Now, for example, in OpenShift there are two different portals: one for the developer focus and one for the IT admin focus with operations groups, because they want to see what's going on. Developers don't really care specifically about seeing the abstraction of where things are. They just want to deploy their code, get it out the door as quickly as they can, and they're really just not too concerned about the infrastructure component pieces. But all of these developers, they want to be able to write their applications, write their code, and deploy it essentially anywhere and everywhere, with the easiest process, and we're really just trying to make that as simple as possible, like the Visual Studio plug-ins that we have for OpenShift, you know, Eclipse Che and other things. So really, I mean, Red Hat has always been very developer-focused first. >>So, seeing Microsoft's Satya Nadella up on the stage talking about this developer-first attitude, Microsoft is really embracing the developer and, as you said, development for all. That does seem like a bit of a cultural shift for Microsoft, much more aligned with the Red Hat way and sort of open source. So are you talking about that with your colleagues at Red Hat, about the change that you've seen, the evolution of Microsoft? >>Absolutely. I mean, if you look at Microsoft, the contributions that they're putting towards, like, Kubernetes, or even contributions towards OpenShift, it's amazing, right? I mean, it's like the company has done a complete 180 from the way that they used to be.
There's so much more openness: the acquisition of GitHub, for example, all these different changes. It's amazing. He's done amazing things with the company. I can't say enough positive things about all the wonderful things that he's done. >>All right, so, Nicholas, Red Hat has an interesting position in the marketplace, because you do partner with all of the clouds in the environment, while IBM is now the parent owner of Red Hat, and they have a cloud. Your customers touch all of them. I'm not going to ask you to competitively analyze them, but when you're talking to customers that are choosing Azure, is there anything they call out as to why they're choosing Microsoft, where, you know, Microsoft has an advantage in the marketplace, or what is drawing customers to them, and then, of course, to Red Hat with that? >>I think Microsoft is more advanced when it comes to artificial intelligence and machine learning, AI and ML, and computing. I think they're light years ahead of everyone else at this point in time. I think, you know, Amazon and Google are kind of playing a little bit of catch-up there, and it's showing, right? If you look at the Power Platform, for example, customers are embracing that. It's fantastic looking at a lot of the changes that they've implemented, and I think it's very complementary to the way that people are starting to build their applications, moving towards distributed infrastructures, microservices, and then obviously cloud-native services as well. >>In terms of the future, we are really just scratching the surface when it comes to the cloud. What do you see five to ten years from now, in terms of growth rates and also in terms of the ways in which companies are using the cloud? >>So I kind of like to equate it to the progression that we've had with cars.
I know it sounds so simple, but, you know, we went from steam engines to regular piston engines, and now we've gotten to a point where we have electric cars, and there are going to be self-driving cars. I think we're going to get to a point where code is going to be autonomous, in a sense, right? Self-correcting: the ability to actually just write code and deploy it, not really having to worry about that entire infrastructure layer. Everybody's calling it serverless. There's always going to be a server, per se, but I think we're going to get to a point, in the next five to ten years, where all of that is going to be completely abstracted away. It's just going to be focused on writing the code, and machine learning is going to help us actually evolve that code and make it run faster and make it run better. We're already seeing huge benefits when it comes to machine learning and big data analytics and things along those lines. It's just natural progression. >>Love it. You know, what's top of mind from the customers that you're talking to at the event? Any new learnings that you've had, or, you know, things that have kind of caught your attention? >>I think the biggest thing, honestly, has really been the multicloud, polycloud methodology that everybody seems to be embracing. It seems like every customer I'm talking to is looking at trying to avoid vendor lock-in, per se, but still have the flexibility to deploy their applications wherever, and still utilize cloud-native services, without actually having to, you know, go completely open source. >>And one of the challenges there is that every cloud needs different skills to be able to do it. If I'm deploying it, it's the people and being able to do that. You know, we all lived through that era of trying to do multi-vendor, and often it was a challenge. So have we learned from what we've done in the past? Can multicloud actually be more valuable to a company than the sum of its parts? >>I think so.
And I think that's the reason why, like, Microsoft is investing in Arc, for example. I think those methodologies, we know multicloud is tough. It's never going to be easy, and so these companies need to start building and developing platforms for it. It would be great if there were standard APIs and such, right, but they're never going to do something along those lines. But I think the investments that they're putting forth now are going to make multicloud and polycloud a lot easier in the future. And I think customers are asking for it. Customers ask for it, they're going to build it. >>What does this mean for the workforce, though, in terms of the kinds of candidates that companies are going to hire? Because, as we said, it does require different skills and different capabilities. So what's your advice to the young computer scientists coming up, in terms of what they should be learning? And then also, how do you think companies are making sense of this? >>So I know, from a company perspective, it's challenging a lot of companies. For example, I was talking to a very large financial institution, and they were saying that their biggest issue right now is hiring talented people to deal with microservices and Kubernetes. Any time they hire someone, they end up getting poached by the big cloud companies. So, you know, it's one of those things where people are going to have to start diversifying their talents and looking at the future. So, I mean, obviously microservices are here, and they're going to continue to be here. I would say people should invest in that, but also look at serverless. You know, I definitely think serverless is the way of the future. And then, when it comes to learning the skills of multicloud, I think cloud computing is just the number one growing skill in general.
>>So since you brought up serverless: you know, today, when I hear serverless, most customers that I talk to mean AWS. Number two in the space is probably Microsoft, but there are efforts to try to help, you know, bring a little bit of open source and standardization there. Where does Red Hat stand on this? What do you see from Microsoft? What are you hearing from customers? >>We heavily contribute to all the different, you know, projects trying to make serverless easier to use and not so vendor-specific, right? So whether that's, you know, Apache Spark or whatever you want to consider it to be, we're trying to invest in those different types of technologies. I think the main issue with serverless right now is that we still don't really know how to utilize it effectively, and it's still kind of a gray area, in a sense, right? It's cutting-edge, bleeding-edge, emerging technology, and, in my opinion, it's just not perfectly ready for prime time. But I think that's specifically because there are just not enough people actually invested in it at this point in time. >>So what are you going to take back with you when you head back to Phoenix from this conference? What are the things that have sparked your interest the most? >>Gosh, I would probably have to say really digging in deep on the Arc announcement. I think that's the thing that I'm most interested in: understanding how we can actually contribute to that, and maybe make it pluggable for things like OpenShift, whether it's OpenShift on-premise or OpenShift running in the cloud on other architectures, and, you know, things like Insights being able to plug into that. I really see us trying to work with Microsoft to start building those things. >>Well, Nicholas, thank you so much for coming on theCUBE. It was a really fabulous conversation. >>Thank you. >>I'm Rebecca Knight, for Stu Miniman. Stay tuned for more of theCUBE's live coverage from Microsoft Ignite.

Published Date : Nov 6 2019


Dinesh Nirmal, IBM | CUBEConversation


 

(upbeat music) >> Hi everyone. We have a special program today. We are joined by Dinesh Nirmal, who is VP of Development for Analytics at IBM, and Dinesh has an extremely broad perspective on what's going on in this part of the industry, and IBM has a very broad portfolio. So, between the two of us, I think we can cover a lot of ground today. So, Dinesh, welcome. >> Oh, thank you, George. Great to be here. >> So, just to frame the discussion, I wanted to hit on sort of four key highlights. One is balancing compatibility across cloud, on-prem, and edge, versus leveraging specialized services that might be on any one of those platforms. And then harmonizing and simplifying both the management and the development of services across these platforms. You have that trade-off between: do I do everything compatibly, or can I take advantage of platform-specific stuff? And then, we've heard a huge amount of noise on machine learning, and everyone says they're democratizing it. We want to hear your perspective on how you think that's most effectively done. And then, if we have time, how to manage machine learning feedback, the data feedback loops that improve the models. So, having started with that... >> So you talked about the private cloud and the public cloud, and then, how do you manage the data and the models, or the other analytical assets, across the hybrid nature of today? So, if you look at our enterprises, it's a hybrid format that most customers adopt. I mean, you have some data on the public side, but you have your mission-critical data, that's very core to your transactions, existing in the private cloud. Now, how do you make sure that the data that you've pushed to the cloud can be used to build models? And then you can take that model and deploy it on-prem or on the public cloud.
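The hybrid pattern Dinesh describes, training wherever the compute lives and then deploying the model elsewhere, can be sketched minimally: only the fitted parameters cross the cloud boundary, never the training data. The model type, coefficient values, and version tag below are invented for illustration.

```python
import json
import math

# Pretend these coefficients came out of training in the public cloud;
# the values and version tag are invented for illustration.
model = {"type": "logistic_regression",
         "coef": [0.42, -1.3],
         "intercept": 0.07,
         "version": "v1"}

artifact = json.dumps(model)  # the only thing that crosses the cloud boundary

def serve(artifact_json, features):
    """On-prem side: rebuild the scorer from parameters alone."""
    m = json.loads(artifact_json)
    z = m["intercept"] + sum(c * f for c, f in zip(m["coef"], features))
    return 1 / (1 + math.exp(-z))  # sigmoid

score = serve(artifact, [1.0, 0.5])  # no training rows needed at serving time
```

As the next exchange points out, this clean separation only holds for model families whose artifact really is just parameters.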
Is that the emerging sort of mainstream design pattern, where mission-critical systems are less likely to move, for latency, or for the fact that they're fused to their own hardware, but you take the data, and the research for the models happens up in the cloud, and then that gets pushed down close to where the transaction decisions are? >> Right, so there's also the economics of data that comes into play. So if you are doing, you know, a large-scale neural net, where you have GPUs and you want to do deep learning, obviously, you know, it might make more sense for you to push it into the cloud and be able to do that with one of the deep learning frameworks out there. But then you have your core transactional data, which includes your customer data, you know, or your customer medical data, which I think some customers might be reluctant to push to a public cloud; but you still want to build models and predict and all those things. So I think it's a hybrid nature: depending on the sensitivities of the data, customers might decide to put it on the public cloud versus the private cloud, which is on their premises, right? So then how do you serve those customer needs, making sure that you can build a model on the cloud and deploy that model on the private cloud, or vice versa? I mean, you can build that model on the private cloud, and then deploy it on your public cloud. Now, the challenge, one last statement, is that people think, well, once I build a model and deploy it on the public cloud, then it's easy, because it's just an API call at that point, just calling that model to execute the transactions. But that's not the case. Take a support vector machine, for example; that still has the support vectors in there. That means your data is there, right? So even though you're saying you're deploying just the model, you still have sensitive data there. So those are the kinds of things customers need to think about before they go deploy those models.
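The support-vector caveat above can be made concrete with a small sketch (plain Python with simplified stand-ins, not any real ML library): a fitted linear model ships as a couple of numbers, while a kernel-based scorer has to ship the training rows themselves, so sensitive data travels with the deployed model.

```python
import math

xs = [1.0, 2.0, 3.0, 4.0]   # training feature values
ys = [2.1, 3.9, 6.2, 8.1]   # training targets

# Linear regression via the closed form: the deployable artifact is
# just two numbers, and none of the training rows travel with it.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
linear_artifact = {"slope": slope, "intercept": intercept}

# A kernel scorer, by contrast, needs the stored training points at
# prediction time, so the raw rows are part of the deployed artifact.
def kernel_predict(x, gamma=1.0):
    weights = [math.exp(-gamma * (x - xi) ** 2) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)

kernel_artifact = {"train_x": xs, "train_y": ys}  # raw data included
```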
So I might, and this is a topic for our Friday interview with a member of the Watson IoT family, but it's not so black and white when you say we'll leave all your customer data with you and we'll work on the models. Because it's sort of like teabags, you know: you can take the customer's teabag and squeeze some of the tea out in your IBM or public cloud, and give them back the teabag, but you're getting some of the benefit of this data. >> Right, so it depends, depends on the algorithms you build. You could take a linear regression, and you don't have the challenges I mentioned with the support vector machine, because none of the data is moving; it's just the model. So it depends. I think that's where, you know, what Watson has done will help tremendously, because the data is secure in that sense. But if you're building on your own, it's a different challenge; you've got to make sure you pick the right algorithms to do that. >> Okay, so let's move on to the modern, sort of what we call operational analytic pipeline, where the key steps are ingest, process, analyze, predict, serve, and you can drill down on those more. Today, those pipelines are pretty much built out of multi-vendor components. How do you see that evolving under the pressure of, or tension between, simplicity, coming from one vendor with the pieces all designed together, and specialization, where you want to have, you know, a unique tool in one component? >> Right, so you're exactly right. You can take a two-pronged approach. One is, you can go to a cloud provider, get each of the services, and stitch it together. That's one approach, a challenging approach, but it has its benefits, right? I mean, you bring some core strengths from each vendor into it. The other one is the integrated approach, where you ingest the data, you shape or cleanse the data, you get it prepared for analytics, you build the model, you predict, you visualize. I mean, that all comes in one.
The benefit there is you get the whole stack in one. You have a whole pipeline that you can execute, you have one service provider giving you the services, and it's managed. So all those benefits come with it, and that's probably the preferred way: everything integrated together in one stack. I think that's the path most people go towards, because then you have the whole pipeline available to you, and also the services that come with it, including any updates. If you take the first route, one challenge you have is: how do you make sure all these services are compatible with each other? How do you make sure they're compliant? So if you're an insurance company, you want it to be HIPAA compliant. Are you going to individually make sure that each of these services is HIPAA compliant? Or would you get it from one integrated provider, where you can make sure they are HIPAA compliant and the tests are done? So all those benefits, to me, outweigh taking unmanaged services, putting them all together, and then creating a data lake to underlie all of it. >> Would it be fair to say, to use an analogy, that Hadoop, originating in many different Apache products, is a quasi-multi-vendor kind of pipeline, and the state of the machine learning analytic pipeline is still kind of multi-vendor today? If you see that moving toward a single-vendor pipeline, who do you see as the sort of last man standing? >> So, I mean, I can speak from an IBM perspective. I can say that the benefit that a vendor like IBM brings forward is, across the different public, private, or hybrid clouds, you obviously have the choice of going to the public cloud, and you can get the same service on the public cloud, so you get a hybrid experience. So that's one aspect of it.
Then, if you get the integrated solution, all the way from ingest to visualization, you have one provider; it's tested, it's integrated, you know, it's combined, it works well together. So I would say, going forward, if you look at it purely from an enterprise perspective, integrated solutions are the way to go, because that is what will be the last man standing. I'll give you an example. I was with a major bank in Europe about a month ago, and I took them through our Data Science Experience, our machine learning project, and all that, and, you know, the CTO's take was: "Dinesh, I get it. Building the model itself only took us two days, but incorporating our model into our existing infrastructure, it has been 11 months, and we haven't been able to do it." So that's the challenge enterprises face, and they want an integrated solution to bring that model into their existing infrastructure. So that's, you know, that's my thought. >> Today, though, let's talk about the IBM pipeline. Spark is core; ingest is, off the-- >> Right, so you can do Spark Streaming, you can use Kafka, or you can use IBM InfoSphere Streams, which is our proprietary tool. >> Right, although you wouldn't really use Structured Streaming for ingest, 'cause of the back pressure? >> Right, so they are-- >> The point that I'm trying to make is, it's still multi-vendor, and then on the serving side, once the analysis is done and predictions are made, some sort of SQL database has to take over. So it's, today, it's still pretty multi-vendor. So how do you see any of those products broadening their footprints so that the number of pieces decreases?
So, good question. They are all going to get into the end-to-end pipeline, because that's where the value is. Unless you provide an integrated end-to-end solution for a customer, especially an enterprise customer, it's all about putting it all together, and putting these pieces together is not easy. Even when you ingest the data, IoT kinds of data, a lot of times, 99% of the time, the data is not clean. Unless you're in a competition where you get cleansed data; in the real world, that never happens. So then, I would say 80% of a data scientist's time is spent on cleaning the data, shaping the data, preparing the data to build that pipeline. So for most customers, it's critical that they get that end-to-end, well-oiled, well-connected, integrated solution, rather than taking it from each vendor as isolated solutions. To answer your question, yes, every vendor is going to move into the ingest, data cleansing, transformation, pipeline-building, and visualization phases; if you look at those five steps, that all has to be developed. >> But just building the data cleansing and transformation, having it native to your own pipeline, doesn't sound like it's going to solve the problem of messy data that needs, you know, human supervision to correct. >> I mean, so there is some level of human supervision, to be sure. I'll give you an example, right: when data from an insurance company comes in, a lot of times the gender could be missing. How do you know if it's a male or female? Then you've got to build another model to say, you know, this patient has gone for a prostate exam, so it's a male; gynecology, so it's a female. So you have to do some imputation work in there to make sure that the data is clean, and then there's some human supervision to make sure that it's good for building models, because when you're executing that pipeline in real time-- >> Yeah.
It's all based on the past data, so you want to make sure that the data is as clean as possible to train the model that you're going to execute on. >> So, let me ask you, turning to a slide we've got about complexity, first for developers and then second for admins. If we take the steps in the pipeline as ingest, process, analyze, predict, serve, and sort of products or product categories as Kafka, Spark Streaming and SQL, a web service for predict, and MPP SQL or NoSQL for serve, even if they all came from IBM, would it be possible to unify the data model, the addressing and namespace, and, I'm just kicking off a few that I can think of, the programming model, persistence, transaction model, workflow, testing, integration? There's one thing to say it's all IBM, and then there's another thing: that the developer working with it sees it as one suite.
Usually for enterprises, there are slots where you can take it offline and put it back online, all these things, so it's a process. What we have done is created a feedback loop where we are training the model in real time, using real-time data, so the model is continuously-- >> Online learning. >> Online learning. >> And challenger/champion, or A/B testing, to see which one is more robust. >> Right, so you can do that, I mean, you could have multiple models where you can do A/B testing, but in this case, you can continually train the model to say, okay, this model scores the best. And then, another benefit is that, if you look at the whole machine learning process, there's the data, there's development, there's deployment. On the development side, more and more it's getting commoditized, meaning picking the right algorithm, there are a lot of tools, including IBM's, where you can ask, what's the right one to use for this, so that piece is getting a little less complex, I don't want to say easier, but less complex. But the data cleansing and the deployment, those are the hard parts for enterprises: when you have thousands of models, how do you make sure that you deploy the right model? >> So you might say that the pipeline for managing the model is sort of separate from the original data pipeline, maybe it includes the same technology, or much of the same technology, but once your data pipeline is in production, the model pipeline has to keep cycling through.
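The score-monitoring trigger behind that feedback loop can be sketched as a rolling accuracy check that flags when a deployed model's recent performance dips below a threshold. A toy sketch with invented window and threshold values, not IBM's actual implementation:

```python
from collections import deque

class ModelMonitor:
    """Track a model's recent prediction outcomes and flag when its
    rolling accuracy falls below a retraining threshold."""

    def __init__(self, window=100, threshold=0.8):
        self.scores = deque(maxlen=window)  # 1.0 = correct, 0.0 = wrong
        self.threshold = threshold

    def record(self, correct: bool):
        self.scores.append(1.0 if correct else 0.0)

    def needs_retraining(self) -> bool:
        if not self.scores:
            return False
        return sum(self.scores) / len(self.scores) < self.threshold

monitor = ModelMonitor(window=10, threshold=0.8)
for correct in [True] * 7 + [False] * 3:  # accuracy drifts down to 0.7
    monitor.record(correct)
print(monitor.needs_retraining())  # True
```

In the continuous setup Dinesh describes, the trigger would kick off retraining on fresh data rather than taking the model offline during a maintenance slot.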
>> Exactly, so the data pipeline could be changing, so if you take a loan example, right, a lot of the data that goes in the model pipeline is static, I mean, my age, it's not going to change every day, I mean, it is, but you know, the age, my salary, my race, my gender, those are static data that you can take and put in there, but then there's also real-time data coming in: my loan amount, my credit score, all those things. So how do you bring that data pipeline, between real-time and static data, into the model pipeline, so the model can predict accurately, and based on the score dipping, you should be able to retrain the model using real-time data. >> I want to take you, Dinesh, to the issue of a multi-vendor stack again, and the administrative challenges. So here, we look at a slide that shows me just rattling off some of the admin challenges: governance, performance modeling, scheduling, orchestration, availability, recovery, authentication, authorization, resource isolation, elasticity, testing integration, so that's the Y-axis, and then for every different product in the pipeline, as the X-axis, say Kafka, Spark structured streaming, MPP SQL, NoSQL, so you've got a mess. >> Right. >> Most open source companies are trying to make life easier for companies by managing their software as a service for the customer, and that's typically how they monetize. But tell us what you see the problem is, or will be, with that approach. >> So, great question. Let me take a very simple example. Probably most of our audience know about GDPR, the European law establishing the right to be forgotten. So if you're an enterprise, and George says, I want my data deleted, you have to delete all of his data within a period of time. Now, that's where one of the aspects you talked about with governance comes in. How do you make sure you have governance across not just data but your individual assets?
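The static-plus-real-time join Dinesh describes for the loan example might look like this in outline. The store name, customer IDs, and feature fields are invented for illustration:

```python
# Slow-moving profile data, loaded from a batch store (hypothetical).
STATIC_FEATURES = {
    "cust-42": {"age": 45, "salary": 90_000},
}

def build_feature_vector(customer_id, realtime):
    """Join static profile features with fresh per-request signals
    (e.g. loan amount, credit score) into one scoring input."""
    features = dict(STATIC_FEATURES[customer_id])  # copy, don't mutate
    features.update(realtime)
    return features

vec = build_feature_vector(
    "cust-42", {"loan_amount": 250_000, "credit_score": 712}
)
print(vec)
```

The same assembly path can feed both live scoring and the retraining set, which is what lets the model be refreshed on real-time data when its score dips.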
So if you're using a multi-vendor solution, with all of that data governance, how do I make sure that data gets deleted by all these services that are tied together? >> Let me maybe make an analogy. On CSI, when they pick up something at the crime scene, they've got to make sure that it's bagged, and the chain of custody doesn't lose its integrity all the way back to the evidence room. I assume you're talking about something like that. >> Yeah, something similar. Where the data, as it moves between private cloud, public cloud, and the analytical assets using that data, all those things need to work seamlessly for you to execute that particular transaction to delete data from everywhere. >> So it's not just administrative costs, but regulations that are pushing towards more homogeneous platforms. >> Right, right, and even if you take some of the other things on the stack, like monitoring, logging, and metering, the platform provides some of those capabilities, but you have to make sure, when you put all these services together, how are they going to integrate? You have one monitoring stack, so if you're pulling, you know, your IoT kind of data into a data center, for your whole-stack evaluation, how do you make sure you're getting the right monitoring data across the board? Those are the kinds of challenges that you will have. >> It's funny you mention that, because we were talking to an old Lotus colleague of mine, who was CTO of Microsoft's IT organization, and we were talking about how the cloud vendors can put a machine learning management application across their properties, or their services, but he said one of the first problems he'll encounter is the telemetry. It's really easy on hardware: CPU utilization, memory utilization, I/O. But as you get higher up in the application services, it becomes much more difficult to harmonize, so that a program can figure out what's going wrong.
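The right-to-be-forgotten fan-out across services can be sketched as a coordinator that dispatches the erasure request to every registered store and keeps a receipt from each, which is the chain-of-custody idea from the CSI analogy. The class and store names here are hypothetical, not any vendor's API:

```python
class ErasureCoordinator:
    """Fan a GDPR erasure request out to every registered data store
    and collect a confirmation receipt from each one."""

    def __init__(self):
        self.stores = []  # each store exposes delete_user(user_id) -> bool

    def register(self, store):
        self.stores.append(store)

    def forget(self, user_id):
        # Every store must confirm; the receipts form the audit trail.
        return {type(s).__name__: s.delete_user(user_id) for s in self.stores}

# Two stand-in services that hold personal data.
class FeatureStore:
    def delete_user(self, user_id):
        return True  # would purge the user's features here

class ModelAuditLog:
    def delete_user(self, user_id):
        return True  # would purge the user's scoring history here

coord = ErasureCoordinator()
coord.register(FeatureStore())
coord.register(ModelAuditLog())
print(coord.forget("george"))
```

In a multi-vendor stack, the hard part is that each of these stores speaks a different deletion API, which is exactly the governance gap Dinesh points at.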
>> Right, and I mean, like anomaly detection, right? >> Yes. >> I mean, how do you make sure you're seeing patterns where you can predict something before it happens, right? >> Is that on the road map for...? >> Yeah, so we're already working with some big customers to say, if you have a data center, how do you look at outages to predict what can go wrong in the future, root cause analysis, I mean, that is a huge problem to solve. So let's say a customer hit a problem, you took an outage, what caused it? Because today, you have specialists who will come and try to figure out what the problem is, but can we use machine learning or deep learning to figure out, was it a fix that was missing, or did an application change cause a CPU spike, which caused the outage? So that whole root cause analysis is the one that's the hardest to solve, because you are talking about decades' worth of people's knowledge, and now you are training a machine to do that prediction. >> And from my understanding, root cause analysis is most effective when you have a rich model of how, in this case, your data structures and apps are working, and there might be many little models, but they're held together by some sort of knowledge graph that says here is where all the pieces fit, these are the pieces below these, sort of as peers to these other things. How does that knowledge graph get built, and is this the next generation of a configuration management database? >> Right, so I call it the self-healing, self-managing, self-fixing data center.
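At its simplest, the anomaly-detection piece of that story is a statistical outlier check on a telemetry metric. A minimal z-score sketch, with made-up numbers standing in for CPU readings; production systems use far richer models:

```python
import statistics

def is_anomalous(history, value, z_threshold=3.0):
    """Flag a metric reading that sits far outside its recent history,
    measured in standard deviations (a simple z-score test)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean  # flat history: any change is anomalous
    return abs(value - mean) / stdev > z_threshold

cpu_history = [48, 50, 52, 49, 51]       # recent CPU % readings
print(is_anomalous(cpu_history, 90))      # True: a spike
print(is_anomalous(cpu_history, 50))      # False: normal
```

Catching the spike is the easy half; as Dinesh says next, tying it back to the change that caused it is the hard half.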
It's easy for you to turn up the heat or the A/C so the temperature goes down, I mean, those are good, but the real value for a customer is exactly what you mentioned, building up that knowledge graph from different models that all come together. But the hardest part is, predicting an anomaly is one thing, and getting to the root cause is a different thing, because at that point, now you're saying, I know exactly what caused this problem, and I can prevent it from happening again. That's not easy. We are working with our customers to figure out how we get to the root cause analysis, but it's all about building the knowledge graph with multiple models coming from different systems. Today, I mean, enterprises have different systems from multiple vendors. We have to bring all that monitoring data into one source, and that's where that knowledge comes in, and then different models will feed that data, and then you need to mine that data, using deep learning algorithms, to say, what caused this? >> Okay, so this actually sounds extremely relevant, although, in the interest of time, we're probably going to have to dig down on that one another time. But just at a high level, it sounds like the knowledge graph is sort of your web or directory into how local components or local models work, and then, knowing that, if it sees problems coming up here, it can understand how it affects something else tangentially. >> So think of the knowledge graph as a neural net, because it's building a new neural net based on the past data, and it has that built-in knowledge where it says, okay, these symptoms seem to be a problem that I have encountered in the past. Now I can predict the root cause because I know this happened in the past. So it's kind of like casting that net to build new problem determinations as it goes along. So it's a complex task. It's not easy to get to root cause analysis. But that's something we are aggressively working on developing.
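The knowledge-graph traversal behind root cause analysis can be sketched as a walk from an observed symptom down to leaf-level causes. The graph here is a toy; real systems would learn these edges from decades of operational data, as Dinesh describes:

```python
# A toy causal dependency graph: symptom -> possible upstream causes.
# Edges are hypothetical; in practice they'd be mined from incident data.
CAUSAL_EDGES = {
    "outage": ["cpu_spike", "disk_full"],
    "cpu_spike": ["app_change", "missing_fix"],
    "disk_full": ["log_flood"],
}

def root_causes(symptom, graph=CAUSAL_EDGES):
    """Walk from a symptom to the leaf nodes of the causal graph,
    which are the candidate root causes to investigate."""
    children = graph.get(symptom, [])
    if not children:
        return [symptom]  # a leaf: nothing upstream of it
    causes = []
    for child in children:
        causes.extend(root_causes(child, graph))
    return causes

print(root_causes("outage"))  # ['app_change', 'missing_fix', 'log_flood']
```

A real system would also score each candidate cause by likelihood, which is where the learned models feeding the graph come in.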
>> Okay, so let me ask, let's talk about sort of democratizing machine learning and the different ways of doing that. You've actually talked about the big pain points, maybe not so sexy, but critical, which are operationalizing the models and preparing the data. Let me bounce some of the other approaches off you. One that we have heard from Amazon is that they're saying, well, data munging might be an issue, and operationalizing the models might be an issue, but the biggest issue in terms of making this developer-ready is: we're going to take the machine learning we use to run our business, whether it's merchandising fashion, running recommendation engines, managing fulfillment or logistics, and, just like they did with AWS, they're dog-fooding it internally, and then they're going to put it out on AWS as a new layer of the platform. Where do you see that being effective, and where less effective? >> Right, so let me answer the first part of your question, the democratization of machine learning. That happens when, for example, a real estate agent who has no idea about machine learning is able to come and predict the house prices in an area. That, to me, is democratizing, because at that point, you have made it available to everyone, everyone can use it. But that comes back to our first point, which is having that clean set of data. You can build all the pre-canned pipelines out there, but if you're not feeding a clean set of data in, none of this works, you know. Garbage in, garbage out, that's what you're going to get. So when we talk about democratization, it's not that easy and simple, because you can build all these pre-canned pipelines that you have used in-house for your own purposes, but every customer has many unique cases. So if I take you as a bank, your fraud detection methods are completely different from mine as a bank, my limit for fraud detection could be completely different.
So there is always customization involved, the data that's coming in is different, so while it's a buzzword, I think there's knowledge that people need to feed it, there are models that need to be tuned and trained, and there's deployment that is completely different, so you know, there is work that has to be done. >> So then what I'm taking away from what you're saying is, you don't have to start from ground zero with your data, but you might want to add some of your data, which is specialized, or slightly different from what the pre-trained model saw, and you still have to worry about operationalizing it. So it's not a pure developer-ready API, but it uplevels the skills requirement so that it's not quite as demanding as working with TensorFlow or something like that. >> Right, I mean, so you can always build pre-canned pipelines and make them available, and we have already done that. For example, for fraud detection, we have pre-canned pipelines; for IT analytics, we have pre-canned pipelines. So it's nothing new, you can always take what you have done in house and make it available to the public or to customers, but then they have to take it and do customization to meet their demands, bring their data to retrain the model, all those things have to be done. It's not just about providing the model, every customer use case is completely different. Whether you are looking at fraud detection from one bank's perspective or another's, not all banks are going to do the same thing. Same thing for prediction, for example, the loan, I mean, your loan approval process is going to be completely different from mine as a bank. >> So let me ask you then, and we're getting low on time here, if you had to characterize Microsoft Azure, Google, and Amazon as each bringing to bear certain advantages and disadvantages, and you're now the ambassador, so you're not just a representative of IBM, help us understand the sweet spot for each of those.
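The point that two banks run the same pre-canned fraud pipeline with different limits can be made concrete with a tiny sketch. The scoring rule and thresholds below are entirely invented; the point is only that the shared pipeline is parameterized per customer:

```python
class FraudDetector:
    """A pre-canned fraud pipeline skeleton; each bank supplies its own
    decision threshold (and, in reality, retrains on its own data)."""

    def __init__(self, threshold):
        self.threshold = threshold  # bank-specific fraud limit

    def score(self, txn):
        # Stand-in scoring rule: large, foreign transactions look riskier.
        s = min(1.0, txn["amount"] / 10_000)
        if txn["country"] != txn["home_country"]:
            s = min(1.0, s + 0.3)
        return s

    def is_fraud(self, txn):
        return self.score(txn) >= self.threshold

txn = {"amount": 4_000, "country": "FR", "home_country": "US"}
bank_a = FraudDetector(threshold=0.9)  # conservative bank
bank_b = FraudDetector(threshold=0.7)  # aggressive bank
print(bank_a.is_fraud(txn), bank_b.is_fraud(txn))  # same pipeline, different calls
```

Same transaction, same pipeline, opposite decisions, which is why "democratized" pre-canned pipelines still need per-customer tuning, training, and deployment work.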
You're trying to fix the two sides of the pipeline, I guess, thinking of it like a barbell, you know, so where are the others, based on their data assets and their tools, and where do they need to work? >> So, there are two aspects to it. There's the enterprise aspect, so as an enterprise, I would like to say, it's not just about the technology, there's also the services aspect. If my model goes down in the middle of the night, and my banking app is down, who do I call? If I'm using a service that is available on the cloud provider which is open source, do I have the right amount of coverage to call somebody and fix it? So there are the enterprise capabilities, availability, reliability, and that is different from a developer who comes in with a CSV file that he or she wants to build a model from to predict something. That's different, these are two different aspects. So if you talk about, you know, all these vendors, if I'm wearing an enterprise hat, some of the things I would look at are: can I get an integrated solution, end to end, on the machine learning platform? >> And that means end to end in one location, >> Right. >> So you don't have network issues or latency and stuff like that. >> Right, it's an integrated solution, where I can bring in the data, there are no challenges with latency, those kinds of things, and then can I get the enterprise-level service, SLAs, all those things, right? So, in there, the named vendors obviously have an upper hand, because they are preferred by enterprises over a brand new open source vendor that comes along. But then, within enterprises, there are lines of business building models using some of the open source vendors, which is okay, but eventually those have to get deployed, and then how do you make sure you have those enterprise capabilities? So if you ask me, I think each vendor brings some capabilities.
I think the benefit IBM brings is, one, you have the choice, the freedom, to deploy in cloud or on-prem or hybrid, and you have all the choices of languages and tools, we support R, Python, Spark, SPSS, I mean, the choice, the freedom, the reliability, the availability, the enterprise nature, that's where IBM comes in and differentiates, and for our customers, that's a huge plus. >> One last question, and we're really out of time. In terms of thinking about a unified pipeline, when we were at Spark Summit, sitting down with Matei Zaharia and Reynold Xin, the question came up that Databricks has an incomplete pipeline: no persistence, no ingest, not really much in the way of serving, but boy are they good at, you know, data transformation and munging and machine learning. But they said they consider it part of their ultimate responsibility to take control, and on the ingest side it's Kafka, the serving side might be Redis or something else, or the Spark databases like SnappyData and Splice Machine. Spark is so central to IBM's efforts. What might a unified Spark pipeline look like? Have you guys thought about that? >> It's not there, obviously they could be working on it, but for our purposes, Spark is critical, and the reason we invested in Spark so much is because of the execution engine, where you can take a tremendous amount of data and, you know, crunch through it in a very short amount of time. That's the reason we also invested in Spark SQL, because we have a good chunk of customers who still use SQL heavily, and we put a lot of work into Spark ML, so we are continuing to invest, and probably they will get to an integrated solution, but it's not there yet, and as it comes along, we'll adapt.
If it meets our needs and demands, and an enterprise can do it, then definitely, I mean, you know, we saw that Spark's core engine has the ability to crunch a tremendous amount of data, so we are using it, I mean, 45 of our internal products use Spark as their core engine. Our DSX, Data Science Experience, has Spark as its core engine. So, yeah, I mean, today it's not there, but I know they're probably working on it, and if there are elements of this whole pipeline that come together, that are convenient for us to use at enterprise level, we will definitely consider using it. >> Okay, on that note, Dinesh, thanks for joining us, and taking time out of your busy schedule. My name is George Gilbert, I'm with Dinesh Nirmal from IBM, VP of Analytics Development, and we are at the Cube studio in Palo Alto, and we will be back in the not too distant future, with more interesting interviews with some of the gurus at IBM. (peppy music)

Published Date : Aug 22 2017
