Maribel Lopez & Zeus Kerravala | theCUBE on Cloud 2021


 

>>from around the globe. It's the Cube presenting Cuban cloud brought >>to you by silicon angle. Okay, we're back. Here. Live Cuban Cloud. And this is Dave. Want with my co host, John Ferrier Were all remote. We're getting into the analyst power half hour. Really pleased to have Maribel Lopez here. She's the principal and founder of Lopez Research and Zias Caraballo, who is the principal and founder of ZK research. Guys, great to see you. Let's get into it. How you doing? >>Great. How you been? Good, >>thanks. Really good. John's hanging in there quarantining and, uh, all healthy, So I hope you guys are too. Hey, Mary, But let's start with you. You know, here we are on 2021 you know, just exited one of the strangest years, if not the strangest year of our lives. But looking back in the past decade of cloud and we're looking forward. How do you see that? Where do we come from? Where we at and where we going >>When we obviously started with the whole let's build a public cloud and everything was about public cloud. Uh, then we went thio the notion of private cloud than we had hybrid cloud and multi cloud. So we've done a lot of different clouds right now. And I think where we are today is that there's a healthy recognition on the cloud computing providers that you need to give it to the customers the way they want it, not the way you've decided to build it. So how do you meet them where they are so that they can have a cloud like experience wherever they want their data to be? >>Yes and yes, you've, you know, observed, This is well, in the early days of cloud, you heard a lot of rhetoric. It was private cloud And and then now we're, you know, hearing a lot of multi cloud and so forth. But initially, a lot of the traditional vendors kind of pooh poohed it. They called us analysts. We said we were all cloud crazy, but they seem to have got their religion. >>Well, everything. Everyone's got a definition of cloud, but I actually think we are right in the midst of another transformation of clouds Miracle talked about. We went from, you know, private clouds, which is really hosting the public cloud to multi cloud hybrid cloud. And if you look at the last post that put on Silicon Angle, which was talking about five acquisition of Volterra, I actually think we're in the midst of the transition to what's called distributed Club, where if you look at modernized cloud apps today, they're actually made up of services from different clouds on also distributed edge locations. And that's gonna have a pretty profound impact on the way we build out, because those distributed edges be a telco edge, cellular vagina. Th whatever the services that lived there are much more ephemeral in nature, right? So the way we secure the way we connect changes quite a bit. But I think that the great thing about Cloud is we've seen several several evolutionary changes. So what the definition is and we're going through that now, which is which is pretty cool to think about, right? It's not a static thing. Um, it's, uh, you know, it's a it's an ongoing transition. But I think, uh, you know, we're moving into this distributed Cloudera, which to me is a lot more complex than what we're dealing with in the Palace. >>I'm actually pretty excited about that because I think that this move toe edge and the distribution that you've talked about, it's like we now have processing everywhere. We've got it on devices, we've got it in, cars were moving, the data centers closer and closer to where the action's happening. 
And I think that's gonna be a huge trend for 2021. Is that distributed that you were talking about a lot of edge discussion? You >>know what? The >>reason we're doing This, too, is we want. It's not just we're moving the data closer to the user, right? And some. If you think you brought up the autonomous vehicle right in the car being an edge, you think of the data that generates right? There's some things such as the decision to stop or not right that should be done in car. I don't wanna transport that data all the way back to Google him back to decide whether I want to stop. You could also use the same data determine whether drivers driving safely for insurance purposes, right? So the same data give me located at the edge or in a centralized cloud for different purposes, and I think that's what you know, kind of cool about this is we're being able to use our data and much different ways. Now. >>You know, it's interesting is it's so complex. It's mind blowing because this is distributed computing. Everyone kind of agrees this is where it is. But if you think about the complexity and I want to get your guys reaction to this because you know some of the like side fringe trend discussions are data sovereignty, misinformation as a vulnerability. Okay, you get the chips now you got gravitas on with Amazon in front. Apple's got their own chips. Intel is gonna do a whole new direction. So you've got tons of computer. And then you mentioned the ephemeral nature. How do you manage those? What's the observe ability look like? They're what's the trust equation? So all these things kind of play into it. It sounds almost mind blowing, just even thinking about it. But how do you guys, this analyst tryto understand where someone's either blowing bullshit or kind of like has the real deal? Because all those things come into play? I mean, you could have a misinformation campaign targeting the car. Let's say Hey, you know that that data is needs to be. This is this is misinformation who's a >>in a lot of ways, this creates almost unprecedented opportunity now for for starts and for companies to transform right. The fundamental tenet of my research has always been share shifts happen when markets transition and we're in the middle of the big one. If the computer resource is we're using, John and the application resource will be using or ephemeral nature than all the things that surrounded the way we secured the way we connect. Those also have to be equal, equally agile, right, So you can't have, you know, you think of a micro services based application being secured with traditional firewalls, right? Just the amount of, or even virtual the way that the length of time it takes to spend those things up is way too long. So in many ways, this distributed cloud change changes everything in I T. And that that includes all of the services in the the infrastructure that we used to secure and connect. And that's a that is a profound change, and you mentioned the observe ability. You're right. That's another thing that the traditional observe ability tools are based on static maps and things and, you know, traditional up, down and we don't. Things go up and down so quickly now that that that those don't make any sense. So I think we are going to see quite a rise in different types of management tools and the way they look at things to be much more. I suppose you know Angela also So we can measure things that currently aren't measurable. >>So you're talking about the entire stack. Really? 
Changing is really what you're inferring anyway from your commentary. And that would include the programming model as well, wouldn't it? >>Absolutely. Yeah. You know, the thing that is really interesting about where we have been versus where we're going is we spent a lot of time talking about virtual izing hardware and moving that around. And what does that look like? And that, and creating that is more of a software paradigm. And the thing we're talking about now is what is cloud is an operating model look like? What is the manageability of that? What is the security of that? What? You know, we've talked a lot about containers and moving into a different you know, Dev suck ups and all those different trends that we've been talking about, like now we're doing them. So we've only got into the first crank of that. And I think every technology vendor we talked to now has to address how are they going to do a highly distributed management and security landscape? Like, what are they gonna layer on top of that? Because it's not just about Oh, I've taken Iraq of something server storage, compute and virtualized it. I now have to create a new operating model around it. In a way, we're almost redoing what the OS I stack looks like and what the software and solutions are for that. >>So >>it was really Hold on, hold on, hold on their lengthened. Because that side stack that came up earlier today, Mayor. But we're talking about Yeah, we were riffing on the OSC model, but back in the day and we were comparing the S n a definite the, you know, the proprietary protocol stacks that they were out there and someone >>said Amazon's S N a. Is that recall? E think that's what you said? >>No, no. Someone in the chest. That's a comment like Amazon's proprietary meaning, their scale. And I said, Oh, that means there s n a But if you think about it, that's kind of almost that can hang. Hang together. If the kubernetes is like a new connective tissue, is that the TCP pipe moment? Because I think Os I kind of was standardizing at the lower end of the stack Ethernet token ring. You know, the data link layer physical layer and that when you got to the TCP layer and really magic happened right to me, that's when Cisco's happened and everything started happening then and then. It kind of stopped because the application is kinda maintain their peace there. A little history there, but like that's kind of happening now. If you think about it and then you put me a factor in the edge, it just kind of really explodes it. So who's gonna write that software? E >>think you know, Dave, your your dad doesn't change what you build ups. It's already changed in the consumer world, you look atyou, no uber and Waze and things like that. Those absolute already highly decomposed applications that make a P I calls and DNS calls from dozens of different resource is already right. We just haven't really brought that into the enterprise space. There's a number, you know, what kind of you know knew were born in the cloud companies that have that have done that. But they're they're very few and far between today. And John, your point about the connectivity. We do need to think about connectivity at the network layer. Still, obviously, But now we're creating that standardization that standardized connectivity all the way a player seven. So you look at a lot of the, you know, one of the big things that was a PDP. I calls right, you know, from different cloud services. And so we do need to standardize in every layer and then stitch that together. 
So that does make It does make things a lot more complicated. Now I'm not saying Don't do it because you can do a whole lot more with absolute than you could ever do before. It's just that we kind of cranked up the level of complexity here, and flowered isn't just a single thing anymore, right? That's that. That's what we're talking about here It's a collection of edges and private clouds and public clouds. They all have to be stitched together at every layer in orderto work. >>So I was I was talking a few CEOs earlier in the day. We had we had them on, I was asking them. Okay, So how do you How do you approach this complexity? Do you build that abstraction layer? Do you rely on someone like Microsoft to build that abstraction layer? Doesn't appear that Amazon's gonna do it, you know? Where does that come from? Or is it or is it dozens of abstraction layers? And one of the CEO said, Look, it's on us. We have to figure out, you know, we get this a p I economy, but But you guys were talking about a mawr complicated environment, uh, moving so so fast. Eso if if my enterprise looks like my my iPhone APs. Yes, maybe it's simpler on an individual at basis, but its app creep and my application portfolio grows. Maybe they talk to each other a little bit better. But that level of complexity is something that that that users are gonna have to deal >>with what you thought. So I think quite what Zs was trying to get it and correct me if I'm wrong. Zia's right. We've got to the part where we've broken down what was a traditional application, right? And now we've gotten into a P. I calls, and we have to think about different things. Like we have to think about how we secure those a p I s right. That becomes a new criteria that we're looking at. How do we manage them? How do they have a life cycle? So what was the life cycle of, say, an application is now the life cycle of components and so that's a That's a pretty complex thing. So it's not so much that you're getting app creep, but you're definitely rethinking how you want to design your applications and services and some of those you're gonna do yourself and a lot of them are going to say it's too complicated. I'm just going to go to some kind of SAS cloud offering for that and let it go. But I think that many of the larger companies I speak to are looking for a larger company to help them build some kind of framework to migrate from what they've used with them to what they need tohave going forward. >>Yeah, I think. Where the complexities. John, You asked who who creates the normalization layer? You know, obviously, if you look to the cloud providers A W s does a great job of stitching together all things AWS and Microsoft does a great job of stitching together all things Microsoft right in saying with Google. >>But >>then they don't. But if if I want to do some Microsoft to Amazon or Google Toe Microsoft, you know, connectivity, they don't help so much of that. And that's where the third party vendors that you know aviatrix on the network side will tear of the security side of companies like that. Even Cisco's been doing a lot of work with those companies, and so what we what we don't really have And we probably won't for a while if somebody is gonna stitch everything together at every >>you >>know, at every layer. So Andi and I do think we do get after it. Maribel, I think if you look at the world of consumer APS, we moved to a lot more kind of purpose built almost throwaway apps. They serve a purpose or to use them for a while. 
Then you stop using them. And in the enterprise space, we really haven't kind of converted to them modeling on the mobile side. But I think that's coming. Well, >>I think with micro APS, right, that that was kind of the issue with micro APS. It's like, Oh, I'm not gonna build a full scale out that's gonna take too long. I'm just gonna create this little workflow, and we're gonna have, like, 200 work flows on someone's phone. And I think we did that. And not everybody did it, though, to your point. So I do think that some people that are a little late to the game might end up in in that app creep. But, hey, listen, this is a fabulous opportunity that just, you know, throw a lot of stuff out and do it differently. What What? I think what I hear people struggling with ah lot is be to get it to work. It typically is something that is more vertically integrated. So are you buying all into a Microsoft all you're buying all into an Amazon and people are starting to get a little fear about doing the full scale buy into any specific platform yet. In absence of that, they can't get anything to work. >>Yeah, So I think again what? What I'm hearing from from practitioners, I'm gonna put a micro serve. And I think I think, uh, Mirabelle, this is what you're implying. I'm gonna put a micro services layer. Oh, my, my. If I can't get rid of them, If I can't get rid of my oracle, you know, workloads. I'm gonna connect them to my modernize them with a layer, and I'm gonna impart build that. I'm gonna, you know, partner to get that done. But that seems to be a a critical path forward. If I don't take that step, gonna be stuck in the path in the past and not be able to move forward. >>Yeah, absolutely. I mean, you do have to bridge to the past. You you aren't gonna throw everything out right away. That's just you can't. You can't drive the bus and take the wheels off that the same time. Maybe one wheel, but not all four of them at the same time. So I think that this this concept of what are the technologies and services that you use to make sure you can keep operational, but that you're not just putting on Lee new workloads into the cloud or new workloads as decomposed APS that you're really starting to think about. What do I want to keep in whatever I want to get rid of many of the companies you speak Thio. They have thousands of applications. So are they going to do this for thousands of applications? Are they gonna take this as an opportunity to streamline? Yeah, >>well, a lot of legacy never goes away, right? And I was how companies make this transition is gonna be interesting because there's no there's no really the fact away I was I was talking to this one company. This is New York Bank, and they've broken their I t division down into modern I t and legacy I t. And so modern. Everything is cloud first. And so imagine me, the CEO of Legacy i e 02 miracles. But what they're doing, if they're driving the old bus >>and >>then they're building a new bus and parallel and eventually, you know, slowly they take seats out of the old bus and they take, you know, the seat and and they eventually start stripping away things. That old bus, >>But >>that old bus is going to keep running for a long time. And so stitching the those different worlds together is where a lot of especially big organizations that really can't commit to everything in the cloud are gonna struggle. But it is a It is a whole new world. And like I said, I think it creates so much opportunity for people. 
You know, e >>whole bus thing reminds me that movie speed when they drive around 55 miles an hour, just put it out to the airport and just blew up E >>got But you know, we all we all say that things were going to go away. But to Zia's point, you know, nothing goes away. We're still in 2021 talking about mainframes just as an aside, right? So I think we're going to continue tohave some legacy in the network. But the But the issue is ah, lot will change around that, and they're gonna be some people. They're gonna make a lot of money selling little startups that Just do one specific piece of that. You know, we just automation of X. Oh, >>yeah, that's a great vertical thing. This is the This is the distributed network argument, right? If you have a note in the network and you could put a containerized environment around it with some micro services um, connective tissue glue layer, if you will software abstract away some integration points, it's a note on the network. So if in mainframe or whatever, it's just I mean makes the argument right, it's not core. You're not building a platform around the mainframe, but if it's punching out, I bank jobs from IBM kicks or something, you know, whatever, Right? So >>And if those were those workloads probably aren't gonna move anywhere, right, they're not. Is there a point in putting those in the cloud? You could say Just leave them where they are. Put a connection to the past Bridge. >>Remember that bank when you talk about bank guy we interviewed in the off the record after the Cube interviews like, Yeah, I'm still running the mainframe, so I never get rid of. I love it. Run our kicks job. I would never think about moving that thing. >>There was a large, large non US bank who said I buy. I buy the next IBM mainframe sight unseen. Andi, he's got no choice. They just write the check. >>But milliseconds is like millions of dollars of millisecond for him on his back, >>so those aren't going anywhere. But then, but then, but they're not growing right. It's just static. >>No, no, that markets not growing its's, in fact. But you could make a lot of money and monetizing the legacy, right? So there are vendors that will do that. But I do think if you look at the well, we've already seen a pretty big transition here. If you look at the growth in a company like twilio, right, that it obviates the need for a company to rack and stack your own phone system to be able to do, um, you know, calling from mobile lapse or even messaging. Now you just do a P. I calls. Um, you know, it allows in a lot of ways that this new world we live in democratizes development, and so any you know, two people in the garage can start up a company and have a service up and running another time at all, and that creates competitiveness. You know much more competitiveness than we've ever had before, which is good for the entire industry. And, you know, because that keeps the bigger companies on their toes and they're always looking over their shoulder. You know what, the banks you're looking at? The venues and companies like that Brian figure out a way to monetize. So I think what we're, you know well, that old stuff never going away. The new stuff is where the competitive screen competitiveness screen. >>It's interesting. Um IDs Avery. Earlier today, I was talking about no code in loco development, how it's different from the old four g l days where we didn't actually expand the base of developers. Now we are to your point is really is democratizing and, >>well, everybody's a developer. 
It could be a developer, right? A lot of these tools were written in a way that line of business people create their own APs to point and click interface is, and so the barrier. It reminds me of when, when I started my career, I was a I. I used to code and HTML build websites and then went to five years. People using drag and drop interface is right, so that that kind of job went away because it became so easy to dio. >>Yeah, >>sorry. A >>data e was going to say, I think we're getting to the part. We're just starting to talk about data, right? So, you know, when you think of twilio, that's like a service. It's connecting you to specific data. When you think of Snowflake, you know, there's been all these kinds of companies that have crept up into the landscape to feel like a very specific void. And so now the Now the question is, if it's really all about the data, they're going to be new companies that get built that are just focusing on different aspects of how that data secured, how that data is transferred, how that data. You know what happens to that data, because and and does that shift the balance of power about it being out of like, Oh, I've created these data centers with large recommend stack ums that are virtualized thio. A whole other set of you know this is a big software play. It's all about software. >>Well, we just heard from Jim Octagon e You guys talking earlier about just distributed system. She basically laid down that look. Our data architectures air flawed there monolithic. And data by its very nature is distributed so that she's putting forth the whole new paradigm around distributed decentralized data models, >>which Howie shoe is just talking about. Who's gonna build the visual studio for data, right? So programmatic. Kind of thinking around data >>I didn't >>gathering. We didn't touch on because >>I do think there's >>an opportunity for that for, you know, data governance and data ownership and data transport. But it's also the analytics of it. Most companies don't have the in house, um, you know, data scientists to build on a I algorithms. Right. So you're gonna start seeing, you know, cos pop up to do very specific types of data. I don't know if you saw this morning, um, you know, uniforms bought this company that does, you know, video emotion detection so they could tell on the video whether somebody's paying attention, Not right. And so that's something that it would be eso hard for a company to build that in house. But I think what you're going to see is a rise in these, you know, these types of companies that help with specific types of analytics. And then you drop you pull those in his resource is into your application. And so it's not only the storage and the governance of the data, but also the analytics and the analytics. Frankly, there were a lot of the, uh, differentiation for companies is gonna come from. I know Maribel has written a lot on a I, as have I, and I think that's one of the more exciting areas to look at this year. >>I actually want to rip off your point because I think it's really important because where we left off in 2020 was yes, there was hybrid cloud, but we just started to see the era of the vertical eyes cloud the cloud for something you know, the cloud for finance, the cloud for health care, the telco and edge cloud, right? So when you start doing that, it becomes much more about what is the specialized stream that we're looking at. So what's a specialized analytic stream? What's a specialized security stack stream? Right? 
So until now, like everything was just trying to get to what I would call horizontal parody where you took the things you had before you replicated them in a new world with, like, some different software, but it was still kind of the same. And now we're saying, OK, let's try Thio. Let's try to move out of everything, just being a generic sort of cloud set of services and being more total cloud services. >>That is the evolution of everything technology, the first movement. Everything doing technology is we try and make the old thing the new thing look like the old thing, right? First PCs was a mainframe emulator. We took our virtual servers and we made them look like physical service, then eventually figure out, Oh, there's a whole bunch of other stuff that I could do then I couldn't do before. And that's the part we're trying to hop into now. Right? Is like, Oh, now that I've gone cloud native, what can I do that I couldn't do before? Right? So we're just we're sort of hitting that inflection point. That's when you're really going to see the growth takeoff. But for whatever reason, and i t. All we ever do is we're trying to replicate the old until we figure out the old didn't really work, and we should do something new. >>Well, let me throw something old and controversial. Controversial old but old old trope out there. Consumerism ation of I t. I mean, if you think about what year was first year you heard that term, was it 15 years ago? 20 years ago. When did that first >>podcast? Yeah, so that was a long time ago >>way. So if you think about it like, it kind of is happening. And what does it mean, right? Come. What does What does that actually mean in today's world Doesn't exist. >>Well, you heard you heard. Like Fred Luddy, whose founder of service now saying that was his dream to bring consumer like experiences to the enterprise will. Well, it didn't really happen. I mean, service not pretty. Pretty complicated compared toa what? We know what we do here, but so it's It's evolving. >>Yeah, I think there's also the enterprise ation of consumer technology that John the companies, you know, you look a zoom. They came to market with a highly consumer facing product, realized it didn't have the security tools, you know, to really be corporate great. And then they had to go invest a bunch of money in that. So, you know, I think that waken swing the pendulum all the way over to the consumer side, but that that kind of failed us, right? So now we're trying to bring it back to center a little bit where we blend the two together. >>Cloud kind of brings that I never looked at that way. That's interesting and surprising of consumer. Yeah, that's >>alright, guys. Hey, we gotta wrap Zs, Maribel. Always a pleasure having you guys on great great insights from the half hour flies by. Thanks so much. We appreciate it. >>Thank >>you guys. >>Alright, keep it right there. Mortgage rate content coming from the Cuban Cloud Day Volonte with John Ferrier and a whole lineup still to come Keep right there.

Published Date: Jan 22 2021

SUMMARY:

It's the Cube presenting Cuban to you by silicon angle. You know, here we are on 2021 you know, just exited one of the strangest years, recognition on the cloud computing providers that you need to give it to the customers the way they want it, It was private cloud And and then now we're, you know, hearing a lot of multi cloud And if you look at the last post that put on Silicon Angle, which was talking about five acquisition of Volterra, Is that distributed that you were talking about and I think that's what you know, kind of cool about this is we're being able to use our data and much different ways. And then you mentioned the ephemeral nature. And that's a that is a profound change, and you mentioned the observe ability. And that would include the programming model as well, And the thing we're talking about now is what is cloud is an operating model look like? and we were comparing the S n a definite the, you know, the proprietary protocol E think that's what you said? And I said, Oh, that means there s n a But if you think about it, that's kind of almost that can hang. think you know, Dave, your your dad doesn't change what you build ups. We have to figure out, you know, we get this a p But I think that many of the larger companies I speak to are looking for You know, obviously, if you look to the cloud providers A W s does a great job of stitching together that you know aviatrix on the network side will tear of the security side of companies like that. Maribel, I think if you look at the world of consumer APS, we moved to a lot more kind of purpose built So are you buying all into a Microsoft all you're buying all into an Amazon and If I don't take that step, gonna be stuck in the path in the past and not be able to move forward. So I think that this this concept of what are the technologies and services that you use And I was how companies make this transition is gonna out of the old bus and they take, you know, the seat and and they eventually start stripping away things. And so stitching the those different worlds together is where a lot got But you know, we all we all say that things were going to go away. I bank jobs from IBM kicks or something, you know, And if those were those workloads probably aren't gonna move anywhere, right, they're not. Remember that bank when you talk about bank guy we interviewed in the off the record after the Cube interviews like, I buy the next IBM mainframe sight unseen. But then, but then, but they're not growing right. But I do think if you look at the well, how it's different from the old four g l days where we didn't actually expand the base of developers. because it became so easy to dio. A So, you know, when you think of twilio, that's like a service. And data by its very nature is distributed so that she's putting forth the whole new paradigm Who's gonna build the visual studio for data, We didn't touch on because an opportunity for that for, you know, data governance and data ownership and data transport. the things you had before you replicated them in a new world with, like, some different software, And that's the part we're trying to hop into now. Consumerism ation of I t. I mean, if you think about what year was first year you heard that So if you think about it like, it kind of is happening. Well, you heard you heard. realized it didn't have the security tools, you know, to really be corporate great. Cloud kind of brings that I never looked at that way. 
Always a pleasure having you guys Mortgage rate content coming from the Cuban Cloud Day Volonte with John Ferrier and

SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
Mary | PERSON | 0.99+
John Ferrier | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
John | PERSON | 0.99+
Dave | PERSON | 0.99+
Cisco | ORGANIZATION | 0.99+
Fred Luddy | PERSON | 0.99+
IBM | ORGANIZATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
Maribel Lopez | PERSON | 0.99+
Google | ORGANIZATION | 0.99+
Angela | PERSON | 0.99+
2021 | DATE | 0.99+
2020 | DATE | 0.99+
thousands | QUANTITY | 0.99+
New York Bank | ORGANIZATION | 0.99+
Volterra | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Zeus Kerravala | PERSON | 0.99+
ZK research | ORGANIZATION | 0.99+
telco | ORGANIZATION | 0.99+
five years | QUANTITY | 0.99+
iPhone | COMMERCIAL_ITEM | 0.99+
Mirabelle | PERSON | 0.99+
Maribel | PERSON | 0.99+
one wheel | QUANTITY | 0.99+
Jim Octagon | PERSON | 0.99+
Apple | ORGANIZATION | 0.99+
Brian | PERSON | 0.99+
Lopez Research | ORGANIZATION | 0.99+
two people | QUANTITY | 0.99+
millions of dollars | QUANTITY | 0.98+
First | QUANTITY | 0.98+
20 years ago | DATE | 0.98+
one | QUANTITY | 0.98+
five | QUANTITY | 0.98+
two | QUANTITY | 0.98+
Zias Caraballo | PERSON | 0.98+
around 55 miles an hour | QUANTITY | 0.98+
first movement | QUANTITY | 0.98+
15 years ago | DATE | 0.97+
first year | QUANTITY | 0.97+
first crank | QUANTITY | 0.97+
today | DATE | 0.97+
first | QUANTITY | 0.97+
Andi | PERSON | 0.97+
four | QUANTITY | 0.96+
Zia | PERSON | 0.96+
this year | DATE | 0.96+
Cloud | TITLE | 0.96+
dozens | QUANTITY | 0.95+
200 work flows | QUANTITY | 0.94+
this morning | DATE | 0.94+
Avery | PERSON | 0.93+
Thio | PERSON | 0.92+
uber | ORGANIZATION | 0.92+
Intel | ORGANIZATION | 0.91+
single thing | QUANTITY | 0.9+
earlier today | DATE | 0.9+
US | LOCATION | 0.89+
Google Toe | ORGANIZATION | 0.89+
one company | QUANTITY | 0.88+
tons of computer | QUANTITY | 0.87+
dozens of abstraction layers | QUANTITY | 0.87+
Earlier today | DATE | 0.86+
Iraq | LOCATION | 0.86+
Os | TITLE | 0.85+
edge | ORGANIZATION | 0.84+
Cuban Cloud | OTHER | 0.83+
twilio | ORGANIZATION | 0.83+

Summit Virtual Event Coverage | AWS Summit Online


 

>>from the Cube Studios in Palo Alto and Boston connecting with thought leaders all around the world. >>This is a cube conversation >>live on. Welcome to the Special Cube Virtual coverage of AWS Summit Online. This is an event of virtual event by AWS. We're covering with the Virtual Cube his Amazon, so it would have no >>looking started. We started. Thank you. Right >>everyone, welcome to this Special Cube. Virtual coverage of the AWS Summit Virtual Online This is an event that Amazon normally has in person in San Francisco, but now it's virtual around the world. Seoul, Korea, in Tokyo, all over the world, in Asia Pacific and in North America, I'm John Furrier Dave Jones Stew Minimum. Let's do We're kicking off aws Virtual with the Cube Virtual. I'm in Palo Alto with the quarantine crew. You're in Massachusetts in Boston when the quarantine crew there still great to have you on to talk about AWS Virtual summit. >>Yeah, John, it's it's great to see you. Ah, it's been ah, you know, interesting times doing all these remote interviews A Z Many of us say I don't blame hotels, but I do miss the communities I do miss the hallway conversation. But great to see you, John. Love the Midnight Madness shirt. We >>want to thank Amazon for stepping up with some sponsorship for allow us to do the Virtual Cube alongside their virtual event because now it's a global community. It's all virtual. There are no boundaries. The Cube has no boundaries to We've got a great program. We have Cory Quinn coming up. Expect to hear from him last week in AWS is known for is a rising star in the community. Certainly Cube guest and also guest host and analyst for the Cube. We spent to hear all the latest from his big zoom post controversy to really what's going on in AWS around what services are hot. I know you're going to a great interview with him, but that's not what Amazon we're seeing a ton of activity, obviously, most recently last week was the jet, I think, which was an agency protest kind of confidential. Microsoft blew that up big time with a post by their worldwide comes person. Frank Shaw countered by Drew Heard Who's the coms globally for end of us and so a war of words is ensuing. This is again pointing to the cloud Native War that's going on with a jet I conference gets Jedi contract a $10 billion which is awards to Microsoft. This shows that the heat is on to do. This is a absolute bloodbath between AWS and Microsoft. We're seeing it play out now virtually with Amazon ai Large scale cloud. This is huge. This is this is another level. A def con one. Basically your thoughts. >>Yeah, John, you know, you've covered this really well and really impressing plot number one you talk about You know, this requirement When AWS launched the govcloud had the CIA as a client early on many years ago. It was the green light for many companies to go from. Wait. Is the club secure enough? Do well, good enough for the federal government in the US It's probably good enough for the enterprise. When Microsoft one jet I they didn't have all the certifications to meet what was in the contract? They had a ticking clock. Make sure that they could meet those security engagements. Aziz. Well, as you know what, one of the pieces the esports that move was working, made a partnership announced with Azure. We know the federal government uses Oracle quite a bit, though they can now run that in azure and not have the penalties from Oracle. So you know that many have said, you know Hey, AWS, why don't you kind of let that one go? 
You got federal business, but those ripple effect we understand from one contract kind of move things around. >>Well, my take on this is just the tip of the teapot. Either Microsoft's got something that we don't know where they're running scared. My predictions do is that the clock is gonna take out D o. D. Is going award the contract again to Microsoft because I don't think the d. O. D. Wants to change basically on the data that I'm getting from my reporting. And then, ultimately Amazon will keep this going in court because Microsoft has been deficient on winning the deal. That is by the judge and in government contracts. As you know, when you're deficient, you're ineligible. So, essentially on the tech specs, Microsoft failed to meet the criteria the contract and they're deficient. They still can't host top secret content even if they wanted to. This is going to be a game changer when if this comes out to be true, it will be a huge tech scandal. If it's true, then am I gonna have egg on their face? OK, so we passed. This speaks to the large scale problems that are having with Cove it. You're seeing Amazon. They're all working at home, but they still got to run the servers. They >>can do >>it. They got cloud native. You've got Dev ops. But for their customers to be people who are trying to do hybrid. What >>are you >>hearing in terms of the kinds of situations that people are doing? Are they still going to work with maths on our There's still data centers that need to be managed. What >>are >>you hearing in the tech world's do around Covad 19. And as the cloud becomes more apparent, it's obvious that if you're not cloud native, you're going to be on the wrong side of history. Here is pretty obvious. >>Yeah, well, absolutely. John. There there is a bit of a Elwyn behind cloud. Everything from you mentioned work from home. Everybody needs to be on their VPN. They need to access their service access their services where they are. If you've got a global workforce, if you thought that your infrastructure was going to be able to handle that, you might not be in for a WS is meeting that need. There's been some of the cloud providers that have had performance issues have had to prioritize which customers can get access to things AWS standing strong. They're meeting their customers and their answering the call of cloud. You know, we know that AWS puts a huge investment into their environment. If you compare an availability zone from AWS, you know, it is very, very sturdy. It's not just, you know, a you know, a small cluster on. And they say, Hey, we can run all over the place, you know, to be specific It's, you know, John Azure has been having some of those performance issues and has been from concerns. Corey actually wrote a really good article talking about that. It actually put a bad you on public cloud in general. But we know not all public cloud with the same, though, you know, Google has been doing quite well, you know. Managing the demand spike, though, has AWS. Microsoft has needed to respond a little bit. >>It's just mentioned Microsoft's outages. Microsoft actually got caught on eight K filing, which you just have to be going through, and they noticed that they said they had all this up. Time for the cloud. Turns out it wasn't the cloud. It was the teams product. They had to actually put a strike a line through it legally. So a lot of people getting called out, it doesn't matter. It's a crisis. I think that's not gonna be a core issue is gonna be what technology has been needed the most. 
And I got to ask you still, when was the last time you and I talked about virtual desktops? Because, hey, if you're working at home and you're not at your desk, you need might need some stuff on your desk. This >>is a real issue. >>I mean, it's a >>kind >>of a corner case in tech, but virtual desktops. If >>you're not >>at the office, you need to have that at home. This is a huge issue. It's been a surge >>in demand. Yeah, there were jokes in the community that you know, finally, it's the year of VD I, but desktop as a service. John is an area that took a little while to get going. You know, Dave Volante and I were just having about this. You and Dave interviewed me when Amazon released workspaces, and it was like, Ah, you know, Citrix is doing so well and VD I, you know, isn't the hotness anymore, But that's not service as grown. If you talk about desktop as a service compared to V i p. I is still, you know, a bit of a heavy lift. Even if you've got, you know, hyper converged infrastructure. Roll this out. It's a couple of months to put these whole solutions together. Now, if you have some of that in perspective, can you scale it and you build them up much faster? Yes, you can. But if you're starting to enable your workforce a little bit faster, desktop as a service is going to be faster. AWS has a strong solution with work base. Is it really is that enablement? And it's also putting pressure on the SAS providers. One. They need scale and do they need to be responsive that some of their customers need to scale up really fast and some of them dial things down. Always worry about some of these on track that the SAS providers, but you in. So you know, customers need to make sure they're being loud and clear with their providers. If you need help. If you need to adjust something, you know, push back on them because they should be responsive because we know that there is a broad impact on this. But it will not be a permanent impact, though you know, these are the times that companies need to work closely with customers because otherwise you will. You will either make a customer for life, or you will have somebody that will not be saying about you for a long >>while. Still, let's just quickly run through some of the highlights so far on the virtual conference virtual event. Aussie Amazon Pre announced last month the Windows Migration Service, which has been a big part of their business. They've been doing it for 11 years, so we're gonna have an interview with an AWS person to talk about that also app Flows announced as well as part of the virtual kind of private, you know, private checks. So you're seeing that right here. Large scale data lakes breaking down those silos, moving data from the cloud from the console into the top. Applicants like Salesforce is a big one. That was kind of pre announced. The big story here is the Kendra availability and the augmented AI availability. Among other things, this is the big story. This kind of shows the Amazon track record they pre announced at reinvent, trying to run as fast as they can to get it shipping the focus of AI. The focus of large scale capacity, whether it's building on top of GC, too. Server list. Lambda ai. All this is kind of coming together data, high capacity, operational throughput and added value. That seems to be the highlights. Your reaction? >>Yeah, John, You know, at flow is an interesting one. We were just talking about asp providers. An area that we've been spending a lot of time talking with. 
The system is you know, my data is all over the place, you know? Yes, there's my data centers public, but there's all of these past provides. So, you know, if I have data in service now, I have it in workday. I have a sales force you know, how do I have connectors there? How do I You're that How do I protect that, though? Amazon, you know, working with a broad ecosystem and helping to pull that together. Eyes definitely an interesting one. What? Kendra definitely been some good buzz in the ecosystem for a while. They're You know, the question is on natural language processing and a I, you know, where are the customers with these deployments? Because some of them, if they're a little bit more long term, Egypt might be the kind of projects that get put on pause rather than the ones that are critical for me to run the business today. >>And I just did a podcast with the VM ware ecosystem last week talking about which projects will be funded. Which ones won't. It brings up this new virtual work environment where, you know, some people are going to get paid and some people aren't. If you're not core to the enterprise, you're probably not going to get paid. If you're not getting a phone call to come into work, you're probably gonna get fired. So there will be project that will be cut and projects that will be funded certainly virtual events, which I want to talk to you about in a minute to applications that are driving revenue and or engagement around the new workforce. So the virtualization of business is happening now. We joke because we know server virtualization actually enabled the cloud. Right? So I think there's going to be a huge Cambrian explosion of applications. So I want to get your thoughts. The folks you've been talking to the past few months, what are you hearing in terms of those kinds of projects that people will be leaning into and funding versus ones they might put on hold? Have you heard anything? >>Yeah. Well, you know, John, it's interesting when you go back at its core, what is AWS and they want to enable built. So, you know, the last couple of years we've been talking about all of the new applications that will get built. That's not getting put on hold, Jones. You know it. What? I do not just to run the business but grow the business. I need the We'll have applications at the core of what we do. Data and applications, Really. Or what? Driving companies today. So that piece is so critically important and therefore AWS is a very strategic partner there. >>I'm saying the same things Do I think the common trend that I would just add to that would be I'm seeing companies looking at the covert crisis is the opportunity and frankly in some cases, an excuse to lay people off, and that's kind of you're seeing some of that. But the >>end of >>the day that people are resetting, reinventing and then putting new growth strategies together that still doesn't change business still needs to get done. So great point. It's to virtual events were here with the AWS summit. Normally run the show floor. The Cube. We're here with the Virtual Cube doing our virtual thing. It's been interesting to a lot of our events have converted to virtual. Some have been canceled, but most of them have been been running on the virtual. We've been plugged in, but the cube is evolving, and I want to get your thoughts on how you see the Cube evolving. I've been getting a lot of questions that came again on the VM Ware community podcast. 
How is the Cube morphed and I know that we've been working hard with a lot of our customers. How have we evolved? Because we're >>in the >>middle of this digital way, this virtualization away. The Cube is in there. We've been successful. That's been different use cases. Some have been embedded into the software. Amazon's got their own run a show. But events are more than just running the show content. >>Yeah, more John, >>more community behind us to your thoughts and how well Cube has evolved. And what are you seeing? >>I'm glad, John. You just mentioned community. So you know, you and I have talked many times on air that, you know, the Cube is much network in the community as it is a media company. So, you know, first of all, it's been so heartening over the last couple of months that we've been putting out. We're still getting some great feedback from the community. One of things I personally miss is, you know, when we step off the stage and you walk the hallway and you bump into people that know when they ask your questions were you know, they share some of the things that they're going through. That data that we always look for is something we still need. So I'm making sure that reach out to friends, you know, diving back into the social channels to make sure that we understand the pulse of what's going on. But you know, John, you know, our community has always been online, though a big piece of the Cube is relatively unchanged. Other than we're doing all the interviews, we have to deal with everyone's home systems in home network. Every once in a while you hear a dog barking in the background or, you know, a child running, but it actually humanized. So there's that opportunity or the communities to rally together. Some of my favorite interviews have been, you know, the open source communities that are gathering together toe work on common issues, a lot of them specifically for the global endemic, you know, And so there are some really good stories out there. I worry when you talk about companies that are think, Hey, this There have been so many job losses in this pandemic that it just is heartbreak. So, you know, we've loved when the tech community is helping to spur new opportunities, great new industries. I had a great interview that I did with our friends from a cloud guru, and they've seen about a 20 to 30% increase on people taking the online training. And one of the main things that they're taking training on is the one on one courses on AWS on Google and on Azure, as well as an interesting point. John, they said, Multi cloud is something that come up. So you know, 2020 we've been wondering. Is aws going to admit that multi cloud is a thing, or are they going to stick with their hybrid message and, you know, as their partners not talk about? It's >>been interesting on the virtual queue because we and Amazon's been a visionary and this leading Q B virtual with them. It's become a connective tissues to between the community. And if you think about how much money the companies they're saving by not running the physical events and with the layoffs, as you mentioned, I think that could be an opportunity for the Cube to be that connective tissue to bring people together. I think that's the mission that we hope will unfold, but ultimately, digital investments will probably go up from this. I'm seeing a lot of great conversion around. Okay, So the content, What does it mean to me? Is that my friend group are my friends involved? How do I learn? How do I discover? How do I connect? 
And I think the interesting thing about the Cube is we've seen that up front. And I think there's a positive sign of heads do around virtualization of the media and the community. And I think it's going to be economic opportunity. And I hope that we could help people find either jobs or ways to re engage and reconnect. So again, reinvents coming. You got VM World. All >>these big shows do They dropped so much cash. Can you answer? They >>put all that cash with the community. I think that's a viable scenario. >>Yeah. No, Absolutely. John. There there is, you know, big money and events, you know? Yes, there are less cost. They're also, you know, almost none of them are charging for people to attend, and very few of them are urging the bunker. So, you know, big shift in and how we have to look at these. It needs to be a real focus on content. I mean from our standpoint, John, from day one. We've been doing this a decade now. In the early days when it was a wing and a prayer on the technology, it was always about the content. And the best people help extract that signal from the noise. So, you know, some things have changed the mission overall days. >>And you know what? Amazon is being humble. They're saying we're figuring it out. Of course, we're psyched that we're there with the Virtual Cube students do. Thanks for spending the time kicking off this virtual coverage wrap up. Not >>as good as face to face. >>Love to be there on site. But I think it's easy to get guests used to in the virtual world. But we're gonna go to a hybrid as soon as it comes back to normal. Sounds like clouds to public hybrid virtual. There it is too. Thanks so much. Okay, that's the cube coverage for AWS Summit. Virtual online. That's the Cube virtual coverage. I'm sure. First Amendment, Thanks for watching. Stay tuned for the next segment. Yeah, >>yeah, yeah, yeah

Published Date: May 8 2020

SUMMARY:

Welcome to the Special Cube Virtual coverage of AWS Summit Online. We started. there still great to have you on to talk about AWS Virtual summit. Ah, it's been ah, you know, interesting times doing This shows that the heat is on to do. Yeah, John, you know, you've covered this really well and really impressing So, essentially on the tech specs, Microsoft failed to meet the criteria the contract and they're deficient. But for their customers to be people who are trying to do hybrid. maths on our There's still data centers that need to be managed. you hearing in the tech world's do around Covad 19. But we know not all public cloud with the same, though, you know, Google has been doing quite well, And I got to ask you still, when was the last time you and I talked of a corner case in tech, but virtual desktops. at the office, you need to have that at home. So you know, customers need to make sure you know, private checks. I have a sales force you know, you know, some people are going to get paid and some people aren't. So, you know, the last couple of years we've been talking about all of the new looking at the covert crisis is the opportunity and frankly in some cases, an excuse to lay people off, I've been getting a lot of questions that came again on the VM Ware community podcast. But events are more than just running the show content. And what are you seeing? out to friends, you know, diving back into the social channels to make sure that we understand Okay, So the content, What does it mean to me? Can you answer? put all that cash with the community. They're also, you know, almost none of them are charging for people to attend, And you know what? But I think it's easy to get guests used to in the virtual world.

SENTIMENT ANALYSIS:

ENTITIES

Entity | Category | Confidence
John | PERSON | 0.99+
Amazon | ORGANIZATION | 0.99+
AWS | ORGANIZATION | 0.99+
Massachusetts | LOCATION | 0.99+
Microsoft | ORGANIZATION | 0.99+
San Francisco | LOCATION | 0.99+
Jones | PERSON | 0.99+
Frank Shaw | PERSON | 0.99+
Tokyo | LOCATION | 0.99+
Dave | PERSON | 0.99+
Asia Pacific | LOCATION | 0.99+
Palo Alto | LOCATION | 0.99+
Google | ORGANIZATION | 0.99+
Oracle | ORGANIZATION | 0.99+
Dave Volante | PERSON | 0.99+
Boston | LOCATION | 0.99+
Seoul | LOCATION | 0.99+
Corey | PERSON | 0.99+
US | LOCATION | 0.99+
Drew Heard | PERSON | 0.99+
Cory Quinn | PERSON | 0.99+
$10 billion | QUANTITY | 0.99+
last week | DATE | 0.99+
North America | LOCATION | 0.99+
WS | ORGANIZATION | 0.99+
11 years | QUANTITY | 0.99+
CIA | ORGANIZATION | 0.99+
last week | DATE | 0.99+
last month | DATE | 0.99+
Citrix | ORGANIZATION | 0.99+
Aziz | PERSON | 0.99+
2020 | DATE | 0.99+
First Amendment | QUANTITY | 0.99+
Azure | ORGANIZATION | 0.97+
Virtual Cube | COMMERCIAL_ITEM | 0.96+
Cube | COMMERCIAL_ITEM | 0.96+
John Azure | PERSON | 0.96+
one | QUANTITY | 0.96+
30% | QUANTITY | 0.95+
today | DATE | 0.94+
VM Ware | TITLE | 0.94+
one contract | QUANTITY | 0.94+
SAS | ORGANIZATION | 0.93+

Model Management and Data Preparation


 

>> Sue: Hello, everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled Machine Learning with Vertica, Data Preparation and Model Management. My name is Sue LeClaire, Director of Managing at Vertica and I'll be your host for this webinar. Joining me is Waqas Dhillon. He's part of the Vertica Product Management Team at Vertica. Before we begin, I want to encourage you to submit questions or comments during the virtual session. You don't have to wait. Just type your question or comment in the question box below the slides and click submit. There will be a Q and A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Alternately, you can visit Vertica Forums to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also, a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides, and yes, this virtual session is being recorded and will be available to view on demand later this week. We'll send you a notification as soon as it's ready. So, let's get started. Waqas, over to you. >> Waqas: Thank you, Sue. Hi, everyone. My name is Waqas Dhillon and I'm a Product Manager here at Vertica. So today, we're going to go through data preparation and model management in Vertica, and the session would essentially be starting with some introduction and going through some of the machine learning configurations and you're doing machine learning at scale. After that, we have two media sections here. The first one is on data preparation, and so we'd go through data preparation is, what are the Vertica functions for data exploration and data preparation, and then share an example with you. Similarly, in the second part of this talk we'll go through different export models using PMML and how that works with Vertica, and we'll share examples from that, as well. So yeah, let's dive right in. So, Vertica essentially is an open architecture with a rich ecosystem. So, you have a lot of options for data transformation and ingesting data from different tools, and then you also have options for connecting through ODBC, JDBC, and some other connectors to BI and visualization tools. There's a lot of them that Vertica connects to, and in the middle sits Vertica, which you can have on external tables or you can have in place analytics on R, on cloud, or on prem, so that choice is yours, but essentially what it does is it offers you a lot of options for performing your data and analytics on scale, and within that, data analytics machine learning is also a core component, and then you have a lot of options and functions for that. Now, machine learning in Vertica is actually built on top of the architecture that distributed data analytics offers, so it offers a lot of those capabilities and builds on top of them, so you eliminate the overhead data transfer when you're working with Vertica machine learning, you keep your data secure, storing and managing the models really easy and much more efficient. 
You can serve a lot of concurrent users all at the same time, and it's really scalable and avoids the maintenance cost of a separate system, so essentially a lot of benefits here. One important thing to mention is that all the algorithms that you see, whether they're analytics functions, advanced analytics functions, or machine learning functions, are distributed, and not just across the cluster on different nodes. So, each node gets a share of the workload, and on each node, too, there might be multiple threads and multiple processes running each of these functions. So, it's a highly distributed solution and one of a kind in this space. So, when we talk about Vertica machine learning, it essentially covers the whole machine learning process, and we see that as something starting with data ingestion and doing data analysis and understanding, going through the steps of data preparation, modeling, evaluation, and finally deployment, as well. So, when you're using Vertica for machine learning, it takes care of all these steps and you can do all of that inside of the Vertica database. When we look at the three main pillars that Vertica machine learning aims to build on, the first one is to have Vertica as a platform for high performance machine learning. We have a lot of functions for data exploration and preparation and we'll go through some of them here. We have distributed in-database algorithms for model training and prediction, we have scalable functions for model evaluation, and finally we have distributed scoring functions, as well. Doing all of this in the database is a really good thing, but we don't want it isolated in this space. We understand that a lot of our customers, our users, like to work with other tools alongside Vertica. So, they might use Vertica for data prep and another tool for model training, or use Vertica for model training and take those models out to other tools and do prediction there. So, integration is a really important part of our overall offering. It's a pretty flexible system. We have been offering UDx in four languages, which a lot of people have found useful over the past few years, but the new capability of importing PMML models for in-database scoring and exporting Vertica native models for external scoring is something that we have recently added, and another talk will actually go through the TensorFlow integration, a really exciting and important milestone where you can bring TensorFlow models into Vertica for in-database scoring. For this talk, we'll focus on data exploration and preparation, importing PMML models, and exporting PMML models, and finally, since Vertica is not just a query engine but also a data store, we have a lot of really good capability for model storage and management, as well. So, yeah. Let's dive into the first part on machine learning at scale. When we say machine learning at scale, there are a few really important considerations and they have their own implications. The first one is that we want to have speed, but we also want it to come at a reasonable cost, so it's really important for us to pick the right scaling architecture. Secondly, it's not easy to move big data around.
It might be easy to do that on a smaller data set, on an Excel sheet, or something of the like, but once you're talking about big data and data analytics at really big scale, it's really not easy to move that data around from one tool to another, so what you'd want to do is bring models to the data instead of having to move this data to the tools, and the third thing here is that some sub-sampling it can actually compromise your accuracy, and a lot of tools that are out there they still force you to take smaller samples of your data because they can only handle so much data, but that can impact your accuracy and the need here is that you should be able to work with all of your data. We'll just go through each of these really quickly. So, the first factor here is scalability. Now, if you want to scale your architecture, you have two main options. The first is vertical scaling. Let's say you have a machine, a server, essentially, and you can keep on adding resources, like RAM and CPU and keep increasing the performance as well as the capacity of that system, but there's a limit to what you can do here, and the limit, you can hit that in terms of cost, as well as in terms of technology. Beyond a certain point, you will not be able to scale more. So, the right solution to follow here is actually horizontal scaling in which you can keep on adding more instances to have more computing power and more capacity. So, essentially what you get with this architecture is a super computer, which stitches together several nodes and the workload is distributed on each of those nodes for massive develop processing and really fast speeds, as well. The second aspect of having big data and the difficulty around moving it around is actually can be clarified with this example. So, what usually happens is, and this is a simplified version, you have a lot of applications and tools for which you might be collecting the data, and this data then goes into an analytics database. That database then in turn might be connected to some VI tools, dashboard and applications, and some ad-hoc queries being done on the database. Then, you want to do machine learning in this architecture. What usually happens is that you have your machine learning tools and the data that is coming in to the analytics database is actually being exported out of the machine learning tools. You're training your models there, and afterwards, when you have new incoming data, that data again goes out to the machine learning tools for prediction. With those results that you get from those tools usually ended up back in the distributed database because you want to put it on dashboard or you want to power up some applications with that. So, there's essentially a lot of data overhead that's involved here. There are cons with that, including data governance, data movement, and other complications that you need to resolve here. One of the possible solutions to overcome that difficulty is that you have machine learning as part of the distributed analytical database, as well, so you get the benefits of having it applied on all of the data that's inside of the database and not having to care about all of the data movement there, but if there are some use cases where it still makes sense to at least train the models outside, that's where you can do your data preparation outside of the database, and then take the data out, the prepared data, build your model, and then bring the model back to the analytics database. In this case, we'll talk about Vertica. 
So, the model would be archived, hosted by Vertica, and then you can keep on applying predictions on the new data that's incoming into the database. So, the third consideration here for machine learning on scale is sampling versus full data set. As I mentioned, a lot of tools they cannot handle big data and you are forced to sub-sample, but what happens here, as you can see in the figure on the left most, figure A, is that if you have a single data point, essentially any model can explain that, but if you have more data points, as in figure B, there would be a smaller number of models that could be able to explain that, and in figure C, even more data points, lesser number of models explained, but lesser also means here that these models would probably be more accurate, and the objective for building machine learning models is mostly to have prediction capability and generalization capability, essentially, on unseen data, so if you build a model that's accurate on one data point, it could not have very good generalization capabilities. The conventional wisdom with machine learning is that the more data points that you have for learning the better and more accurate models that you'll get out of your machine learning models. So, you need to pick a tool which can handle all of your data and does not force you to sub-sample that, and doing that, even a simpler model might be much better than a more complex model here. So, yeah. Let's go to data exploration and data preparation part. Vertica's a really powerful tool and it offers a lot of scalability in this space, and as I mentioned, will support the whole process. You can define the problem and you can gather your data and construct your data set inside Vertica, and then consider it a prepared training modeling deployment and managing the model, but this is a really critical step in the overall machine learning process. Some estimate it takes between 60 to 80% of the overall effort of a machine learning process. So, a lot of functions here. You can use part of Vertica, do data exploration, de-duplication, outlier detection, balancing, normalization, and potentially a lot more. You can actually go to our Vertica documentation and find them there. Within Vertica we divide them into two parts. Within data prep, one is exploration functions, the second is transformation functions. Within exploration, you have a rich set functions that you can use in DB, and then if you want to build your own you can use the UDX to do that. Similarly, for transformation there's a lot of functions around time series, pattern matching, outlier detection that you can use to transform that data, and it's just a snapshot of some of those functions that are available in Vertica right now. And again, the good thing about these functions is not just their presence in the database. The good thing is actually their ability to scale on really, really large data set and be able to compute those results for you on that data set in an acceptable amount of time, which makes your machine learning processes really critical. So, let's go to an example and see how we can use some of these functions. As I mentioned, there's a whole lot of them and we'll not be able to go through all of them, but just for our understanding we can go through some of them and see how they work. So, we have here a sample data set of network flows. It's a similar attack from some source nodes, and then there are some victim nodes on which these attacks are happening. 
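As a rough sketch of what a few of the exploration and transformation functions mentioned above look like when called from SQL: the table, columns, and thresholds here are made up for illustration, and the exact signatures of NORMALIZE, BALANCE, and DETECT_OUTLIERS may differ slightly by Vertica version.

  -- normalize numeric columns into a view using min-max scaling
  SELECT NORMALIZE('flows_norm', 'flows', 'duration, total_bytes', 'minmax');

  -- rebalance a skewed label column with hybrid sampling
  SELECT BALANCE('flows_balanced', 'flows', 'label', 'hybrid_sampling');

  -- flag outliers on a column using a robust z-score
  SELECT DETECT_OUTLIERS('flow_outliers', 'flows', 'duration', 'robust_zscore'
                         USING PARAMETERS outlier_threshold = 3.0);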
So yeah, let's just look at the data here real quick. We'll load the data, we'll browse the data, compute some statistics around it, ask some questions, make plots, and then clean the data. The objective here is not to make a prediction, per se, which is what we mostly do in machine learning algorithms, but to just go through the data prep process and see how easy it is to do that with Vertica and what kind of options might be there to help you through that process. So, the first step is loading the data. Since in this case we know the structure of the data, so we create a table and create different column names and data types, but let's say you have a data set for which you do not already know the structure, there's a really cool feature in Vertica called flex tables and you can use that to initially import the data into the database and then go through all of the variables and then assign them variable types. You can also use that if your data is dynamic and it's changing, to board the data first and then create these definitions. So once we've done that, we load the data into the database. It's for one week of data out of the whole data set right now, but once you've done that we'd like to look at the flows just to look at the data, you know how it looks, and once we do select star from flows and just have a limit here, we see that there's already some data duplication, and by duplication I mean rows which have the exact same data for each of the columns. So, as part of the cleaning process, the first thing we'd want to do is probably to remove that duplication. So, we create a table with distinct flows and you can see here we have about a million flows here which are unique. So, moving on. The next step we want to do here, this is essentially time state data and these times are in days of the week, so we want to look at the trends of this data. So, the network traffic that's there, you can call it flows. So, based on hours of the day how does the traffic move and how does it differ from one day to another? So, it's part of an exploration process. There might be a lot of further exploration that you want to do, but we can start with this one and see how it goes, and you can see in the graph here that we have seven days of data, and the weekend traffic, which is in pink and purple here seems a little different from the rest of the days. Pretty close to each other, but yeah, definitely something we can look into and see if there's some real difference and if there's something we want to explore further here, but the thing is that this is just data for one week, as I mentioned. What if we load data for 70 days? You'd have a longer graph probably, but a lot of lines and would not really be able to make sense out of that data. It would be a really crowded plot for that, so we have to come up with a better way to be able to explore that and we'll come back to that in a little bit. So, what are some other things that we can do? We can get some statistics, we can take one sample flow and look at some of the values here. We see that the forward column here and ToS column here, they have zero values, and when we explore further we see that there's a lot of values here or records here for which these columns are essentially zero, so probably not really helpful for our use case. Then, we can look at the flow end. 
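A hedged sketch of the loading and de-duplication steps just described, with a made-up file path and an abbreviated column list; the flex-table parser shown is only one of the parsers Vertica supports.

  -- load a week of flows into a typed table
  CREATE TABLE flows (flow_start TIMESTAMP, flow_end TIMESTAMP,
                      src_addr VARCHAR, dst_addr VARCHAR,
                      dst_port INT, proto VARCHAR,
                      duration FLOAT, label VARCHAR);
  COPY flows FROM '/data/flows_week1.csv' DELIMITER ',';

  -- if the structure is unknown or changing, land it in a flex table first
  CREATE FLEX TABLE raw_flows();
  COPY raw_flows FROM '/data/flows_week1.csv' PARSER fcsvparser();

  -- drop exact duplicate rows
  CREATE TABLE unique_flows AS SELECT DISTINCT * FROM flows;
  SELECT COUNT(*) FROM unique_flows;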
So, flow end is the end time when the last packet in a flow was sent, and you can do a select min and max on flow end to see when the data started and when it ended, and you can see it's about one week's worth of data, from the first til the eighth. Now, we also want to look at whether the data is balanced or not, because balanced data is really important for a lot of classification use cases that we want to try with this, and you can look at source address, destination address, source port, and destination port, and you see it's highly imbalanced data, for example in the source versus destination address space, so that's probably something that we need to address. There are really powerful Vertica balancing functions that you can use, with under-sampling, over-sampling, or hybrid sampling here, and that can be really useful. Another thing we can look at is summary statistics of these columns, so off the unique flows table that we created we just use the SUMMARIZE_NUMCOL function in Vertica and it gives us a lot of really cool (mumbling) and percentile information on that. Now, if we look at the duration, which is the last record here, we can see that the mean is about 4.6 seconds, but when we look at the percentile information, we see that the median is about 0.27. So, there's a lot of short flows that have duration less than 0.27 seconds. Yes, there would be some longer flows that probably bring the mean up to the 4.6 value, but the number of short flows is pretty high. We can ask some other questions of the data about the features. We can look at the protocols here and look at the count. So, we see that most of the traffic that we have is TCP and UDP, which is sort of expected for a data set like this, and then we want to look at what the most popular network services are here. So again, a simple query here: select destination port and count, and we get the destination port and the count for each. So, we can see that most of the traffic here is web traffic, HTTP and HTTPS, followed by domain name resolution. So, let's explore some more. We can look at the label distributions. We see the labels that are given with the data, because this is essentially data for which we already know whether a record was an anomaly or not, and we can create our algorithm based on that. So, we see that there's this background label with a lot of records there, and then anomaly spam seems to be really high. There are anomaly UDP scans and SSH scans, as well. So, another question we can ask is how labels are distributed among the SMTP flows, and we can see that anomaly spam is highest, and then comes the background spam. So, can we say out of this that SMTP flows are spam, and maybe we can build a model that actually answers that question for us? That can be one machine learning model that you can build out of this data set. Again, we can also verify the destination port of flows that were labeled as spam. So, you would expect port 25 for the SMTP service here, and we can see that SMTP with destination port 25 has a lot of counts here, but there are some other destination ports for which the count is really low, and essentially, when we're doing an analysis at this scale, these data points might not really be needed. So, as part of the data prep slash data cleaning we might want to get rid of those records. So now, what we can do is go back to the graph that I showed earlier and try to plot the daily trends by aggregating them.
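A hedged sketch of the kind of exploration and cleaning queries just walked through; column names and the count threshold are placeholders, not values from the session.

  -- time range covered by the data
  SELECT MIN(flow_end), MAX(flow_end) FROM unique_flows;

  -- distribution and percentile statistics for numeric columns
  SELECT SUMMARIZE_NUMCOL(duration, total_bytes) OVER() FROM unique_flows;

  -- most common protocols and destination ports (network services)
  SELECT proto, COUNT(*) FROM unique_flows GROUP BY proto ORDER BY 2 DESC;
  SELECT dst_port, COUNT(*) AS cnt FROM unique_flows
  GROUP BY dst_port ORDER BY cnt DESC LIMIT 10;

  -- keep only destination ports with meaningful support
  CREATE TABLE clean_flows AS
  SELECT * FROM unique_flows
  WHERE dst_port IN (SELECT dst_port FROM unique_flows
                     GROUP BY dst_port HAVING COUNT(*) >= 10);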
Again, we take the unique flow and convert into a flow count and to a manageable number that we can then feed into one of the algorithms. Now, PCA principle component analysis, it's a really powerful algorithm in Vertica, and what it essentially does is a lot of times when you have a high number of columns, which might be highly (mumbling) with each other, you can feed them into the PCA algorithm and it will get for you a list of principle components which would be linearly independent from each other. Now, each of these components would explain a certain extent of the variants of the overall data set that you have. So, you can see here component one explains about 73.9% of the variance, and component two explains about 16% of the variance. So, if you combine those two components alone, that would get you for around 90% of the variance. Now, you can use PCA for a lot of different purposes, but in this specific example, we want to see if we combine all the data points that we have together and we do that by day of the week, what sort of information can we get out of it? Is there any insight that this provides? Because once you have two data points, it's really easy to plot them. So, we just apply the PCA, we first (mumbling) it, and then reapply on our data set, and this is the graph we get as a result. Now, you can see component one is on the X axis here, component two on the y axis, and each of these points represents a day of the week. Now, with just two points it's easy to plot that and compare this to the graph that we saw earlier, which had a lot of lines and the more weeks that we added or the more days that we added, the more lines that we'd have versus this graph in which you can clearly tell that five days traffic starting from Monday til Friday, that's closely clustered together, so probably pretty similar to each other, and then Saturday traffic is pretty much apart from all of these days and it's also further away from Sunday. So, these two days of traffic is different from other days of traffic and we can always dive deeper into this and look at exactly what's happening here and see how this traffic is actually different, but with just a few functions and some pretty simple SQL queries, we were already able to get a pretty good insight from the data set that we had. Now, let's move on to our next part of this talk on importing and exporting PMML models to and from Vertica. So, current common practice is when you're putting your machine learning models into production, you'd have a dev or test environment, and in that you might be using a lot of different tools, Scikit and Spark, R, and once you want to deploy these models into production, you'd put them into containers and there would be a pool of containers in the production environment which would be talking to your database that could be your analytical database, and all of the new data that's incoming would be coming into the database itself. So, as I mentioned in one of the slides earlier, there is a lot of data transfer that's happening between that pool of containers hosting your machine learning training models versus the database which you'd be getting data for scoring and then sending the scores back to the database. So, why would you really need to transfer your models? The thing is that no machine learning platform provides everything. 
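Stepping back to the PCA step described above, a hedged sketch of the two calls involved; the daily_flow_counts table and its hourly columns are placeholders, and the num_components parameter name may differ by version.

  -- train a PCA model on the aggregated per-day counts
  SELECT PCA('flows_pca', 'daily_flow_counts', 'h00, h01, h02, h03');

  -- inspect how much variance each component explains
  SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name = 'flows_pca');

  -- project each day onto the first two principal components for plotting
  SELECT APPLY_PCA(h00, h01, h02, h03
                   USING PARAMETERS model_name = 'flows_pca',
                                    num_components = 2) OVER()
  FROM daily_flow_counts;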
There might be some really cool algorithms that might compromise, but then Spark might have its own benefits in terms of some additional algorithms or some other stuff that you're looking at and that's the reason why a lot of these tools might be used in the same company at the same time, and then there might be some functional considerations, as well. You might want to isolate your data between data science team and your production environment, and you might want to score your pre-trained models on some S nodes here. You cannot host probably a big solution, so there is a whole lot of use cases where model movement or model transfer from one tool to another makes sense. Now, one of the common methods for transferring models from one tool to another is the PMML standard. It's an XML-based model exchange format, sort of a standard way to define statistical and data mining models, and helps you share models between the different applications that are PMML compliant. Really popular tool, and that's the tool of choice that we have for moving models to and from Vertica. Now, with this model management, this model movement capability, there's a lot of model management capabilities that Vertica offers. So, models are essentially first class citizens of Vertica. What that means is that each model is associated with a DB schema, so the user that initially creates a model, that's the owner of it, but he can transfer the ownership to other users, he can work with the ownership rights in any way that you would work with any other relation in a database would be. So, the same commands that you use for granting access to a model, changing its owner, changing its name, or dropping it, you can use similar commands for more of this one. There are a lot of functions for exploring the contents of models and that really helps in putting these models into production. The metadata of these models is also available for model management and governance, and finally, the import/export part enables you to apply all of these operations to the model that you have imported or you might want to export while they're in the database, and I think it would be nice to actually go through and example to showcase some of these capabilities in our model management, including the PMML model import and export. So, the workflow for export would be that we trained some data, we'll train a logistic regression model, and we'll save it as an in-DB Vertica model. Then, we'll explore the summary and attributes of the model, look at what's inside the model, what the training parameters are, concoctions and stuff, and then we can export the model as PMML and an external tool can import that model from PMML. And similarly, we'll go through and example for export. We'll have an external PMML model trained outside of Vertica, we'll import that PMML model and from there on, essentially, we'll treat it as an in-DB PMML model. We'll explore the summary and attribute of the model in much the same way as in in-DB model. We'll apply the model for in-DB scoring and get the prediction results, and finally, we'll bring some test data. We'll use that on test data for which the scoring needs to be done. So first, we want to create a connection with the database. In this case, we are using a Python Jupyter Notebook. 
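Before that walkthrough, a quick, hedged sketch of the model-management statements being described, treating a model as a first-class, schema-scoped object; the model and user names are placeholders, and the exact privilege keywords may differ by version.

  -- list models and their metadata from the catalog
  SELECT model_name, schema_name, category, model_type FROM models;

  -- transfer ownership, rename, grant access, or drop a model
  ALTER MODEL myModel OWNER TO analyst_user;
  ALTER MODEL myModel RENAME TO flows_model_v2;
  GRANT USAGE ON MODEL flows_model_v2 TO report_user;
  DROP MODEL flows_model_v2;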
We have the Vertica Python connector here that you can use, really powerful connector, allows you to do a lot of cool stuff to the database using the Jupyter front end, but essentially, you can use any other SQL front end tool or for that matter, any other Python ID which lets you connect to the database. So, exporting model. First, we'll create an logistic regression model here. Select logistic regression, we'll give it a model name, then put relation, which might be a table, time table, or review. There's response column and the predictor columns. So, we get a logistic regression model that we built. Now, we look at the models table and see that the model has been created. This is a table in Vertica that contains a list of all the models that are there in the database. So, we can see here that my model that we just created, it's created with Vertica models as a category, model type is logistic regression, and we have some other metadata around this model, as well. So now, we can look at some of the summary statistics of the model. We can look at the details. So, it gives us the predictor, coefficients, standard error, Z value, and P value. We can look at the regularization parameters. We didn't use any, so that would be a value of one, but if you had used, it would show it up here, the call string and also additional information regarding iteration count, rejected row count, and accepted row count. Now, we can also look at the list of attributes of the model. So, select get model attribute using parameter, model name is myModel. So, for this particular model that we just created, it would give us the name of all the attributes that are there. Similarly, you can look at the coefficients of the model in a column format. So, using parameter name myModel, and in this case we add attribute name equals details because we want all the details for that particular model and we get the predictor name, coefficient, standard error, Z value, and P value here. So now, what we can do is we can export this model. So, we used the select export models and we give it a path to where we want the model to be exported to. We give it the name of the model that needs to be exported because essentially might have a lot of models that you have created, and you give it the category here, which in our example is PMML, and you get a status message here that export model has been successful. So now, let's move onto the importing models example. In much the same way that we created a model in Vertica and exported it out, you might want to create a model outside of Vertica in another tool and then bring that to Vertica for scoring because Vertica contains all of the hard data and it might make sense to host that model in Vertica because scoring happens a lot more quickly than model training. So, in this particular case we do a select import models and we are importing a logistic regression model that was created in Spark. The category here again is PMML. So, we get the status message that the import was successful. Now, let's look at the attributes, look at the models table, and see that the model is really present there. Now previously when we ran this query because we had only myModel there, so that was the only entry you saw, but now once this model is imported you can see that as line item number two here, Spark logistic regression, it's a public schema. 
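A condensed, hedged sketch of the statements walked through in this export/import example; paths, table names, and column names are placeholders, and signatures may vary slightly across Vertica versions.

  -- train and inspect an in-database logistic regression model
  SELECT LOGISTIC_REG('myModel', 'training_data', 'is_anomaly', 'duration, total_bytes');
  SELECT GET_MODEL_SUMMARY(USING PARAMETERS model_name = 'myModel');
  SELECT GET_MODEL_ATTRIBUTE(USING PARAMETERS model_name = 'myModel',
                             attr_name = 'details');

  -- export it as PMML for use in another tool
  SELECT EXPORT_MODELS('/home/dbadmin/models', 'myModel'
                       USING PARAMETERS category = 'PMML');

  -- import a PMML model trained elsewhere (for example, in Spark)
  SELECT IMPORT_MODELS('/home/dbadmin/models/spark_logistic_reg'
                       USING PARAMETERS category = 'PMML');
  SELECT model_name, category, model_type FROM models;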
The category here however is different because it's not a native Vertica model, rather an imported model, so you get PMML here, and then other metadata regarding the model, as well. Now, let's do some of the same operations that we did with the in-DB model, so we can look at the summary of the imported PMML model. You can see the function name, data fields, predictors, and some additional information here. Moving on. Let's look at the attributes of the PMML model. Select get model attribute. Essentially the same query that we applied earlier, but the difference here is only the model name. So, you get the attribute names, attribute fields, and number of rows. We can also look at the coefficients of the PMML model: name, exponent, and coefficient here. So yeah, pretty much similar to what you can do with an in-DB model. You can also perform all operations on an imported model, and one additional thing we'd want to do here is to use this imported model for our prediction. So in this case, we'll do a select predict PMML and give it some values, using parameters model name, the Spark logistic regression, and match by position, which is a really cool feature. It's set to true in this case. So, if you have a model being imported from another platform in which, let's say, you have 50 columns, the names of the columns in the environment in which you're training the model might be slightly different than the names of the columns that you have set up in Vertica, but as long as the order is the same, Vertica can actually match those columns by position and you don't need to have the exact same names for those columns. So in this case, we have set that to true and we see that predict PMML gives us a status of one. Now, using the imported model, in this case we had certain values that we had given it, but you can also use it on a table, as well. In that case, you also get the prediction here and you can look at the (mumbling) metrics and see how well you did. Now, just sort of wrapping this up, it's really important to know the distinction between using your models in any single node solution tool that you might already be using, like Python or R, versus Vertica. Let's say you build a model in Python. It might be a single node solution. Now, after building that model, let's say you want to do prediction on really large amounts of data and you don't want the overhead of having to move that data out of the database every time you want to do prediction. What you can do is import that model into Vertica, and what Vertica does differently than Python is that the PMML model would actually be distributed across each node in the cluster, so it would be applied on the data segments in each of those nodes and there might be different threads running for that prediction. So, the speed that you get here for prediction would be much, much faster. Similarly, once you build a model for machine learning in Vertica, the objective mostly is that you want to use all of your data and build a model that's accurate and is not just using a sample of the data, but all the data that's available to it, essentially. So, you can build that model. The model building process would again go through the same approach. It would actually be distributed across all nodes in the cluster, and it would be using all the threads and processes available to it within those nodes.
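And a hedged sketch of scoring with the imported model, both on literal values and on a table; match_by_pos is the positional-matching option described above, and the column names and literal values are placeholders.

  -- score a single, hand-supplied row
  SELECT PREDICT_PMML(4.2, 1300
                      USING PARAMETERS model_name = 'spark_logistic_reg',
                                       match_by_pos = 'true');

  -- score every row of an incoming table; the work is distributed across nodes
  SELECT flow_id,
         PREDICT_PMML(duration, total_bytes
                      USING PARAMETERS model_name = 'spark_logistic_reg',
                                       match_by_pos = 'true') AS score
  FROM new_flows;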
So, really fast model training, but let's say you wanted to deploy it on an edge node and maybe do prediction closer to where the data was being generated; you can export that model in PMML format and deploy it on the edge node. So, it's really helpful for a lot of use cases. And just some closing takeaways from our discussion today. Vertica's a really powerful tool for machine learning, for data preparation, model training, prediction, and deployment. You might want to use Vertica for all of these steps or some of these steps. Either way, Vertica supports both approaches. In the upcoming releases, we are planning to have more import and export capability through PMML models. Initially, we're supporting kmeans, linear, and logistic regression, but we keep on adding more algorithms, and the plan is to actually move to supporting custom models. If you want to do that with the upcoming release, our TensorFlow integration is always there which you can use, but with PMML, this is the starting point for us and we'll keep on improving it. Vertica models can be exported in PMML format for scoring on other platforms, and similarly, models that get built in other tools can be imported for in-DB machine learning and in-DB scoring within Vertica. There are a lot of critical model management tools that are provided in Vertica and there are a lot more on the roadmap, as well, which we will keep on developing. Many ML functions and algorithms are already part of the in-DB library and we keep on adding to that, as well. So, thank you so much for joining the discussion today and if you have any questions we'd love to take them now. Back to you, Sue.

Published Date : Mar 30 2020

SUMMARY :

and thank you for joining us today and the limit, you can hit that in terms of cost,

SENTIMENT ANALYSIS :

ENTITIES

EntityCategoryConfidence
VerticaORGANIZATION

0.99+

Waqas DhillonPERSON

0.99+

70 daysQUANTITY

0.99+

Sue LeClairePERSON

0.99+

two pointsQUANTITY

0.99+

two daysQUANTITY

0.99+

SuePERSON

0.99+

seven daysQUANTITY

0.99+

one weekQUANTITY

0.99+

five daysQUANTITY

0.99+

SundayDATE

0.99+

two partsQUANTITY

0.99+

second partQUANTITY

0.99+

SaturdayDATE

0.99+

ExcelTITLE

0.99+

50 columnsQUANTITY

0.99+

4/2DATE

0.99+

FirstQUANTITY

0.99+

PythonTITLE

0.99+

eachQUANTITY

0.99+

each nodeQUANTITY

0.99+

TodayDATE

0.99+

first factorQUANTITY

0.99+

less than 0.27 secondsQUANTITY

0.99+

VerticaTITLE

0.99+

firstQUANTITY

0.99+

FridayDATE

0.99+

MondayDATE

0.99+

second aspectQUANTITY

0.99+

eighthQUANTITY

0.99+

todayDATE

0.99+

one dayQUANTITY

0.99+

two data pointsQUANTITY

0.99+

third considerationQUANTITY

0.99+

oneQUANTITY

0.99+

first stepQUANTITY

0.98+

first partQUANTITY

0.98+

first oneQUANTITY

0.98+

zero valuesQUANTITY

0.98+

secondQUANTITY

0.98+

both approachesQUANTITY

0.98+

about 4.6 secondsQUANTITY

0.98+

third thingQUANTITY

0.98+

SecondlyQUANTITY

0.98+

one toolQUANTITY

0.98+

zeroQUANTITY

0.98+

each modeQUANTITY

0.98+

OneQUANTITY

0.97+

figure BOTHER

0.97+

figure COTHER

0.97+

4.6 valueQUANTITY

0.97+

RTITLE

0.97+

Machine Learning with Vertica, Data Preparation and Model ManagementTITLE

0.97+

WaqasPERSON

0.97+

each modelQUANTITY

0.97+

two main optionsQUANTITY

0.97+

80%QUANTITY

0.97+

two componentsQUANTITY

0.96+

around 90%QUANTITY

0.96+

twoQUANTITY

0.96+

later this weekDATE

0.95+

Niel Viljoen, Netronome & Nick McKeown, Barefoot Networks - #MWC17 - #theCUBE


 

(lively techno music) >> Hello, everyone, I'm John Furrier with theCUBE. We are here in Palo Alto to showcase a brand new relationship and technology partnership and technology showcase. We're here with Niel Viljoen, who's the CEO of Netronome. Did I get that right? (Niel mumbles) Almost think that I will let you say it, and Nick McKeown, who's Chief Scientist and Chairman and the co-founder Barefoot Networks. Guys, welcome to the conversation. Obviously, a lot going on in the industry. We're seeing massive change in the industry. Certainly, digital transmissions, the buzzword the analysts all use, but, really, what that means is the entire end-to-end digital space, with networks all the way to the applications are completely transforming. Network transformation is not just moving packets around, it's wireless, it's content, it's everything in between that makes it all work. So let's talk about that, and let's talk about your companies. Niel, talk about your company, what you guys do, Netronome and Nick, same for you, for Barefoot. Start with you guys. >> So as Netronome, our core focus lies around SmartNICs. What we mean by that, these are elements that go into the network servers, which in this sort of cloud and NFV world, gets used for a lot of network services, and that's our area of focus. >> Barefoot is trying to make switches that were previously fixed function, turning them into something that those who own and operate networks can program them for themselves to customize them or add new features or protocols that they need to support. >> And Barefoot, you're walking in the park, you don't want to step in any glass, and get a cut, and I like that, love the name of the company, but brings out the real issue of getting this I/O world if there were NICs, it throws back the old school mindset of just network cards and servers, but if you take that out on the Internet now, that is the I/O channel engine, real time, it's certainly a big part of the edge device, whether that's a human or device, IoT to mobile, and then moving it across the network, and by the way, there's multiple networks, so is this kind of where you guys are showcasing your capabilities? >> So, fundamentally, you need both sides of the line, if I could put it that way, so we, on the server side, and specifically, also giving visibility between virtual machines to virtual machines, also called VNFs to VNFs in a service chaining mechanism, which has what a lot of the NFV customers are deploying today. >> Really, as the entire infrastructure upon which these services are delivered, as that moves into software, and more of it is created by those who own and operate these services for themselves, they either create it, commission it, buy it, download it, and then modify it to best meet their needs. That's true whether it's in the network interface portion, whether it's in the switch, and they've seen it happen in the control plane, and now it's moving down so that they can define all the way down to how packets are processed in the NIC and in the switches, and when they do that, they can then add in their ability to see what's going on in ways that they've never been able to do before, so we really think of ourselves as providing that programmability and that flexibility down, all the way to the way that the packets are processed. >> And what's the impact, Nick, talk about the impact then take us through like an example. 
You guys are showcasing your capabilities to the world, and so what's the impact and give us an example of what the benefit would be. I mean, what goes on like this instrumentation, certainly, everyone wants to instrument everything. >> Niel: Yes. >> Nick: Yeah. >> But what's the practical benefit. I mean who wins from this and what's the real impact? >> Well, you know, in days gone by, if you're a service provider providing services to your customers, then you would typically do this out of vertically integrated pieces of equipment that you get from equipment vendors. It's closed, it's proprietary, they have their own sort of NetFlow, sFlow, whatever the mechanism that they have for measuring what's going on, and you had to learn to live with the constraints of what they had. As this all gets kind of disaggregated and broken apart, and that the owner of the infrastructure gets to define the behavior in software, they can now chain together the modules and the pieces that they need in order to deliver the service. That's great, but now they've lost that proprietary measurement, so now they need to introduce the measurement that they can get greater visibility. This actually has created a tremendous opportunity and this is what we're demonstrating, is if you can come up with a uniform way of doing this, so that you can see, for example, the path that every packet takes, the delay that it encounters along the way, the rules that it encounters that determines the path that it gets, if it encounters congestion, who else contributed to that congestion, so we know who to go blame, then by giving them that flexibility, they can go and debug systems much more quickly, and change them and modify them. >> It's interesting, it's almost like the aspirin, right? You need, the headache now is, I have good proprietary technology for point measurement and solutions, but yet I need to manage multiple components. >> I think there's an add-on to what Nick said, which is the whole key point here which is the programmability, because there's data, and then there's information. Gathering lots and lots of telemetry data is easy. (John chuckles) The problem is you need to have it at all points, which is Nick's key point, but the programmability allows the DevOps person, in other words, the operational people within the cloud or carrier infrastructure, to actually write code that identifies and isolates the data, the information rather than the data that they need. >> So is this customer-based for you guys, the carriers, the service providers, who's your target audience? >> Yep, I think it's service providers who are applying the NFV technologies, in other words, the cloud-like technologies. I always say the real big story here is the cloud technologies rather than just the cloud. >> Yeah, yeah. >> And how that's-- >> And same for you guys, you guys have this, this joint, same target customer. >> Yeah, I don't think there's any disagreement. >> Okay. (laughs) Well, I want to get drilling to the whole aspirin analogy 'cause it's of the things that you brought up with the programmability because NFV has been that, you know, saving grace, it's been the Holy Grail for how many years now, and you're starting to see the tides shifting now towards where NFV is not a silver bullet, so to speak, but it is actually accelerating some of the change, and I always like to ask people, "Hey, are you an aspirin or you a vitamin?" One guest told me, "I'm a steroid. "We make things grow faster." 
I'm like, "Okay," but in a way, the aspirin solves a problem, like immediate headaches, so it sounds like a lot of the things that you mentioned. That's an immediate benefit right there on the instrumentation, in an open way, multi-component, multi-vendor kind of, benefits of proprietary but open, but the point about programmability gives a lot of headroom around kind of that vitamin, that steroid piece where it's going to allow for automation, which brings an interesting thing, that's customizable automation, meaning, you can apply software policy to it. Is that kind of like, can you tease that out, is that an area that you guys talking about? >> I think the first thing that we should mention is probably the new language called P4. I think Nick will be too modest to state that but I think Nick has been a key player in, along with his team and many other people, in the definition and the creation of this language, which allows the programmability of all these elements. >> Yeah, just drill down, I mean, toot your own horn here, let's get into it because what is it and what's the benefit and what is the real value, what's the upshot of P4? >> Yeah, the way that hardware that processes packets, whether it's in network interface cards, or in switching, the way that that's been defined in the past, has been by chip designers. At the time that they defined the behavior, they're writing Verilog or VHDL, and as we know, people that design chips, don't operate big networks, so they really know what capabilities to put in-- >> They're good at logic in a vacuum but not necessarily in the real world, right? Is that what you (laughs). >> So what we-- >> Not to insult chip designers, they're great, right? >> So what we've all wanted to do for some time is to come up with a uniform language, a domain-specific language that allows you to define how packets will be processed in interfaces, in switches, in hypervisor switches inside the virtual machine environments, in a uniform way so that someone who's proficient in that language can then describe a behavior that can then operate in different paths of the chained services, so that they can get the same behavior, a uniform behavior, so that they can see the network-wide, the service-wide behavior in a uniform way. The P4 language is merely a way to describe that behavior, and then both Netronome and Barefoot, we each have our own compilers for compiling that down to the specific processing element that operates in the interfaces and in the switches. >> So you're bridging the chip layer with some sort of abstraction layer to give people the ability to do policy programming, so all the heavy lifting stuff in the old network days was configuration management, I mean all the, I mean that was like hard stuff and then, now you got dynamic networks. It even gets harder. Is this kind of where the problem goes away? And this is where automation. >> Exactly, and the key point is the programmability versus configurability. >> John: Yeah. >> In a configurable environment, you're always trying to pre-guess what your customer's going to try to look at. >> (chuckles) Guessing's not good in the networking area. That's not good for five nines. 
>> In the new world that we're in now, the customer actually wants to define exactly what the information is they want to extract-- >> John: I wanted to get-- >> Which is your whole question around the rules and-- >> So let me see if I can connect the dots here, just kind of connect this for, and so, in the showcase, you guys are going to show this programmability, this kind of efficiency at the layer of bringing instrumentation then using that information, and/or data depending on how it's sliced and diced via the policy and programmability, but this becomes cloud-like, right? So when you start moving, thinking about cloud where service providers are under a lot of pressure to go cloud because Over-The-Top right now is booming, you're seeing a huge content and application market that's super ripe for kind of the, these kinds of services. They need that ability to have the infrastructure be like software, so infrastructure is code, is the DevOps term that we talk about in our DevOps world, but that has been more data-centered kind of language, with developers. Is it going the same trajectory in the service provider world because you have networks, I mean they're bigger, higher scale. What are some of those DevOps dynamics in your world? Can you talk about that and share some color on that? >> I mean, the way in which large service providers are starting to deliver those services is out of something that looks very much like the cloud platform. In fact, it could in fact be exactly the same technology. The same servers, the same switches, same operating systems, a lot of the same techniques. The problem they're trying to solve is slightly different. They're chaining together the means to process a sequence of operations. A little bit like, though the cloud operators are moving towards microservices that get chained together, so there are a lot of similarities here and the problems they face are very similar, but think about the hell that this potentially creates for them. It means that we're giving them so much rope to hang themselves because everything is now got to be put together in a way that's coming from different sources, written and authored by different people with different intent, or from different places across the Internet, and so, being able to see and observe exactly how this is working is even more critical than-- >> So I love that rope to hang yourself analogy because a lot of people will end up breaking stuff as Mark Zuckerberg's famous quote is, "Move fast, break stuff," and then by the way, when they 100 million users and moved, slogan went for, "Move fast, be reliable," so he got on the five nines bandwagon pretty quick, but it's more than just the instrumentation. The key that you're talking about here is that they have to run those networks in really high reliability environments. >> Nick: Correct. >> And so that begs the challenge of, okay, it's not just easy as throwing a docker container at something. I mean that's what people are doing now, like hey, I'm going to just use microservices, that's the answer. They still got stuff under the hood, but underneath microservices. You have orchestration challenges and this kind of looks and feels like the old configuration management problems but moved up the stack, so is that a concern in your market as well? 
>> So I think that's a very, very good point that you make because the carriers, as you say, tend to be more dependent, almost, on absolute reliability, and very importantly, performance, but in other words, they need to know that this is going to be 100 gigs because that's what they've signed up the SLA with their customer for. (John chuckles) It's not going to be almost 100 gigs 'cause then they're going to end up paying a lot of penalties. >> Yeah, they can't afford breakage. They're OpsDev, not DevOps. Which comes first in their world? >> Yes, so the critical point here is just that this is where the demo that we're doing which shows the ability to capture all this information at line rate, at very high speeds in the switches. (mumbles) >> So let's about this demo you're doing, this showcase that you guys are providing and demonstrating to the marketplace, what's the pitch, I mean what is it, what's the essence of the insight of this demo, what's it proving? >> So I think that the, it's good to think about a scenario in which you would need this, and then this leads into what the demo would be. Very common in an environment like the VNF kind of environment, where something goes wrong, they're trying to figure out very quickly, who's to blame, which part of the infrastructure was the problem? Could it be congestion, could it be a misconfiguration? (John laughs) >> Niel: Who's flow-- >> Everyone pointing finger at the other guy. >> Nick: The typical way-- >> Two days later, what happened, really? >> Typical way that they do this, is they'll bring the people that are responsible for the compute, the networking, and the storage quickly into one room, and say, "Go figure it out." The people that are doing the compute, they'll be modifying and changing and customizing, running experiments, isolating the problem. So are the people that are doing storage. They can program their environment. In the past, the networking people had ping and traceroute. That's the same tools that they had 20 years ago. (John chuckles) What we're doing is changing that by introducing the means where they can program and configure, run different experiments, run different probes, so that they can look and see the things that they need to see, and in the demo in particular, you'll be able to see the packets coming in through a switch, through a NIC, through a couple of VMs, back out through a switch, and then you can look at that packet afterwards, and you can ask questions of the packet itself, something you've never been able to-- >> It's the ultimate debugger. Basically, it's the ultimate debugger. >> Nick: That's right. Go to the packet, say-- >> Niel: Programmable debugger. >> "Which path did you take? "How long did you wait at each NIC, "at each VM, at each switch port as you went through? "What are the rules that you followed "that led you to be here, and if you encountered "some congestion, whose fault was it? "Who did you share that queue with?" so we can go back and apportion the blame-- >> So you get a multiple dimension of path information coming in, not just the standard stovepiped tools-- >> Nick: That's right. >> And then, everyone compares logs and then there's all these holes in it, people don't know what the hell happened. >> And through the programmability, you can isolate the piece of the information-- >> So the experimentation agile is where I think, is that what you're getting at? 
You can say, you can really get down and dirty into a duplication environment and also run these really fast experiments versus kind of in theory or in-- >> Exactly, which is what, as Nick said, is exactly what people on the server side and on the storage side have been able to do in the past. >> Okay so for people watching that are kind of getting into this and people who aren't, just give me in order maybe through of the impact and the consequences of not taking this approach, vis-a-vis the available, today's available techniques. >> If you wanted to try and figure out who it was that you were sharing a queue with inside an interface or inside a switch, you have no way to do that today, right? No means to do that, and so if you wanted to be able to say it's that aggressive flow over there, that malfunction in service over there, you've got no means to do it. As a consequence, the networking people always get the blame because they can't show that it wasn't them. But if you can say, I can see, in this queue, there were four flows going through or 4,000 flows, and one of them was really badly behaved, and it was that one over there and I can tell you exactly why its packets were ending up here, then you can immediately go in and shut that one down. They have no way that they go and randomly shut-- >> Can I get this for my family, I need this for my household. I mean, I'm going to use this for my kids. I mean I know exactly the bad behavior, I need to prove it. No, but this is what the point is, is this is fast. I mean you're talking speed, too, as another aspect-- >> Niel: It's all about the-- >> What's the speed lag on approach versus taking the old, current approach versus this joint approach you guys are taking? What's the, give me an estimate on just ballpark numbers-- >> Well there's two aspects to the speed. One is the speed at which it's operating, so this is going to be in the demo, it's running at 40 gigabits per seconds, but this can easily run, for example, in the Barefoot switch, it'll run at 6 terabits per second. The interesting thing here is that in this entire environment, this measurement capability does not generate a single extra packet. All of it is self-contained in the packets that are already flowing. >> So there's no latency issues on running this in production. >> If you wanted then change the behavior, you needed to go and modify what was happening in the NIC, modify what was happening in the switch, you can do that in minutes. So that you can say-- >> Now the time it takes for a user now to do this, let's go to that time series. What does that look like? So current method is get everyone in a room, do these things, are we talking, you know. >> I think that today, it's just simply not possible. >> Not possible. >> So it's, yes, new capability. >> I think is the key issue. >> So this is a new capability. >> This is a new capability and exactly as Nick said, it's getting the network to the same level of ability that you always had inside the-- >> So I got to ask you guys, as founders of your companies because this is one of those things that's a great success story, entrepreneurs, you got, it's not just a better mousetrap, it's revolutionary in the sense that no one's ever had the capability before, so when you go to events like Mobile World Congress, you're out in the field, are you shaking people like, "You need me! "I need to cut the line and tell you what's going on." 
I mean, you must have a sense of urgency that, is it resonating with the folks you're talking to? I mean, what are some of the conversations you're having with folks? They must be pretty excited. Can you share any anecdotal stories? >> Well, yup, I mean we're finding, across the industry, not only in the service providers, the data center companies, Wall Street, the OEM box vendors, everybody is saying, "I need," and have been saying for a long time, "I need the ability to probe into the behavior "of individual packets, and I need whoever is owning "and operating the network to be able to customize "and change that." They've never been able to do that. The name of the technique that we use is called In-band Network Telemetry or INT, and everybody is asking for it now. Actually, whether it's with the two of us, or whether they're asking for it more generally, this is, this is-- >> Game changer. >> You'll see this everywhere. >> John: It's a game changer, right? >> That's right. >> Great, all right, awesome. Well, final question is, is that, what's the business benefits for them because I can imagine you get this nailed down with the proper, the ability to test new apps because obviously, we're in a Wild West environment, tsunami of apps coming, there's always going to be some tripwires in new apps, certainly with microservices and APIs. >> I think the general issues that we're addressing here is absolutely crucial to the successful rollout of NFV infrastructures. In other words, the ability to rapidly change, monitor, and adapt is critical. It goes wider than just this particular demo, but I think-- >> It's all apps on the service provider. >> The ability to handle all the VNFs-- >> Well, in the old days, it was simply network spikes, tons of traffic, I mean, now you have, apps could throw off anomalies anywhere, right? You'd have no idea what the downstream triggers could be. >> And that's the whole notion of the programmable network, which is critical. >> Well guys, any information where people can get some more information on this awesome opportunity? You guys' sites, want to share quick web addresses and places people get whitepapers or information? >> For the general P4 movement, there's P4.org. P, the number four, .org. Nice and easy. They'll find lots of information about the programmability that's possible by programming the, the forwarding being what both of us are doing. In-band Network Telemetry, you'll find descriptions there, P4 programs, and whitepapers describing that, and of course, on the two company websites, Netronome and Barefoot. >> Right. Nick and Niel, thanks for spending some time sharing the insights and congratulations. We'll keep an eye for it, and we'll be talking to you soon. >> Thank you. >> Thank you very much. >> This is theCUBE here in Palo Alto. I'm John Furrier, thanks for watching. (lively techno music)

Published Date : Mar 13 2017

SUMMARY :

Nick McKeown, co-founder of Barefoot Networks, and Niel Viljoen, founder of Netronome, join John Furrier to discuss programmable forwarding with P4 and In-band Network Telemetry (INT). By programming the NICs and the switches, operators can see exactly which flows share a queue, pinpoint a badly behaved flow without generating a single extra measurement packet, and change network behavior in minutes, a capability the guests argue is crucial to the successful rollout of NFV infrastructures. More information is available at P4.org and on the Netronome and Barefoot websites.

SENTIMENT ANALYSIS :

ENTITIES

Entity                       Category       Confidence
Nick McKeown                 PERSON         0.99+
Niel Viljoen                 PERSON         0.99+
Niel                         PERSON         0.99+
Nick                         PERSON         0.99+
John                         PERSON         0.99+
John Furrier                 PERSON         0.99+
100 gigs                     QUANTITY       0.99+
Palo Alto                    LOCATION       0.99+
Barefoot Networks            ORGANIZATION   0.99+
Netronome                    ORGANIZATION   0.99+
two                          QUANTITY       0.99+
Mark Zuckerberg              PERSON         0.99+
Barefoot                     ORGANIZATION   0.99+
two aspects                  QUANTITY       0.99+
Mobile World Congress        EVENT          0.99+
both                         QUANTITY       0.99+
#MWC17                       EVENT          0.99+
two company                  QUANTITY       0.98+
each VM                      QUANTITY       0.98+
One                          QUANTITY       0.98+
today                        DATE           0.98+
one                          QUANTITY       0.98+
100 million users            QUANTITY       0.98+
each switch                  QUANTITY       0.98+
Two days later               DATE           0.98+
20 years ago                 DATE           0.98+
four                         QUANTITY       0.97+
one room                     QUANTITY       0.96+
first thing                  QUANTITY       0.96+
both sides                   QUANTITY       0.96+
each                         QUANTITY       0.96+
each NIC                     QUANTITY       0.96+
One guest                    QUANTITY       0.95+
.org.                        OTHER          0.95+
first                        QUANTITY       0.94+
6 terabits per second        QUANTITY       0.94+
single extra packet          QUANTITY       0.91+
4,000 flows                  QUANTITY       0.88+
P4                           TITLE          0.88+
40 gigabits per seconds      QUANTITY       0.85+
five nines bandwagon         QUANTITY       0.84+
five nines                   QUANTITY       0.84+
theCUBE                      ORGANIZATION   0.76+
almost 100 gigs              QUANTITY       0.76+
DevOps                       TITLE          0.75+
#theCUBE                     ORGANIZATION   0.69+
Verilog                      TITLE          0.67+
NetFlow                      ORGANIZATION   0.66+
OpsDev                       ORGANIZATION   0.64+
VNFs                         TITLE          0.62+
P4                           OTHER          0.61+
agile                        TITLE          0.59+
P4                           ORGANIZATION   0.58+
Wall                         ORGANIZATION   0.56+
P4.org                       TITLE          0.5+