Robert Nishihara, Anyscale | AWS re:Invent 2022 - Global Startup Program
>>Well, hello everybody. John Walls here, continuing our coverage of AWS re:Invent 22 on theCUBE. We continue our segments here in the Global Startup Program, which of course is sponsored by the AWS Startup Showcase, and with us to talk about Anyscale is the co-founder and CEO of the company, Robert Nishihara. Robert, good to see you. Thanks for joining us.
>>Yeah, great. And thank you.
>>You bet. Glad to have you aboard here. So let's talk about Anyscale, first off, for those at home who might not be familiar with what you do, because you've only been around for a short period of time, you're telling me.
>>The company's about three years now.
>>Three years old. So tell us all about it.
>>Absolutely. So one of the biggest things happening in computing right now is the proliferation of AI. AI is spreading throughout every industry and has the potential to transform every industry. But the thing about doing AI is that it's incredibly computationally intensive. If you want to do AI, you're probably not just doing it on your laptop; you're doing it across many machines, many GPUs, many compute resources, and that's incredibly hard to do. It requires a lot of software engineering expertise, a lot of infrastructure expertise, a lot of cloud computing expertise to build the software infrastructure and distributed systems to really scale AI across the cloud, and to do it in a way where you're really getting value out of AI. So that is the problem statement: AI has tremendous potential, but it's incredibly hard to do because of the scale required.
>>And what we are building at Anyscale is really trying to make that easy. We're trying to get to the point where, as a developer, if you know how to program, say Python, on your laptop, then that's enough. Then you can do AI, you can get value out of it, you can scale it, you can build the kinds of incredibly powerful AI applications that companies like Google and Facebook and others can build, but you don't have to learn about all of the distributed systems and infrastructure. We'll handle that for you. If we're successful, that's what we're trying to achieve here.
>>What makes AI so hard to work with? You talk about the complexity, a lot of moving parts, literally moving parts. What is it, in your mind, that gets people's eyes spinning a little bit when they look at the great potential, but also at the downside of maybe having to work their way through a quagmire of sorts?
>>So the potential is definitely there, but it's important to remember that a lot of AI initiatives fail. Something like 80 or 90% don't make it out of the research or prototyping phase and into production. Some of the things that are hard about AI, and the reasons AI initiatives fail: one is the scale required. It's one thing to develop something on your laptop; it's another thing to run it across thousands of machines. So that's scale. Another is the transition from development and prototyping to production. Those are very different and have very different requirements, and a lot of times they involve different teams within a company, with different tech stacks and different software.
We hear companies say that once they prototype and develop a model, it can take six to 12 weeks to get that model into production, and that often involves rewriting a lot of code and handing it off to another team. So the transition from development to production is a big challenge. And lastly, a big challenge is around flexibility. AI is a fast-moving field: you see new developments, new algorithms, new models coming out all the time. A lot of the teams we work with have built infrastructure, or are using products out there to do AI, but they've found that it locks them into rigid workflows or specific tools, and they don't have the flexibility to adopt new algorithms, strategies, or approaches as they come out. Their developers want the flexibility to use the latest tools and the latest strategies. So those are some of the main problems we see: how do you scale? How do you move easily between development and production? And how do you remain flexible and adopt the best tools that are coming out? Those are often the reasons that people start to use Ray, which is our open source project, and Anyscale, which is our product.
>>So tell me about Ray. It's an open source project; I think you said you worked on it at Berkeley?
>>That's right. Before this company, I did a PhD in machine learning at Berkeley. One of the challenges we ran into ourselves, trying to do machine learning, was that we weren't infrastructure or distributed systems people, yet in order to do machine learning we found ourselves building all sorts of ad hoc tools and systems to scale it, to run it in a reasonable amount of time, and to leverage the compute that we needed. And it wasn't just us: machine learning researchers and practitioners across the field were building their own tooling and infrastructure, and that was one of the things we felt was really holding back progress. So that's how we gradually got into saying, hey, we could build better tools here; we could make this easier, so that all of these people don't have to build their own infrastructure and can focus on the actual machine learning applications they're trying to build. So Ray started as an open source project for scaling Python applications and machine learning applications. Initially we were running around Berkeley trying to get all of our friends to try it out, adopt it, and give us feedback, and if it didn't work, we would debug it right away. That gradually turned into more companies adopting it, bigger teams adopting it, and external contributors contributing back to the open source project and making it better. And before you know it, we were hosting meetups, giving talks, running tutorials, and the project was just taking off.
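To make that developer experience concrete, here is a minimal sketch of the programming model Ray provides; it assumes a local `pip install ray`, and the workload function is an illustration rather than anything from the interview:

```python
import ray

ray.init()  # starts a local cluster on a laptop; in production it would connect to a remote one

@ray.remote
def score_batch(batch):
    # Stand-in for real feature computation or model scoring.
    return sum(x * x for x in batch)

# The same ordinary Python fans out across however many cores or
# machines the cluster has; Ray handles scheduling and data movement.
futures = [score_batch.remote(list(range(i, i + 10_000))) for i in range(8)]
print(sum(ray.get(futures)))
```

Nothing in the application code mentions machines or infrastructure, which is the "if you can program Python on your laptop, that's enough" claim in practice.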
And so a big part of what we continue to do today at Anyscale is really fostering this open source community, growing the open source user base, and making sure Ray is the best way to scale Python applications and machine learning applications.
>>So this was a graduate school project, on your way to getting your doctorate, and now you're commercializing it. What a journey that was. Who would've thought? I guess you probably did, at some point.
>>No, when we were working on Ray, we actually didn't anticipate becoming a company, or at least we weren't looking that far ahead. We were really excited about solving this problem of making distributed computing easy, getting to the point where developers just don't have to learn about infrastructure and distributed systems but get all the benefits. It wasn't until later on, as we were graduating from Berkeley and wanted to keep taking the project further and really solve this problem, that we realized it made sense to start a company.
>>So help me out, and I might have missed this, so I apologize if I did: with Ray as that building block, essential for your ML or AI work down the road, what will it allow me to do in either of those realms that I can't do now?
>>So, why use Ray versus not using Ray? I think the answer is that if you're doing AI, you need to scale. If you don't find that to be the case today, you probably will tomorrow, or the day after. Increasingly it's a requirement, not an option. And if you're trying to build these scalable applications, you're either going to use Ray, or something like Ray, or you're going to build the infrastructure yourself, and building the infrastructure yourself is a long journey.
>>So why take that on, right?
>>And many of the companies we work with don't want to be in the business of building and managing infrastructure, because they want their best engineers building their product, to get their product to market faster.
>>I want you to do that for me.
>>Right, exactly. We can really accelerate what these teams can do, and if we can make the infrastructure something they just don't have to think about, that's why you would choose to use Ray.
>>Okay. Between AI and ML, are they different animals in terms of what you're trying to get done, or what Ray can do?
>>Yeah, and actually I should say it's not just new teams that are starting out that use Ray. Many companies that have already built their own infrastructure then switch to Ray. To give you a few examples: Uber runs all their deep learning on Ray. OpenAI, which is really at the frontier of training large models and pushing the boundaries of AI, trains their largest models using Ray. Companies like Shopify rebuilt their entire machine learning platform using Ray.
>>But they started somewhere else.
>>They had, right; this is not the v1 of their machine learning infrastructure. They did it a different way before; this is the second version or the third iteration of how they're doing it. And often there's a reason. In the case of Uber, to give you one example, they built a system called Horovod for scaling deep learning training across a bunch of GPUs. Now, as they scaled training on GPUs, the bottleneck shifted away from training and toward data ingest and preprocessing, and they wanted to scale data ingest and preprocessing on CPUs. Horovod is a deep learning framework; it doesn't do data ingest and preprocessing on CPUs. But if you run Horovod on top of Ray, you can scale training on GPUs, and Ray has another library, called Ray Data, that lets you scale the ingest and preprocessing on CPUs, and you can pipeline the two together. That allowed them to train larger models on more data. To take one example, ETA prediction: when you get in an Uber, it tells you what time you're supposed to arrive. That uses a deep learning model called DeepETA. Before, they were able to train on about two weeks' worth of data; now, using Ray to scale the data ingest, preprocessing, and training, they can train on much more data and get more accurate ETA predictions. So that's one example of the kind of benefit they were able to get. Also, because it's all running on top of Ray, and Ray has this ecosystem of libraries, they can use Ray's hyperparameter tuning library for their deep learning models, and they can use it for inference as well. And because these are all built on top of Ray, they inherit the elasticity and fault tolerance of running on Ray. So it really simplifies things on the infrastructure side, because with Ray as common infrastructure for your machine learning workloads, there's just one system to manage and operate. And it simplifies things for the end users, the developers, because from their perspective they're just writing a Python application; they don't have to learn how to use three different distributed systems and stitch them together.
>>So, AWS, before I let you go: how do they come into play here for you? You're part of the Startup Showcase, so obviously a major partner and a major figure in the offering that you're presenting.
>>Yeah. Anyscale is a managed Ray service; it's just the best way to run and deploy Ray, and we run on top of AWS. Many of our customers are using Ray through Anyscale on AWS, so we work very closely together and we have joint customers. A lot of the value that Anyscale adds on top of Ray is around the production story: things like high availability, failure handling, retries, alerting, persistence, and reproducibility. These are a lot of the value that our platform adds on top of the open source project.
A lot as well around collaboration: imagine something goes wrong with your application, your production job, and you want to debug it. You can just share the URL with your coworker, they can click a button, reproduce the exact same thing, look at the same logs, and figure out what's going on. And one thing that's important for a lot of our customers is efficiency around cost. And so we...
>>Support every customer.
>>Exactly. A lot of people are spending a lot of money on AWS, and Anyscale supports running out of the box on cheaper spot instances, these preemptible instances, which reduce costs by quite a bit. So things like that.
>>Well, the company is Anyscale, and they're on the show floor, so if you have a chance while watching this during re:Invent, go down and check them out. Robert Nishihara joining us here, the co-founder and CEO. Robert, thanks for being with us here on theCUBE. Really enjoyed it.
>>Me too. Thanks so much.
>>Boy, three years out of a graduate program and boom, here you are, off to the enterprise you go. Very nicely done. All right, we're going to continue our coverage here on theCUBE with more from Las Vegas. We're at the Venetian, at AWS re:Invent 22, and you're watching theCUBE, the leader in high tech coverage.
Soni Jiandani and David Hughes | Aruba & Pensando Announce New Innovations
>>I'm John Furrier with theCUBE. We are here with exciting news around the next evolution of switching. Soni Jiandani, co-founder and chief business officer at Pensando, and David Hughes, chief product and technology officer at Aruba, HPE. Welcome back. We just heard from Antonio Neri and John Chambers about the HPE Aruba partnership with Pensando and the new switching platform. Tell me more about the exciting news you're announcing.
>>Yeah, I'm really excited today to be introducing the CX 10000 distributed services switch. It's a brand new class of switch where we bring together the best of Aruba switching technology, adding to our CX portfolio, combined with Pensando's technology embedded in the platform. The problem we're solving is that in a traditional data center, all of those services like firewalling and load balancing are provided by centralized appliances. And while that might be okay for north-south traffic, traffic that's going in and out of the data center, it's not scalable and it's not cost-effective to apply those services to every port and every flow traversing the data center. As we all know, with microservices, more and more of the traffic is east-west, over 70% today and growing. So what we're doing with the CX 10000 is giving enterprises a way to take the smart NIC technology that's been proven out by the hyperscalers and introduce it into their data centers in a very cost-effective and easy-to-deploy way. We're embedding that capability in the top-of-rack switch so that we can apply firewall and load balancing services to every port and every flow, delivering 100 times the scale in terms of ACLs and 10 times the performance in terms of encryption, at a third of the cost of those traditional network architectures. So it's a super exciting time.
>>Love the speed, love the energy there. But I've got to ask: what makes this a new category of switch?
>>Well, if you take a look at the journey we have been on as we have evolved our data centers, and as the applications have evolved for our customers, the world is now a bold new world of multi-cloud. Leaf-spine architectures in the data center have become the new norm, and software-defined networking is pervasively deployed by our customers. But as this journey began, five or seven or even ten years ago, and culminated in a much more mature set of building blocks, we have gone from one problem space, automating networks in the data center, to introducing lots and lots of expensive appliances to bring about security, or stateful services like load balancing, encryption, and visibility and telemetry. Customers have had to trombone all the traffic in and out of these appliances, driving up cost and complexity, and when the time comes to troubleshoot these environments it's extremely complex, because you're trying to rationalize fabrics coming from one place, appliances coming from four or five different vendors, and all the software elements that need to be kept track of. And more and more customers aspire to a zero-trust security model.
We need to start to embrace a lot of the principles that have been implemented by the hyperscalers and the cloud vendors: doing away with the appliances, doing away with agent technology on servers, and instead bringing that technology to bear for east-west traffic as well, ensuring that if bad actors land inside the data center, they do not have the ability to create attack surfaces with complete lateral movement. Today that is possible: 70% of all the attacks in the past few years are the result of an attack surface that is pretty large in the data center. And that gets further complicated when you move toward a multi-cloud environment, where the perimeter of the data center is moving to the edge, whether that edge is where fleets reside for our customers, or a co-location edge where you're building your on-ramps and off-ramps. So I think the compelling event is driven by the whole notion of distribution of services, having them available from a security and services point of view, and these are stateful services, as close to the workload as you can possibly get them.
>>So you guys really hit on some key points there: cloud-native, microservices, east-west, north-south, no perimeter, edge. These are topics we used to talk about individually over the years; now it's all happening at the same time. This is causing a lot of complexity, and the security challenges you just laid out are everywhere. This brings up a big conversation around solving it. How does this new architecture, this solution, solve the complexity and the security challenges in the data center?
>>If you look at the use cases our customers are talking about, the initial use case really is to bring stateful security for east-west traffic right into the fabric of their data centers: having the ability to deliver that, while relegating the complex appliances to the job they do very well, which is north-south protection services. That also gives us the ability to deliver visibility and telemetry at the same time that we're delivering stateful firewall and micro-segmentation services, because what I cannot see, I cannot secure. So those two elements are the initial out-of-the-box use cases for our customers as we deliver this platform, and more use cases are becoming evident to us through customer interactions. For example, the co-location edge, which I would like David to walk you through in terms of how we help solve for that use case.
>>So for the colo use case, I think we're moving from a world where people talk about data centers to one where they talk about centers of data. Those centers of data can be in a core private data center, they can be in the cloud, but more and more they're going to be distributed around the edge in co-location environments. What we need to be able to do is extend the services that were provided in the data center to those colos at the edge, and again, we want to do that without having to deploy a whole rack of appliances that may cost more than the compute itself. So with the CX 10000 we can have that as the top-of-rack switch for that colo, and from that switch deploy all of the encryption and firewalling services that the colo requires.
And what's important is that we're doing it with the same policy framework, under the same management system, across the whole enterprise: in the data center, in these co-location environments, and out into the cloud.
>>So a quick follow-up, because you mentioned visibility: what you can't see, you can't protect. But there are also a lot of workloads that people are trying to automate. These are two factors. Can you guys double down on that?
>>I think policy, the ability to have an intent-based policy, is a foundational technology building block that we have brought together, a very important element. And when you map it back to the tools that Aruba is extending support for, including this platform, it becomes very valuable. So David, why don't you walk us through?
>>You know, I think one of the advantages we bring is that this is an extension of the Aruba CX switching portfolio. It has a cloud-native, microservices-based, very modern switch architecture, and we have a comprehensive management platform, the Aruba Fabric Composer. So what we are doing is making sure that everything fits together nicely, that we're delivering a complete solution to our customers. One important thing to mention is that we are thinking about how customers can do this step by step. We're not requiring them to rebuild their entire data center; they can do this one rack at a time. We can work with their existing spine and deploy one leaf at a time in a very measured way. So we think it's a great way for enterprises to consume this modern distributed platform.
>>That's a great segue to the next question. I totally see what you guys are talking about: the cloud-native trend driving a cloud operational model to every edge, where the data center is just another edge, a center of data. Love that line. So I have to ask about the operational side: how would an enterprise customer manage all this? Take us through the nuts and bolts of deploying and managing it as a customer.
>>That's a very good question. Take the example of a customer who wants to bring in this technology and build a highly secure pod with it for east-west traffic, to make sure they're protecting 100% of that east-west traffic. Leveraging all the building blocks we have innovated between us and Aruba, they want integration points made available into the ecosystem they have built, whether that's with companies like Splunk and ServiceNow or Guardicore. Take a step back: in these environments, as you aspire toward zero-trust security, the issue of inserting security appliances into network flows, and the ability to map them to the knowledge of applications and their dependencies for policy, becomes an important function to tackle.
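To make the idea of intent-based, default-deny east-west policy concrete, here is a deliberately simplified sketch; the workload labels, rules, and `permit` function are hypothetical illustrations of the concept and do not correspond to any actual Aruba or Pensando API:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    src_app: str   # label of the source workload, e.g. "web"
    dst_app: str   # label of the destination workload, e.g. "db"
    port: int

# Intent-based rules reference application labels rather than IP
# addresses, so policy follows workloads as they move. Anything not
# explicitly allowed is denied: the zero-trust posture described above.
ALLOW = {
    ("web", "api", 443),
    ("api", "db", 5432),
}

def permit(flow: Flow) -> bool:
    return (flow.src_app, flow.dst_app, flow.port) in ALLOW

print(permit(Flow("web", "api", 443)))   # True: a declared dependency
print(permit(Flow("web", "db", 5432)))   # False: lateral movement blocked
```

Enforcing rules like these at every top-of-rack port, for every flow, is what removes the need to trombone east-west traffic through centralized appliances.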
So once you accept that I have stateful security functions built into the top-of-rack device, available for my applications and all workloads, whether container, bare-metal, or virtualized, and I have complete visibility into those workloads without compromising on connectivity, and I can control through enforcement of policy where I need it, because now security is part of the fabric, not a bolt-on, then comes the job of integration with the ecosystem. For SIEM and SOAR companies, we are delivering, in close collaboration with Splunk, a Pensando app for Splunk, and there will also be an Elastic plug-in module. Then turn to automation and DevOps: Ansible playbooks for the CX 10000 will be available at day one, so that where you do not deploy the AFC, you can use your existing Ansible toolkit, and we're making those playbooks available to our customers. Customers also want integration with application discovery and mapping companies like Guardicore, allowing them to discover who's talking to whom; pushing and enforcing that policy through the CX 10000 will allow for more automated deployment of those policies. And finally, compliance integration with vendors like Tufin for continuous security compliance monitoring becomes extremely important. As the screen depicts, a lot of visualization capabilities with Elastic are in beta today, and the Ansible, Splunk, and Elastic integrations will all be targeted at first customer shipment. So again, telemetry and visibility with the integration of the ecosystem becomes a very powerful combination for customers as they look to operationalize this, with day one, day two, day three automation.
>>That's awesome. David, I'd like to let you weigh in on this whole question of operations, because you're hitting all the marks here that are relevant: cloud-native, microservices, the explosion in data volume and velocity, hyperscale cloud operations, performance, price point, and security, all in this one solution. This is big. And as you mentioned earlier, it's not a rip-and-replace; you can roll it out. How do you see a customer best operationalizing this?
>>You know, I think the answer is a little bit different for each customer, but as we said at the beginning, this is an evolution of switching, not a revolution where we have to replace everything. What's really exciting is that it builds on the foundational architecture of leaf and spine, and what we're able to do is let customers introduce these new capabilities one leaf at a time.
So maybe when they're upgrading from 10 gigs to 25 gigs, it's a great time for them to introduce this capability into their data center. And then, depending on their application, it may be, as Soni said, that they've got one particular crown-jewel application, so they want to build that out in one rack and provide very robust east-west as well as north-south security around it. There are so many different ways customers can deploy this technology, and what's really exciting is that we're now beginning to work with our customers, learning about these new use cases and feeding that back into our roadmap.
>>And we all know that as you get down lower in the network layer, security becomes a distributed architecture, so this is all paramount. Security, super relevant. Great conversation. I've got to ask: what's next with this technology?
>>Well, the two engineering teams are working together, and this is step one on a really exciting new path. I don't know, Soni, what would you say?
>>I think there's a lot more to come here. This is just a starting point. We have an incredibly strong partnership, and a go-to-market partnership, with the Aruba team on this platform. It is just the beginning, and it will lead our customers onto the multi-cloud journey. And last but not least, I would like to say in closing that there are seldom opportunities to disrupt the way things are done while still fitting into customers' existing models. With everything being software-defined, you will continue to see us delivering, at great velocity, more and more software-defined services, whether encryption, load balancing, or other stateful services over time; making this technology easier to deploy by fitting into the existing ecosystem; and continuing to provide customers with 100 times the scale and 10 times the performance, at a third of the cost of what they would need if they had to build this today with disparate devices.
>>Exciting news in the industry. You guys are the pros; you've seen all the waves of innovation over the years. I guess my final question would be: how would you summarize this point in time? This is pretty exciting, all of it happening at the same time. Customers have the opportunity to innovate, and the pandemic has shown the need for scale, stability, and security. This is a special moment. How would you guys weigh in on that?
>>Yeah, I think of it as: every decade there's a change in how data centers are built, and this is the change happening this decade, moving to a distributed services switch. The other big megatrend I see is this move, as I said, from data centers to centers of data, and the opportunity for customers to use this technology as they move out to the edge and have distributed compute. What do you think, Soni?
>>I couldn't agree more. There are so many technology transitions occurring now, the cloud being the biggest one, along with the explosion of data and customers deciding to adopt a distributed model. Indeed, two thirds, if not 75%, of all data will be processed at the edge over the next few years.
This architecture is prime for the enterprise to leverage its best practices of today while gradually moving its architecture toward the future, which is a multi-cloud future.
>>Centers of data, large-scale cloud operations, automation: the speed of innovation has never been like this. It's an exciting time. Soni, thank you for coming on, and David, thanks for chatting about this exciting new announcement.
>>Thank you very much.
>>Thank you.
>>This is the power of the HPE, Aruba, and Pensando partnership. I'm John Furrier for theCUBE. Thanks for watching.
Vertica @ Uber Scale
>> Sue: Hi, everybody. Thank you for joining us today for the Virtual Vertica BDC 2020. This breakout session is entitled "Vertica @ Uber Scale." My name is Sue LeClaire, Director of Marketing at Vertica, and I'll be your host for this webinar. Joining me is Girish Baliga, Engineering Manager of Big Data at Uber. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait; just type your question or comment in the question box below the slides and click Submit. There will be a Q&A session at the end of the presentation, and we'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer offline. Alternatively, you can also use the Vertica forums to post your questions after the session; our engineering team is planning to join the forums to keep the conversation going. As a reminder, you can maximize your screen by clicking the double arrow button in the lower right corner of the slides, and this virtual session is being recorded, so you'll be able to view it on demand this week. We'll send you a notification as soon as it's ready. So let's get started. Girish, over to you.
>> Girish: Thanks a lot, Sue. Good afternoon, everyone, and thanks for joining this session. My name is Girish Baliga, and as Sue mentioned, I manage the interactive and real-time analytics teams at Uber. Vertica is one of the main platforms that we support, and it powers a lot of core business use cases. In today's talk, I want to cover two main things: first, how Vertica powers critical business use cases across a variety of orgs in the company; and second, how we are able to do this at scale and with reliability, using some of the additional functionality and systems that we have built into the Vertica ecosystem at Uber. Toward the end, I also have a little extra bonus for all of you: an easy way to take advantage of many of the ideas and solutions that I'm going to present today, which you can apply to your own Vertica deployments in your companies. So stick around, put on your seat belts, and let's start the ride. At Uber, our mission is to ignite opportunity by setting the world in motion. We are focused on solving mobility problems, enabling people all over the world to address their local needs in a manner that's efficient, fast, and reliable. As our CEO Dara has said, we want to become the mobile operating system of local cities and communities throughout the world. As of today, Uber is operational in over 10,000 cities around the world. Across our various business lines, we have over 110 million monthly users who use our Rides services, Eats services, and a whole bunch of other services that we provide. Just to give you a sense of the scale of our daily operations: the Rides business has over 20 million trips per day, and the Eats business is also catching up, particularly during recent times. I hope these numbers give you a sense of the amount of data we process each and every day to support our users in their analytical and business reporting needs. So who are these users at Uber? Let's take a quick look. Uber, to describe it very briefly, is a lot like Amazon: we are largely an operations and logistics company, and our employee base reflects that.
Over 70% of our employees work in teams that come under the umbrella of community operations and centers of excellence. These are folks working in the various cities and towns we operate in around the world, running the Uber businesses as somewhat local businesses responding to local needs, local market conditions, local regulation, and so forth. Vertica is one of the most important tools these folks use in their day-to-day business activities: they use it to get insights into how their businesses are going, to dig deeply into any issues they want to triage, to generate reports, to plan for the future, a whole lot of use cases. The second big class of users is in our marketplace team. Marketplace is the engineering team that backs our ride sharing business, and a key problem they have to solve in running this business is determining what prices to set for particular rides so that we have a good match between supply and demand. The real-time pricing decisions are of course made by serving systems, with very detailed and well-crafted machine learning models; however, the training data that goes into those models, the historical trends, and the insights that go into building them are powered by the data that we store and serve out of Vertica. Similarly, in the Eats business, we have use cases spanning all the way from engineering and back-end systems to support, operations, incentives, growth, and a whole bunch of other domains. A big class of applications that we support across these business lines is dashboards and reporting: we have a lot of dashboards, built by core data analyst teams and shared with our operations and other teams. These dashboards and reports run periodically, say once a week or once a day, depending on the freshness of data they need, and many are powered by the data and analytics support we provide on our Vertica platform. Another big category of use cases is growth marketing: understanding historical trends, figuring out how various business lines, customer segments, and geographical areas are doing in terms of growth, and where it is necessary for us to reinvest or provide additional incentives or marketing support. The analysis that backs a lot of these decisions is powered by queries running on Vertica. And finally, the heart and soul of Uber is data science: how we provide best-in-class algorithms, pricing, and matching. A lot of the analysis that goes into figuring out how to build these systems and models, and the coefficients and parameters that go into making real-time decisions, is based on analysis that data scientists run on Vertica. So as you can see, Vertica usage spans a whole bunch of organizations and users across the different Uber teams and ecosystems. To give you some quick numbers: we have over 5,000 weekly active users, people who run queries at least once a week to solve some critical business problem in their day-to-day operations. So next, let's see how Vertica fits into the Uber data ecosystem. When users open up their apps and request a ride, or order food delivery on the Eats platform, the apps talk to our serving systems.
The serving systems use online storage systems to store data as the trips and Eats orders are processed in real time. For this we primarily use an in-house key-value storage system called Schemaless and an open source system called Cassandra; we also have other systems like MySQL and Redis, which we use to store various bits of data to support the serving systems. All of these operations generate a lot of data that we then want to process, analyze, and use for operational improvements. So we have ingestion systems that periodically pull in data from our serving systems and land it in our data lake. At Uber the data lake is powered by Hadoop, with files stored on HDFS clusters. Once the raw data lands in the data lake, ETL jobs process these raw datasets and generate modeled and customized datasets, which we use for further analysis. Once these modeled datasets are available, we load them into our data warehouse, which is entirely powered by Vertica. On top of that sits a business intelligence layer, with internal tools like QueryBuilder, a UI for writing queries and looking at results, and Dashbuilder, a dashboard-building and report-management tool. These are all tools we have built within Uber, and they talk to Vertica and run SQL queries to power whatever dashboards and reports they support. So this is what the data ecosystem looks like at Uber. So why Vertica, and what does it really do for us? It powers the insights we show on dashboards and the reports we run periodically, but more importantly, Vertica has some core properties and feature sets that allow us to support many of these use cases very well and at scale. Let me take a brief tour of what these are. As I mentioned, Vertica powers Uber's data warehouse. What this means is that we load our core fact and dimension tables onto Vertica. The core fact tables are all the trips, all the Eats orders, and other line items for the various businesses at Uber, stored as partitioned tables; think of having one partition per day. We also have dimension tables like cities, users, riders, driver partners, and so forth. We load both kinds of datasets into Vertica, and we keep full historical data, all the way from when we launched these businesses to today, so that folks can do deeper longitudinal analysis: they can look at patterns like how the business has grown month over month, year over year, or the same month across multiple years. And the really powerful thing about Vertica is that most of these deep longitudinal queries run very, very fast. That's really why we love Vertica: our query latency P90, that is, the 90th percentile of all queries we run on the platform, typically finishes in under a minute. That's very important for us because Vertica is used primarily for interactive analytics, and providing SQL query execution times under a minute is critical for our users and business owners to get the most out of analytics and Big Data platforms.
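As a concrete illustration of this kind of interactive access, here is a hedged sketch using the open source vertica-python client that comes up later in the talk; the connection details, table, and column names are made up for the example:

```python
import vertica_python

# Illustrative connection details; in practice these would point at the
# warehouse endpoint (or, as described later, the proxy layer).
conn_info = {
    'host': 'vertica.example.internal',
    'port': 5433,
    'user': 'analyst',
    'password': '...',
    'database': 'warehouse',
}

# A typical longitudinal query: month-over-month trip counts for one
# city, scanning a day-partitioned fact table.
query = """
    SELECT DATE_TRUNC('month', trip_date) AS month, COUNT(*) AS trips
    FROM trips
    WHERE city_id = 42
    GROUP BY 1
    ORDER BY 1
"""

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    cur.execute(query)
    for month, trips in cur.fetchall():
        print(month, trips)
```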
Vertica also provides a few advanced features that we use very heavily. As you might imagine, at Uber one of our most important sets of use cases is geospatial analytics. In particular, we have critical internal dashboards that rely heavily on being able to restrict datasets by geographic areas, cities, source-destination pairs, heat maps, and so forth, and Vertica has a rich array of geospatial functions that we use very heavily. We also make use of custom projections in Vertica, which really helps us get very good performance on critical datasets. For instance, on some of our core fact tables, we have done a lot of query analysis to figure out how users run their queries: which columns they use, which combinations of columns, and which joins they do in typical queries. We have then laid out custom projections to maximize performance along those dimensions, and the ability to do that in Vertica is very valuable for us. We've also had some very successful collaborations with the Vertica engineering team. About a year and a half back, we open-sourced a Python client we had built in-house to talk to Vertica; we were using it in the business intelligence layer that I showed on the previous slide. We open-sourced it after working closely with the engineering team, and now Vertica formally supports the Python client as an open source project, which you can download and integrate into your systems. Another more recent example of collaboration is Vertica Eon mode on GCP. As some of you know, Vertica Eon mode is formally supported on AWS, and at Uber we were also looking to see if we could run our data infrastructure on GCP. The Vertica team hustled on this and provided us an early preview version, which we've been testing to see how performance is affected by running on the cloud, and on GCP in particular. So far things are going pretty well, and we should have some numbers about this very soon. Here I have a visualization of an internal dashboard that is powered solely by data and queries running on Vertica. This GIF cycles through the different visualizations the tool supports. For instance, here you see a heat map of the sources of traffic demand for ride shares, and then a set of arrows showing source-destination pairs and trip lines, so you can see how demand moves around. As it cycles through the animations, you can see all the different kinds of insights and query shapes that we send to Vertica, which powers this critical business dashboard for our operations teams. All right, so how do we do all of this at scale? We started off with a single Vertica cluster a few years back: the data would land from our data lake into Vertica, into the core fact and dimension tables I just described, and Vertica would power queries from the business intelligence layer. This is a very simple and effective architecture for most use cases, but at Uber scale we ran into a few problems. The first issue is that Uber is a pretty big company at this point, with users sending almost millions of queries every week, and at that scale a single cluster could not handle all the query traffic. For those of you who have taken an introductory course on queueing theory: even if a single serving system can process every query, you will see larger and larger queue wait times as the number of queries piles up.
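For intuition on why wait times blow up, a toy M/M/1 queue puts numbers on it (an illustrative simplification; real query traffic is burstier): the mean time in the system is 1/(mu - lambda) for arrival rate lambda and service rate mu.

```python
# Toy M/M/1 queue: mean time in system T = 1 / (mu - lambda).
mu = 10.0                                  # cluster completes 10 queries/min
for lam in [5.0, 8.0, 9.0, 9.5, 9.9]:      # arrival rates, queries/min
    t = 1.0 / (mu - lam)                   # mean minutes in system
    print(f"utilization {lam / mu:4.0%} -> {t:5.1f} min per query")
# At 50% load a query takes ~0.2 min; at 99% load, ~10 min, even
# though the pure execution time (1/mu = 0.1 min) never changed.
```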
What this means in practice for end users is that they just see longer and longer query latencies: even though the actual query execution time on Vertica itself is probably less than a minute, their query sits in the queue for several minutes, and that is the latency the end user perceives. So this was a huge problem for us. The second problem was that the cluster becomes a single point of failure. Vertica can handle single node failures very gracefully, and it can probably also handle two or three node failures, depending on your cluster size and your application. But once you have more than a certain number of failed nodes or nodes in maintenance, your cluster will need to be restarted, or you will start seeing downtime due to other issues. Another reason you would have downtime is when upgrading the software on your clusters. We're a global company with users all around the world; we really cannot afford downtime, even for a one-hour slot. So that turned out to be a big problem for us. And as I mentioned, we could have hardware issues: we might need to upgrade machines, or replace storage or memory, due to normal wear and tear or abnormal failures. Because of all of this, having a single cluster, a single point of failure, was not practical for us. So the next thing we did was set up multiple clusters: a bunch of identical clusters, all with the same datasets. We would load data from our data lake onto each of these clusters through ingestion pipelines, and the business intelligence layer could query any of them. This actually solved most of the issues I pointed out on the previous slide. We no longer had a single point of failure; for version upgrades, we would just take one cluster offline and upgrade its software; and for node failures, we would take out one cluster if we had to, or rotate spare nodes into the production clusters. However, having multiple clusters led to a new set of issues. The first problem was inconsistent schemas. One thing to understand about our platform is that we are an infrastructure team: we don't actually own or manage any of the data served on the Vertica clusters. We have dataset owners and publishers who manage their own datasets, and exposing multiple clusters to those dataset owners turns out not to be a great idea, because they are not really aware of the importance of keeping schemas and datasets consistent across different clusters. Over time, the schemas for the same tables would drift out of sync, because updates were not consistently applied on all clusters, or because someone experimented with new columns or new tables in one cluster and forgot to delete them. Whatever the case, we ended up seeing a lot of inconsistent schemas, even across some of our core tables, in our different clusters.
A second issue was that since the ingestion pipelines ingested data independently into each cluster, they could also fail independently. If, for instance, the ingestion pipeline into cluster B failed, then the data there would be older than on clusters A and C, and when a query came in from the BI layer and happened to hit B, it would return different results than it would on A or C. This was obviously not ideal for our end users, because they would see slightly inconsistent counts, which leads to a situation where they cannot fully trust the results and insights returned by their SQL queries on the Vertica systems. The third problem was a lot of extra replication. The 80/20 rule, or maybe even the 90/10 rule, applies to the datasets on our clusters as well: less than 10% of our datasets serve something like 90% of the queries. So it doesn't really make sense to replicate all of our data on all the clusters, and a setup that forced us to do so was very suboptimal. So we built some additional systems to solve these problems, which brings us to the Vertica ecosystem we have in production today. On the ingestion side, we built a system called Vertica Data Manager, which manages all the ingestion into the various clusters. Dataset owners and publishers no longer have to be aware of individual clusters; they just set up their ingestion pipelines against an endpoint in Vertica Data Manager, and it ensures that all the schemas and data are consistent across all our clusters. On the query side, we built a proxy layer, which ensures that when queries come in from the BI layer, each query is forwarded smartly, with knowledge of which clusters are up, which are down, which are available, and how loaded each one is. With these two layers of abstraction between ingestion and query, we have a very consistent, almost single-system view of our entire Vertica deployment. The third piece we put in place was the data manifest, the communication mechanism between ingestion and proxy: it is a listing of which tables are available on which clusters, which clusters are up to date, and so forth (a small sketch of the routing decision this enables follows below). With this ecosystem in place, we were also able to solve the extra replication problem. We now have some big clusters where all the tables are served, so any query that hits the long tail of less-queried tables goes to the big clusters, while most queries, which hit the 10% of heavily queried, important tables, can also be served by many other small clusters, a much more efficient use of resources. So this is the view we have of Vertica within Uber today: external to our team, folks just have one endpoint where they set up their ingestion jobs, and another endpoint where they send their Vertica SQL queries, which go to the proxy layer.
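Here is a hedged sketch of the routing decision the manifest enables, with the manifest represented as a plain dictionary and a made-up `route` function; the real proxy is of course far more involved:

```python
import random

# Illustrative manifest: which tables each cluster serves, plus health.
MANIFEST = {
    "big-1":   {"tables": {"trips", "eats_orders", "cities"}, "healthy": True},
    "big-2":   {"tables": {"trips", "eats_orders", "cities"}, "healthy": True},
    "small-1": {"tables": {"trips", "cities"}, "healthy": False},
}

def route(tables_in_query: set) -> str:
    """Pick a healthy cluster that serves every table in the query."""
    candidates = [
        name for name, info in MANIFEST.items()
        if info["healthy"] and tables_in_query <= info["tables"]
    ]
    if not candidates:
        raise RuntimeError("no healthy, up-to-date cluster serves these tables")
    # Stand-in for the real load balancing, which also weighs observed
    # response times and current cluster load.
    return random.choice(candidates)

print(route({"trips", "cities"}))  # -> "big-1" or "big-2"
```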
These tables are updated every cycle: the list of cities, the list of drivers, the list of users, and so forth. They change infrequently, maybe once a day or so, and since these datasets are not very big, we simply swap them out on every cycle.

The fact tables, on the other hand, hold information about our trips, our orders, and so forth. These are partitioned: we have roughly one partition per day for the last couple of years, and then a more hierarchical partition setup for older data. What we do is load the partitions for the last three days on every cycle. The reason is that not all our data comes in at the same time. We get updates for trips going back two or three days, for instance, when people add ratings to their trips or provide feedback for drivers, and we want to capture all of that in the row corresponding to that particular trip. So we reload the partitions for the last few days to make sure we capture all those updates. We also update older partitions if, for instance, records were deleted for retention, GDPR, or other regulatory reasons. We do this less frequently, but these partitions are updated when necessary, and there are endpoints that allow dataset owners to specify which partitions they want to update. As I mentioned, data is typically managed using a hierarchical partitioning scheme, which lets us take advantage of the data being clustered by day, so that we don't have to update all the data at once.

So when we are recovering from a cluster event, like a version or software upgrade, a hardware fix, failure handling, or even when we are adding a new cluster to the system, the data manager takes care of updating the tables, copying all the new partitions, and making sure the schemas are right. We then verify data and schema consistency and make sure everything is up to date before we add the cluster to our serving pool and the proxy starts sending traffic to it.

The second thing the data manager provides is consistency. The main thing we do here is atomic updates of our tables and partitions for fact tables, using a two-phase commit scheme. In phase one, we load all the new data into temp tables on all the clusters. Then, when all the clusters signal success, we promote the temp tables to primary and set them as the main serving tables for incoming queries (there is a minimal sketch of this promote step just below). We also optimize the load using Vertica Data Copy. Earlier, in a parallel-pipelines scheme, we had to ingest data individually from HDFS clusters into each of the Vertica clusters, which took a lot of HDFS bandwidth. Using this nice feature that Vertica provides, called Vertica Data Copy, we load the data into one cluster once and then copy it much more efficiently to the other clusters. This has significantly reduced our ingestion overhead and sped up our load process. And, as the second phase of the commit, all data is promoted at the same time. Finally, we make sure all the data is up to date by running checks on row counts and various other key signals of freshness and correctness, which we compare against the data in the data lake. In terms of schema changes, VDM automatically applies these consistently across all the clusters.
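Here is that sketch of the load-then-promote step. It is not the actual Vertica Data Manager code: run_sql is a hypothetical helper, the table and file-path layout are invented, the SQL is Vertica-flavored, and it assumes Vertica's ability to rename several tables in a single ALTER TABLE statement, a common way to swap a staging table into place atomically.

```python
# Minimal sketch of the two-phase load-and-promote scheme described above.
# Not the real Vertica Data Manager; run_sql() is a hypothetical helper that
# executes one SQL statement on one cluster and raises on failure.
CLUSTERS = ["cluster_a", "cluster_b", "cluster_c"]

def run_sql(cluster, statement):
    ...  # e.g. execute via vertica_python against the given cluster

def load_and_promote(table, partition_date, source_path):
    # Phase 1: stage the new data into a temp table on every cluster.
    # If any cluster fails, we stop here and nothing is visible to queries.
    for cluster in CLUSTERS:
        run_sql(cluster, f"CREATE TABLE {table}_tmp LIKE {table};")
        run_sql(cluster, f"COPY {table}_tmp "
                         f"FROM '{source_path}/{partition_date}.csv' DELIMITER ',';")
    # Phase 2: every cluster signalled success, so promote everywhere.
    # Renaming both tables in one statement makes the swap effectively
    # atomic from the point of view of incoming queries on that cluster.
    for cluster in CLUSTERS:
        run_sql(cluster, f"ALTER TABLE {table}, {table}_tmp "
                         f"RENAME TO {table}_old, {table};")
        run_sql(cluster, f"DROP TABLE {table}_old;")
```

In the real system, the second loop runs only after every cluster has acknowledged phase one, and the freshness checks against the data lake happen before traffic is shifted.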
Coming back to schema changes: the first thing we do is stage them to make sure they are correct. This catches errors such as an incompatible update, like changing a column type. So schema changes are validated, and then we apply them to all clusters atomically, again for consistency, providing an overall consistent view of our data to all our users.

On the proxy side, we have transparent support for replicated clusters for all our users. The way we handle that is, as I mentioned, the cluster-to-table mapping is maintained in the manifest database. When a query comes in, the proxy can see which cluster has all the tables referenced in that query, and it routes the query to the appropriate cluster based on the manifest information. The proxy is also aware of the health of individual clusters: if a cluster is down for maintenance or upgrades, the proxy knows, and it does monitoring based on query response and execution times as well. It uses this information to route queries to healthy clusters and to do load balancing, ensuring that we avoid hotspots on the various clusters.

So the key takeaways I have from this talk are primarily these. We started off with a single-cluster Vertica setup and ran into issues around scaling and availability due to cluster downtime. We then set up a bunch of replicated clusters to handle the scaling and availability issues, but ran into issues around schema consistency, data staleness, and data replication. So we built an entire ecosystem around Vertica, with abstraction layers for data management and ingestion, and a proxy. With this setup, we were able to enforce consistency and improve storage utilization.

Hopefully this gives you all a brief idea of how we have been able to scale Vertica usage at Uber and power some of our most business-critical and important use cases. As I mentioned at the beginning, I have an interesting and simple extra update for you: an easy way for you to take advantage of many of the features we have built into our ecosystem is to use Vertica Eon mode. Eon mode allows you to set up multiple clusters with consistent data updates, at various different sizes to handle different query loads, and it automatically handles many of the issues I mentioned in our ecosystem. So do check it out. We've also been trying it out on GCP, and initial results look very, very promising.

So thank you all for joining me on this talk today. I hope you learned something new, and hopefully you took away something you can apply to your own systems. We have a little more time for some questions, so I'll pause for now and take any questions.
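To make the proxy's routing decision concrete, here is a minimal sketch of manifest-driven, health-aware routing. It is not Uber's actual proxy: the manifest layout, cluster names, and the random choice among eligible clusters are simplifying assumptions; the real system also weighs cluster load and observed latencies, as described above.

```python
# Minimal sketch of manifest-driven query routing; not Uber's real proxy.
import random

# Data manifest: which tables are available and up to date on which clusters.
MANIFEST = {
    "core.trips":  {"big_1", "big_2", "small_1"},
    "core.cities": {"big_1", "big_2", "small_1", "small_2"},
    "rare.audit":  {"big_1", "big_2"},
}

# Fed by health checks and query response/execution-time monitoring.
HEALTHY = {"big_1", "big_2", "small_1"}

def route(tables_in_query):
    # A cluster is eligible only if it is healthy and serves *every* table
    # referenced by the query with up-to-date data.
    eligible = HEALTHY.copy()
    for table in tables_in_query:
        eligible &= MANIFEST.get(table, set())
    if not eligible:
        raise RuntimeError("no cluster can serve this query right now")
    # Naive load balancing; a real proxy would weigh load and latency.
    return random.choice(sorted(eligible))

print(route({"core.trips", "core.cities"}))  # e.g. 'small_1'
```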
Aparna Sinha, Google Cloud | KubeCon 2018
>> From Seattle, Washington, it's theCUBE. Covering KubeCon and CloudNativeCon North America 2018. Brought to you by Red Hat, the Cloud Native Computing Foundation, and its ecosystem partners. [Techno Music] >> Okay, welcome back everyone. It's theCUBE's live coverage in Seattle for KubeCon and CloudNativeCon 2018. I'm John Furrier with theCUBE. Stu Miniman. Breaking down all the action. Talking to all the thought leaders, all the experts, all the people making it happen. We're here with Aparna Sinha, who's the group product manager, Kubernetes, Google Cloud. Also one of the power women of the Cloud at Google, according to Forbes. I wrote the story. Great to see you again. >> Thank you, great to be here with you. >> Thanks for coming on. >> CUBE alumni. Great to have you on. I want to get your perspective. You've seen a lot of action, certainly overseeing the group engineering team at Google and all the Kubernetes action. A lot of contribution, a lot of activity that you guys are leading. >> Yes. >> And quite frankly enabling and contributing to the community. So, congratulations and thanks for that work. Kubernetes is certainly looking good. People are pumped up. >> Very much. >> 8,000 people. A lot of activity. A lot of new things that you guys are always kind of bringing into it: Istio, Knative, a lot of things. You gave a keynote. What's your focus here this year? What's the message from Google? >> Yeah, well as you pointed out, this is the largest KubeCon ever. 8,000 people, 2,000 on the wait list. And people are telling me here that this is here to stay, right? It's in the early majority, going into the mainstream, very much like virtualization was 10 years ago. So that's the momentum that I'm seeing here, that I'm hearing here. My keynote was about the community. Thanking the community, first of all. I talked about how, in open source, success is contingent on contribution. And so I started by showing the contribution over the last one year, the companies that are contributing. 80% of contributions are by at least 10 entities, one of them being individual contributors. 40 percent, I think, was Google, which is still staggeringly high. And then the next highest was Red Hat. And so in many of the keynotes we've been calling out the contributors, because it's really important. Kubernetes 1.13, the 13th release, shipped last week. A lot of stability, a lot of GA features, and the uptake in the enterprise. The other thing I called out was just the amount of job opportunity in Kubernetes. >> Yeah. >> 230% growth in the last year. You see here so many customers that are here to talk about their experience. But also they're here to hire. >> Yeah. >> And there are recruiters on the floor, so it's been, I think, a huge economic value add. And we feel very proud of that. >> Yeah, Aparna, great point. We've been talking about the end users. I always loved... There's a job board right outside the hall here, and it's just covered. Big giant white board there. Bring us inside a little bit. I mean, Google's always fascinated people. What's the hiring situation there? What's your team lookin' like? Is anybody smart enough to actually go work there? >> Google, I think we've been very, very fortunate in that we've had the original Borg team that started the Kubernetes project. And so we have a really, really deep bench, because we've been running containers since the beginning.
So now we have 15 years of experience with that, and many people tell me, and I think it's true, that the reason Kubernetes is so successful is that it's not actually new, right? >> Yeah. >> It's been tried and true at scale. So we have quite a bit of that, but we've also been building this community, and a lot of folks have been hired in through the community-- >> Yeah >> into Google. And really amazing, amazing people. So yeah. >> We had Brian Grant on yesterday, and Tim Hockin-- >> Yes. >> Who were talking about some of those early Borg days. >> Yes. >> I want to ask you your point about the hiring, because I think this is an interesting dynamic. Open source is key to your strategy. We've talked many times about how you guys are committed to open source, but what's interesting is it's not just net new jobs that are available; we're seeing a revitalization around traditional roles, like the network engineer under Kubernetes. Looking at the policy knobs that your folks pointed out... they think it's underutilized. And then on top of Kubernetes, new things are going on that's getting the app kind of server guy-- >> Yeah. >> kind of energized. >> Yeah. >> It's enabling a lot of actions that are transforming existing jobs. >> That's right. >> And bringing new ones. >> Talk about that dynamic, because you see it from both sides. >> Yes. >> You've got SREs, site reliability engineers. >> Yes. >> You've got developers. But now enterprises are trying to adopt... >> That's right. >> You guys are hitting that note. Talk about that dynamic. >> That's right. So I've been talking to a lot of customers here; it's been non-stop. I've not been able to attend any talks or keynotes. And I'm seeing two things. One, there's the kind of operations role now called platform teams. And they're under tremendous pressure. They're doing incredible work. Incredible. And they're energized. So one of the customers I was talking to was moving from VMs on EC2 to containers on GKE, on Google Cloud. And over the last one year, they looked... honestly, they looked miserable, because they have worked so hard on that transformation, turning their application from a VM-based application into containers. But you could also see that they were so happy and so successful because of the impact it's had. And then I asked, "What is driving that?" This was a different customer, a large enterprise bank that I was talking to. As soon as they got that environment up and running, their developers were just all over it; they had hundreds of services running within six months. And they're like, "Well, we just got this platform up. We still have to figure out how we're going to upgrade it." But it's... So those are the two constituents. The developers are happy. >> The integration and delivery changes the makeup of how teams work. So that's one thing we're seeing here. And the other one is just scale. >> Yeah. >> So that seems to be the area. Now I've got to ask you: as you guys are doing the work on the enterprise side... I know you're working hard, I talk to Jennifer Lynn a lot as well, and we've talked before... but there's still a lot more work to be done. Where do you see value opportunities for participants in the ecosystem to fill white spaces? Where are the value lines starting to be drawn? Can you comment?
>> Yeah, so I see two or three different areas. One of the areas is, of course, hardening. And that's why Janet Quill gave the keynote about "Kubernetes is boring and that's a good thing." That's been something we've been working on for the last year at least: adding a lot more security capabilities, moving everything to GA, adding a lot more hooks into enterprise storage and enterprise networking, building up the training, and building up the partners that'll do the implementations. All of those things I think are very, very healthy. >> Yeah. >> Because I see them. You've probably talked to the CNCF; they're helping a lot with the certification and the training. So that's one piece of enterprise adoption. I think the other piece is the developer experience. And that's where a lot of the talks here, my keynote as well... I demoed Istio and Knative on top of GKE. The developer experience is ultimately, from my perspective, what this whole thing is about: making your developers more productive. And developers have been driving this transition, going back to those customer examples. So that's getting a lot easier. >> Yeah, Aparna, I'd love for you to talk a little about Knative. I know the excitement is there. The product's only been around for five months; I remember it was announced at your show last summer. We're trying to understand exactly what it is. It's like, wait, is serverless going to kill Kubernetes? And how does this fit? How does it work with all the various services in the Cloud? Maybe just help us understand where we are. >> Right. >> What it is, what it isn't. >> Right. So, the heritage of serverless... I'm going to go back to Google, right? We had the first serverless offering in the world, like 10 years ago. And that's based on containers; underneath, it's based on containers. That's why we knew that Kubernetes is the right foundation for building serverless. And I think we actually held back for the longest time. A couple of years ago there were one, two, and then 15, and then 17 serverless frameworks that just kind of all popped up around Kubernetes, on top of Kubernetes. I remember the first demo in the community: here's this serverless piece. And at some point, a little over a year ago, we decided that serverless is actually really important to our customers, to our users. The majority of Kubernetes tends to be on-prem, actually, and so it's important to them to have serverless capabilities on-prem. So then we needed to make sure it's stable and it's something that's standard. >> I think it's a really important point. I've talked to some people in the serverless ecosystem living on AWS, and they say, "You can't build serverless on-prem, because then you're racking and stacking and dealing with it." And it's not... We know there are servers underneath it; it's just system calls and how we consume them. But maybe explain the nuances of why this is important, so we understand it. >> Yeah. >> There's not like one solution out there. >> Yeah. >> Service meshes... there are a lot of options out there right now. >> Yeah. >> So. >> A lot of things, because this is an open-source community, come from the users. So the user says: you know what, I actually need the serverless capability on-prem. Why? Because I've got this developer group and I don't want them to have to muck with the infrastructure. I don't want them to have access to the infrastructure.
"I want to just give them a simple interface "where they're going to write their applications "and the rest is taken care of for them." Right? And then I want to be able to bill them on a per-use basis. So, it's... Yeah there's someone managing the server. Someone building actually the severless capability and that's the platform team. That's the guys that I talked about that are working very hard these days happily. But, working very hard. >> And these are the new personas, by the way-- >> Yeah. >> In the enterprise. This is new kind of new re-architecting of how enterprises are creating value. These new platform teams. >> Right. >> This is the opportunity. Well I got to ask you, you know everyone that watches theCUBE knows I'm a big fan of scale. Love Amazon scale. I love Google scale. I love the enterprise market. And I want to get your thoughts... I want you to take a minute to explain the culture at Google Cloud. Because it's a separate building. Give you an opportunity to share. But you guys are working hard to go after the enterprise. It's not like a new thing. But the enterprise is interesting. It's not so much the best technology that wins. It's grit. It's almost like a street fight. You got to go out. You got to win those battles. Get all the work done. Hit those features. You can't just roll into town and say we've got great technology. We're Google. You guys recognize this. And I want you to share the culture you guys are building and how you guys are attacking the enterprise. What's the guiding principles? What are some of the core tenants? >> Yeah, yeah. So you know my entire life has been spent in enterprise software. >> Yeah. >> I do think that enterprises respect Google Cloud. I work very closely with them. And they respect certainly the engineering prowess. Like, "Wow. I need that." >> Yeah. Right? Especially you see all these enterprises that are being transformed by technology. Their industry is being transformed by technology. Whether that's in transportation, or it's in retail, or it's in media. And they want the best. They want the latest. Right? And they also don't necessarily have the skills, like you said, right? So they're looking for a partner that'll both help them scale up but also provide them all of that guidance. And the one thing you asked about culture at Google. I think we are a revolutionary company. We are willing to do lots of things. Lots of things that you wouldn't expect. And that's why you saw GK on-prem from my team, right? The first, kind of, Kubernetes on-prem offering from a cloud provider. Managed by a cloud provider. And that's really... I mean we've seen tremendous, tremendous interest in that. Tremendous feedback from our users and new customers. People that hadn't thought about it. Hadn't thought about Google, necessarily before that have said, "Wow. If you are going to come and help me on-prem "with this, I'm ready. "Give it to me now. "Because I trust you and I know I want to go to the Cloud. "So it's the right step for me. "You have the right incentives." Right? "And you're the open cloud, which is important to me "because I may want to be multi cloud." So that's the piece that is... >> You got the enterprise chops. You've spent your whole career there. I know Jennifer as well. >> Yes. >> A lot of people you guys have hired. >> Right. >> The good news is you've got a market that's changing. So you don't have to come in and replicate the old IT. So that's an opportunity at Google. How are you guys attacking that, that beachhead? 
Because you have the check. What's the vibe? What's the grit? What's it like... How are you guys attacking the enterprise? What do you see as opportunities, knowing the enterprise of old-- >> Yeah >> as it shifts to a new kind of method? >> Yeah. >> What's the core? >> I think about the problems the users are having. I think about what problem the customer is facing, and then breaking that down and solving it for them. I mean, that's what's important, right? And so some of the problems I see: one, they need a developer platform, and the developer platform sometimes cannot be in the Cloud. When I talk to large financial institutions, there's so much compliance and regulation that things have to be on-prem. They try to move to the Cloud, and some things will move, but the majority, like 90%, is on-prem. So they need an agile development environment, and there's no holding it back, because, like I said, there's all this transformation; their developers need that environment today. So you have to provide that. That's one use case: we provide an on-prem, agile development environment. Best in class. Your developers are super happy; your business is going to do well. The other thing I see, and I see this a lot in retail, but also in hospitality, at some of these very brick-and-mortar enterprises, is the edge. They need a solution at their edge locations. Thousands... these are thousands of branch locations. We've even got this use case with Chick-fil-A, right? And a lot of times... there are a lot of different use cases, but a lot of the time the common thing is that they're collecting data, doing some processing at that site, and then doing further processing in the Cloud. And so it's a connected, but intermittently connected, environment; it's not always connected. So that's the second big use case: edge retail, or just edge. There are so many... For me, it's one of the most exciting; there are so many examples of that. >> Awesome. >> Aparna, first of all, so much goodness. I want to say thank you to Google, because I heard at the show that Google wasn't giving out swag; that money went to charitable giving instead. One of the things we always look at with open source is how much value is being created for the ecosystem, not just the vendor that started it. And it is a really tough balance. We've seen it fail many times. Do you step too far back? And how much do you engage? How do you strike that balance? For the last five to 10 years, we've been saying, "Where is the independent place where we can have that conversation about cloud?" We think we've found it at this show. I mean, we've been here for three years now. Google Cloud, phenomenal event; our team loves to be there, but this feels like it has overnight turned into, oh wait, here's the show we were looking for to have that conversation. To have that commons where we can come together, and there's such a diversity of people and projects in here, many of which are quite disconnected from the original Kubernetes and everything around it. So it's been fascinating to watch, and I have to imagine for your team... when you watch that first piece go, and everything that's built around it, it's got to be amazing. >> My team loves this event. We have literally, I think, 300 people here. And a lot of them are core maintainers. Everybody is a contributor, but they are core maintainers of the Kubernetes project, the Istio project.
The Knative project. And I think the best thing here is just interacting with our users, because this is a developer conference, primarily. There are a lot of businesses here >> Yeah >> with their kind of director-level executives, but primarily it's an action-oriented, hands-on audience. And in these customer meetings that I have, we review their architecture, and we're like... it's an engineer-to-engineer conversation. >> Yep. >> And so, how can we make that better? And sometimes they're contributing back, and it makes the whole project better. >> Yeah. The thing, too, is it's an engineering, it's a developer conference, true. But what's interesting about that evolution, as it modernizes, is that those end users are developers. >> That's right. >> And so the end-user aspect of this show >> That's right. >> is the developer piece. >> That's right. >> It never used to be like that. It used to be COMDEX or some big event >> Yeah. >> and then people just selling their stuff. >> Yeah. >> Doing business. The end-user participation... >> Yes. >> is not a consumption conversation, it's a contribution. >> Right. And end users are all over the spectrum, from the really, really hands-on and very, very smart, to "just give me something that works," and I respect all of that, right? And we've actually gone very far there in terms of GKE: giving you something that's fully managed, where you really don't need to get in. But then, on the other hand, we had Uber on stage earlier today in their keynote, talking about how they've built all of this advanced capability on GKE. And that's a power user, using all the capabilities, like custom extensions and an operator. And it's just really gratifying for us to work with them, and to see the user base as well as the community. So, the ecosystem. I think it's very important for us to have and create economic opportunity for our partners. And you'll see that with GKE On-Prem; we're partnering heavily on that one. And you'll see that also in our marketplace, our Kubernetes marketplace. So many of the companies that have come out of this ecosystem are now part of selling through Google Cloud. >> Aparna, thank you for your time. I know you had to move some things around to come here. Great to have you on. I love your leadership at Google; it's phenomenal. You've got the enterprise chops building out heavily over there. Congratulations. And for more CUBE interviews, check out theCUBE dot net. You can check out Aparna's other good news; of course, search her name on Forbes, I wrote a story featuring her, talking about her background and her passion. Always great to have her on theCUBE and get some commentary from Google. Of course, theCUBE is breaking down live coverage, been there from the beginning of KubeCon and now CloudNativeCon, the Linux Foundation. Bringing you all the analysis and insight. Be back with more coverage after this short break. [Techno Music]
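Since the Knative discussion above stays at the conceptual level, here is a minimal sketch of what deploying a Knative Service can look like, using the official Kubernetes Python client. A few hedges: at the time of this interview Knative's API was still in alpha; the sketch uses the later serving.knative.dev/v1 API, and the service name and sample image are illustrative, not from the interview.

```python
# Hypothetical sketch: deploying a minimal Knative Service with the official
# Kubernetes Python client. Knative Services are custom resources, so they
# go through the CustomObjectsApi rather than the core APIs.
from kubernetes import client, config

config.load_kube_config()  # assumes kubectl is already pointed at a cluster

service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "hello", "namespace": "default"},
    "spec": {
        "template": {
            "spec": {
                "containers": [{
                    "image": "gcr.io/knative-samples/helloworld-go",
                    "env": [{"name": "TARGET", "value": "world"}],
                }]
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev",
    version="v1",
    namespace="default",
    plural="services",
    body=service,
)
# Knative scales this to zero when idle and back up per request, which is
# what enables the per-use billing model described in the interview.
```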
Matt Klein, Lyft | KubeCon 2017
>> Narrator: Live from Austin, Texas, it's theCUBE, covering KubeCon and CloudNativeCon 2017. Brought to you by Red Hat, the Linux Foundation, and theCUBE's ecosystem partners. >> Welcome back everyone, live here in Austin, Texas: theCUBE's exclusive coverage of KubeCon and CloudNativeCon, the Kubernetes conference. I'm John Furrier, co-founder of SiliconANGLE, with my co-host Stu Miniman, our analyst. And next is Matt Klein, a software engineer at Lyft: ride-hailing service, car sharing, social network, great company; everyone knows that everyone loves Lyft. Thanks for coming on. >> Thanks very much for having me. >> All right, so you're a customer of all this technology. You guys built, and I think this is like the shiny use case of our generation: entrepreneurs and techies building their own stuff because they can't get the product from the general market. You had large-scale demand for the service, so you had to go out and build your own with open source and all those tools. You had a problem to solve, you built it, used some open source, and then gave it back to open source and became part of the community, and everybody wins; you donated it back. This is the future; this is what it's going to be like. Great community work. What problem were you solving? Obviously with Lyft, everyone knows it's hard; they see their car, there's a lot of real time going on, a lot of stuff happening. >> Matt: Yeah, sure. >> Magic's happening behind the scenes; you had to build that. Talk about the problem you solved. >> Well, I think, you know, when people look at Lyft, like you were saying, they look at the app and the car, and I think many people think that it's a relatively simple thing. Like, how hard could it be to bring up your app and say I want a ride, and get that car from here to there? But it turns out that it's really complicated. There are a lot of real-time systems involved in actually finding what all the cars near you are, and what's the fastest route, all of that stuff. So I think what people don't realize is that Lyft is a very large real-time system that, at current scale, operates at millions of requests per second, and has a lot of different use cases around databases and caching, you know, all those technologies. So Lyft was built on open source, as you say, and Lyft grew from what I think most companies do, which is a very simple, monolithic stack. You know, it starts with a PHP application; we're a big user of MongoDB, and some load balancer, and then, you know-- >> John: That breaks. (laughs) >> Well, no, but people do that because that's what's very quick to do. And I think what happened, like most companies that become very successful, is Lyft grew a lot, and like the few companies that become very successful, they started to outgrow some of that basic software, the basic pieces they were actually using. So as Lyft started to grow a lot, things just stopped working, so then we had to start fixing and building different things. >> Yeah, Matt, scale is one of those things that gets talked about a lot. But Lyft, you know, really does operate at a significant scale. >> Matt: Yeah, sure. >> Maybe you can talk a little bit about what kinds of things were breaking, >> Matt: Absolutely, yeah. >> and then what led to Envoy and why that happened. >> Yeah, sure. I mean, I think there are two different types of scale, and I think this is something that people don't talk about enough.
There's scale in terms of the things people usually talk about: data throughput, requests per second, stuff like that. But there's also people scale, right? As organizations grow, we go from 10 developers to 50 developers to 100. Lyft is now many hundreds of developers, and we're continuing to grow, and what I think people don't talk about enough is the human scale. So we have a lot of people trying to edit code, and at a certain number of people, they can't all be editing the same code base. That's, I think, the biggest reason people start moving towards this microservice or service-oriented architecture: you start splitting things apart to get people scale. People scale usually comes hand in hand with requests-per-second scale and data scale and that kind of stuff. So as you grow the number of people, you start moving into microservices, and then suddenly you have actual scale problems: the database is not working, or the network is not actually reliable.

So, from the Envoy perspective: Envoy is an open-source proxy we built at Lyft. It's now part of the CNCF, and it's having tremendous uptake across the industry, which is fantastic. The reason we built Envoy is that what we're seeing now in the industry is people moving towards polyglot architectures, architectures with many different applications in many different languages. It used to be that you could use Java and have one particular library that would do all of your networking and service discovery and load balancing, and now you might have six different languages. So how, as an organization, do you actually deal with that? What we decided to do was build an out-of-process proxy, which allows people to build a lot of functionality into one place: load balancing, service discovery, rate limiting, buffering, all those kinds of things, and, most importantly, observability, things like tracing and stats and logging. That allowed us to actually understand what was going on in the network, so that when problems were happening, we could debug them.

What we saw at Lyft, about three years ago, is that we had started our microservices journey, but it had almost stopped, because what people found is that they had started to build services because supposedly it was faster than the monolith, but then we started having problems with tail latency and other things, and they didn't know how to debug them. So they didn't trust those services, and at that point they'd say, not surprisingly, we're just going to go back and build it into the monolith. So we were almost in that situation where things were kind of at that split. >> So Matt, I have to think that's the natural path that led to service mesh, and Istio specifically, with Lyft, Google, and IBM all working on that. Talk a little bit more about what Istio is. It was really the buzz coming in, with service mesh; there are also some competing offerings out there, Conduit, a new one announced this week. Maybe give us the landscape, kind of where we are, and what you're seeing. >> So I think service mesh is... it's incredible to look around this conference. I think there are 15 or more talks on service mesh, between all of the Buoyant talks on Linkerd and Conduit, and Istio and Envoy. It's super fantastic.
I think the reason that service mesh is so compelling to people is that we have these problems where people want to build in five or six languages, they have some common problems around load balancing and other types of things, and this is a great solution for offloading some of those problems into a common place. Now, the confusion I see right now around the industry is that service mesh is really split into two pieces: the data plane, so the proxy, and the control plane. The proxy is the thing that actually moves the bytes, moves the requests, and the control plane is the thing that tells all the proxies what to do: tells them the topology, tells them all the configurations, all the settings.

So the landscape right now is essentially that Envoy is a proxy, a data plane, and Envoy has been built into a bunch of control planes. Istio is a control plane, and its reference proxy is Envoy, though other companies have shown that they can integrate with Istio; Linkerd has shown that, NGINX has shown that. Buoyant just came out with a new combined control-plane and data-plane service mesh called Conduit; that was brand new a couple of days ago. And I think we're going to see other companies get in there, because this is a very popular paradigm, and having the competition is good. I think it's going to push everyone to be better. >> How do companies make sense of this? I mean, if I'm just a boring enterprise with complexity and legacy... you know, I have a lot of stuff, maybe not the kind of scale in terms of transactions per second, because they're not Lyft, but they still have a lot of stuff. They've got servers, they've got data centers, they've got stuff in the cloud, and they're trying to put this cloud-native package in, because the developer movement is clearly pushing the legacy guys, the old guard, into cloud. So how does your stuff translate into the mainstream? How would you categorize it? >> Well, what I counsel people, and I think this is actually a problem that we have within the industry, is that I think sometimes we push people towards complexity that they don't necessarily need yet. And I'm not saying that all of these cloud-native technologies aren't great; I mean, people here are doing fantastic things. >> You know how to drive a car, so to speak; you don't know how to use the tech. >> Right. And I advise companies and organizations to use the technology and the complexity that they need. So I think that service mesh and microservices and tracing and a lot of the stuff being talked about at this conference are very important if you have the scale to warrant a service-oriented microservice architecture. And, you know, some enterprises are segmented enough that they may not actually need a full real-time microservice architecture. So I think the thing to decide is, number one, do you need a microservice architecture? And it's okay if you don't; that's just fine. Take the complexity that you need. If you do need a microservice architecture, then I think you're going to have a set of common problems around things like networking and databases, and then yes, you are probably going to need to bring in more complicated technologies to deal with that. But the key takeaway is that as you bring on more complexity, the complexity is a snowballing effect: more complexity yields more complexity.
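To illustrate the data-plane/control-plane split Matt describes above, here is a toy sketch. This is not Envoy's real xDS protocol, which is a set of gRPC/REST discovery APIs; it only shows the division of labor: the control plane owns topology and policy, while the data-plane proxies pull that state and just move bytes.

```python
# Toy illustration of the control-plane/data-plane split; NOT Envoy's xDS.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# State the control plane owns: service topology plus networking policy.
# The service names and addresses are made up for illustration.
STATE = {
    "clusters": {
        "locations": ["10.0.0.11:8080", "10.0.0.12:8080"],
        "pricing":   ["10.0.1.21:8080"],
    },
    "policy": {"timeout_ms": 250, "max_retries": 2},
}

class ControlPlane(BaseHTTPRequestHandler):
    def do_GET(self):
        # Every sidecar proxy polls this endpoint for its configuration;
        # the proxies themselves never own topology or settings.
        body = json.dumps(STATE).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 18000), ControlPlane).serve_forever()
```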
>> So Matt, this might be a little bit out of bounds for what we're talking about, but when I think about autonomous vehicles, that's just going to put even more strain on these kinds of distributed systems, you know, things that have to run at the edge. Are we laying the groundwork at a conference like this? How's Lyft looking at this? >> For sure. I mean, we're obviously starting to look into autonomous a lot, and obviously Uber's doing that a fair amount. If you start looking at the sheer amount of data that is generated by these cars when they're actually moving around, it's terabytes and terabytes of data, and when you start thinking through the complexity of ingesting that data from the cars into a cloud, and actually analyzing it and doing things with it, either offline or in real time, it's pretty incredible. So yes, I think these are just more massive-scale real-time systems that require more data, more hard drives, more networks, and as you manage more things with more people, it becomes more complicated for sure.
I think we are in a very unique time right now, and I actually think that if you look out 10 years, and you look at some of the services that are coming online, and like Amazon just did Fargate, that whole container scheduling system, and Azure has one, and I think Google has one, but the idea there is that in 10 years' time, people are really going to be writing business logic, they're going to insert that business logic >> They may do a powerpoint slides. >> That would be nice. >> I mean it's easy to me, like powerpoint, it's so easy, that's, I'm not going to say that's coding, but that's the way it should be. >> I absolutely agree, and we'll keep moving towards that, but the way that's going to happen is, more and more plumbing if you will, will get built into these clouds, so that people don't have to worry about all this stuff. But we're in this intermediate time, where people are building these massive scale systems, and the pieces that they need is not necessarily there. >> I've been saying in theCUBE now for multiple events, all through this last year, kind of crystallized and we were talking about with Kelsey about this, Hightower, yesterday, craft is coming back to programming. So you've got software engineering, and you've got craftsmanship. And so, there's real software engineering being done, it's engineering. Application development is going to go back to the old school of real craft. I mean, Agile, all it did was create a treadmill of de-risking rapid build scale, by listening to data and constantly iterating, but it kind of took the craft out of it. >> I agree. >> But that turned into engineering. Now you have developers working on say business logic or just solving, building a healthcare app. That's just awesome software. Do you agree with this craft? >> I absolutely agree, and actually what we say about Envoy, so kind of the catchword buzz phrase of Envoy is to make the network transparent to applications. And I think most of what's happening in infrastructure right now is to get back to a time where application developers can focus on business logic, and not have to worry about how some of this plumbing actually works. And what you see around the industry right now, is it is just too painful for people to operate some of these large systems. And I think we're heading in the right direction, all of the trends are there, but it's going to take a lot more time to actually make that happen. >> I remember when I was graduating college in the 80s, sound old but, not to date myself, but the jobs were for software engineering. I mean that is what they called it, and now we're back to this devops brought it, cloud, the systems kind of engineering, really at a large scale, because you got to think about these things. >> Yeah, and I think what's also kind of interesting is that companies have moved toward this devops culture, or expecting developers to operate their systems, to be on call for them and I think that's fantastic, but what we're not doing as an industry is we're not actually teaching and helping people how to do this. So like we have this expectation that people know how to be on-call and know how to make dashboards, and know how to do all this work, but they don't learn it in school, and actually we come into organizations where we may not help them learn these skills. >> Every company has different cultures, that complicates things. 
>> So I think we're also, as an industry, we are figuring out how to train people and how to help them actually do this in a way that makes sense. >> Well, fascinating conversation Matt. Congratulations on all your success. Obviously a big fan of Lyft, one of the board members gave a keynote, she's from Palo Alto, from Floodgate. Great investors, great fans of the company. Congratulations, great success story, and again open source, this is the new playbook, community scale contribution, innovation. TheCUBE's doing it's share here live in Austin, Texas, for KubeKon, for Kubernetes conference and CloudNativeCon. I'm John Furrrier, for Stu Miniman, we'll be back with more after this short break. (futuristic music)