Garima Kapoor, MinIO | KubeCon + CloudNativeCon NA 2022

>>How y'all doing? My name's Savannah Peterson, coming to you from Detroit, Michigan, where theCUBE is excited to be at KubeCon. Our guest this afternoon is a wonderfully brilliant woman who's been leading in the space for over eight years. Please welcome Garima Kapoor. Garima, thanks for being with us. >>Well, thank you for having me too. It's a pleasure. >>Good to see you. So, update: what's going on here? We saw you at VMware Explore. Yes. Welcome back to theCUBE. Yes. What's going on for you guys here? What's the message? What's the story? >>So KubeCon, like I always say, it's our event, it's our audience. So, you know, MinIO, I dunno if you've been keeping track, MinIO did reach like a billion Docker downloads recently. So >>Congratulations. >>This is your tribe right here. >>Yes, it is. It is. >>Our tribe's native infrastructure. Come on. Yes. >>You know, this audience understands us. We understand them. You know, you were asking when did we start the company? So we started in 2014, and if you see, Kubernetes was born in 2015 in all sorts of ways. So we kind of literally grew up together along with the Kubernetes journey. So all the decisions that we took were just, you know, making sure that we addressed the Kubernetes and the cloud native audiences as first-class citizens when it comes to storage. So I think that has been very instrumental in leading us up to the point where we have reached a billion Docker downloads and we are the most loved object storage out there. >>So do you like your younger brother Kubernetes? Or not? Is this a family that gets along? >>It does get along. I think in the Kubernetes space, what we are seeing from the customer standpoint as well, right? They're warming up to Kubernetes and, you know, they are using Kubernetes as a framework to deploy anything at scale. 
And especially when you're, you know, offering storage as a service, whether it is for your internal audience or to the external audience, Kubernetes becomes extremely instrumental because it makes multitenancy extremely easy. It makes, you know, access control points extremely easy for different user sets and so on. Yeah. So Kubernetes is definitely the way to go. I think enterprises need to just have a little bit more skill set when it comes to Kubernetes overall, because I think there are still a few areas in which they need to invest, but I think this is the right direction. This is the right way. If you want multi-tenant, you need Kubernetes for compute, you need Kubernetes for storage. >>So you guys hit an interesting spot here with Kubernetes. You have a product that targets builders. Yes. But also it's a service that's consumed. >>Yes. Yes. >>How do you see those two lanes shaping out as the world starts to grow, the ecosystem's growing? You've got products for builders and products for people who are developers consuming services. How do you see that shaking out? Are there intersections there? There is. You seem to be hitting that. >>There is. There is definitely an intersection. And I think it's getting merged because a lot of these users are the ones who dictate what kind of stack they want as part of their application ecosystem overall, right? So that is where, when an application, for example, in the big data workloads, right? They tell their IT or their storage department, this is the S3-compatible storage that they want their applications to run on or sit on. So the bridge is definitely, like, becoming very narrow in that way from builders versus the service consumers overall. And I think, you know, at the end of the day, people need to get their job done. From the application user's perspective, they want to just get in and get out. 
They don't want to deal with the underlying complexity when it comes to storage or any of the framework, right? So I think what we enable is for the builders to make sure they have an extremely easy, simple, high-performance software service that they can offer to their customers, which is S3-compatible. So now they can take their applications wherever they need to go, whether it is edge, whether it is on-prem, whether it is any of the public clouds, wherever you need to be, go be with it, with MinIO. >>I mean, I wanna get your thoughts on a really big trend that's happening now, that's right in your area of expertise. That is, people are realizing that, hey, I don't necessarily need AWS S3 for storage. I gotta do my own storage or build my own. So there's a cost slash value for commodity storage. Yes. When does a company just decide what to do there? Do they do their own? You see Cloudflare, you're seeing Wasabi, other companies emerging. Yes. You guys are here. Yeah, yeah. Common services, then there's a differentiator in the cloud. What's this all about? >>Yeah, so there are a couple of things going on in this space, right? So firstly, I think the cloud model is the way to go. And what we mean by cloud is not public cloud, it's the cloud operating model overall, right? You need to build the applications the correct way so that they can consume cloud native infrastructure correctly. So I think that is what is going on. And secondly, I think cloud is great for your burst workloads. It's all about productivity. It's all about getting your applications to the market as fast as you can. And that is where, of course, MinIO comes into play, when, you know, you can develop your applications natively on something like MinIO. And when you take it to production, it's very easy no matter where you go. 
And thirdly, I think when it comes to the cost perspective, you know, what we offer to the customers is predictability of the cost and no surprises in the bills, which is extremely important to, like, a CFO of a company, because everyone knows that cloud is not the cheapest place to run your sustained workloads. And there is an unpredictability element involved because, you know, people leave their buckets on, people leave their compute nodes on, it happens all the time. So I think if you take that uncertainty out of it and have more predictability around it, I think that is where the true value lies. >>You're really hitting on a theme that we've been hearing a lot on theCUBE today, which is standardization, predictability. Yes. Everyone always wants to move fast, but I think we're actually stepping away from that Mark Zuckerberg paradigm, move fast and break things, and let's move fast, but know how much it's gonna cost and also decrease the complexity. >>Just don't break things. >>Yeah, yeah, yeah, exactly. And try, you know, minimize the collateral damage when, yeah. I love that you're enabling folks like that. I'm curious, because I see that your background, you have a PhD in philosophy, so we don't always see philosophy and DevOps and Kubernetes in the same conversation. Yeah. So how does this translate into your leadership within your team and MinIO's culture? >>So it's a PhD in financial management and financial economics. So that is where my specialization lies. And I think after that I came to the Bay Area. So once you're in the Bay Area, you cannot escape technology. It is >>To you, >>It is just the way things are. You cannot escape startups, you cannot escape technology overall. So that's how I got introduced to it. And yeah, it has been a great journey so far. And from the culture standpoint, you know, I always tell people, if I can learn technology, anyone can learn technology. 
So what we look for is the right attitude, the right kind of, you know, passion to learn, which is what is most important in this world if you want to succeed. And that's what I tell everyone who joins MinIO: two months, three months, you'll be up and going. I'm not too worried about it. >>But pedigree doesn't always play into it because, no, with the changing technology you could level up. So for sure you get into those and be contributing. >>I think one of the reasons why we have been successful the way we have been successful with storage is because we've not hired storage experts. Because they come with their own legacy and mindset of how to build things. And we always came from a point of view that we are not a storage company. We are a data company and we want to be close to the data. So when you come with that mindset, you build a product directly attacking data, not just like, you know, in the traditional appliance world and so on, so forth. So I think those things have been very instrumental in terms of getting the right people on board, making sure that they're very aligned with how we do things and, you know, the DNA of the company. >>That's for passion, and that's actually counterintuitive, but it makes sense. Yes. In new markets it doesn't always seem to take the boilerplate skill set or person. Yes. No, we're doing journalism, but we don't hire journalists. No. >>I mean, you gotta be, it's adventurers. It is. It's curious. >>Exactly. Exactly. Yeah, I think also, you know, for you to disrupt any space, you cannot approach it from how they approach the problem. You need to completely turn the tables upside down, as they say, right? You need to disrupt it and have the surprise element. And I think that is what always makes a technology very special. You cannot follow the path that others have followed. You need to come from a different space, a different mindset altogether. 
So that is where it's important that, like you said, adventurers are the people. >>That is for sure. Talk to us about the company. Are you growing, scaling? How do people find out more? >>Oh yeah, for sure. So people can find out more by visiting our website, min.io. We are growing. End of last year we closed our Series B round, unicorn valuation and so on, so forth. So >>She says unicorn valuation so casually. I just wanna point that out, that's funny. Like a true strong female leader. I love that. I love that. >>Thank you. Yes. So in terms of, you know, growth and scalability, we are growing the team. We are, you know, onboarding more commercial customers to the platform. So yeah, it's growth all across: growth from the community standpoint, growth from the commercial number standpoint. So congratulations. Yeah, thank you. >>Yeah, that's very exciting. Garima, thank you so much for being with us. >>Thank you for having me. >>Always. Thanks for hanging out, and to all of you, thank you so much for tuning into theCUBE, especially for this exciting edition for all of us here in Detroit, Michigan, where we're coming to you from KubeCon. See you back here in a little bit.

Published Date : Oct 26 2022



Garima Kapoor, MinIO | VMware Explore 2022

>>Hey, welcome back everyone to theCUBE's coverage of VMware Explore '22. I'm John Furrier with Dave Vellante. Formerly VMworld, our 12th year extracting the signal from the noise. A lot of great guests. It's very vibrant right here. The floor's great. The expo hall's booming, the keynotes went great. We just had a keynote announce. So our first guest here on day one is Garima Kapoor, co-founder and COO of MinIO. Welcome to theCUBE. Thanks for joining us. >>Thank you for having me. >>You're also an angel investor in a variety of companies of CUBE alumni and been in the valley for a long time. Thanks for coming on and sharing what's going on. So, first of all, obviously VMware's still on the wave. They've always been relevant and they've always been part of it. Yes. But that's changing, a lot's going on. Security, data's a big conversation. Yeah. And now with their multi-cloud, we call it supercloud, but their multi-cloud, it's about hyperscaler participation. Yes. Yes. Cloud universal. Yes. It's clear that VMware has to be successful in every cloud. Okay. And that's really important. And storage is one of it. You guys do that. So talk about how you guys relate with MinIO, the vision, how that connects with what's happening here. >>Yeah. So like you already said, right? Most of the enterprises have become data enterprises in themselves, and storage is a foundation layer, and you do need a system that is simple, scalable, and performant at scale. Right? So that's where MinIO fits into the picture. And we are software-defined, open source. So, you know, VMware has traditionally been focused on enterprise IT, but that world is fast changing. They are making a move in terms of a developer-first approach, and MinIO, because it's open source, it's simple enough to get started deploying object storage and cloud native applications on top. So that's where we come in. We have around 1.3 million Docker downloads a day. So we own the developer market overall. 
And that is where I feel the partnership with VMware, as they are coming into multi-cloud on their own, MinIO is a foundational layer. >>So just to elaborate on it, whenever you talk about multi-cloud, there are two pieces to it. One is the compute side and one is the storage side. So for compute, Kubernetes takes care of the compute side. Once you containerize an application, you can deploy it on any cloud, but the data has gravity, and all the clouds that you see, AWS, Azure, Google Cloud, they're inherently incompatible with each other. So you need a consistent storage layer with industry-standard APIs that you can just deploy around with your application without a single line of code change. So that's what we do. >>Oh, so you got a great value proposition, love the story. So just to kind of connect on something. So we heard the keynote today. We gotta win the developers. They didn't say that, but they said that they have the ops locked down, but DevOps is now the new developer. Yes. We've been covering a lot of the KubeCon, as you know, and shifting left, everyone's in the CI/CD pipeline. So developers are driving all the action and it has to be self-service. Absolutely. It has to be high velocity. Can't be slow. Yes. Gotta be fast. So that sounds like you're winning that piece. >>Yes. Yes. And I think more than that, what is most important is it needs to be simple. It needs to get your job done in a very simple and efficient way. And I think that is very important to the developers overall. They don't like complex appliances or complex pieces of software. They just want to get their job done and move on to the next thing in order to build their application and deploy it successfully. So whatever you do, it needs to be very simple. And of course, you know, it needs to be feature-rich and high-performant and whatnot, that comes with the flow in itself. But I think simplicity is what wins the developers' hearts and minds overall. 
>>So object storage has always been simple: get, put, right? Pretty simple, you know, paradigm. Yes. But it was sort of the backwater before, you know, Amazon, you know, launched. Yes. You know, its cloud. How have you seen object evolve? You mentioned performance. So I presume, yes. Yes. You're not just for cheap and deep, you're for cheap and performance. So could you describe that a little bit, if you would? >>For sure. Like you mentioned, right, when AWS was launched, S3 was the foundation layer. They launched S3 first and then came everything else around it. So object storage is the foundation of any cloud that you go with. And over a period of time, when we started the company back in end of 2014, beginning 2015, it was all about cheap and deep storage. You know, you just get, put it into one basket. But over the years, if you see, because the scale of data has increased quite a bit, new applications have emerged as well that require high performance. That is where we partnered very closely with Intel early on. And I have to give it to them. Intel was the one who convinced us that you need to do high performance. You need to optimize your software with all the AVX-512 instruction set and so on. >>So we partnered very closely with them and we were the first one to come up with, you know, you need high-performance object storage, and that in collaboration with Intel. So that's something that we take a lot of pride in, in terms of being the leader in that direction of bringing high-performance object storage to the market, especially for big data workloads, AI/ML workloads, they're all object-first. Like even, you know, new-age applications like Snowflake and Databricks, they are not built on SAN or file system. Right. They're all built on object storage, right? So that's where you need performance. >>And I think the Databricks, Snowflake examples are good. And then you mentioned in 2014, when you started, yes. 
At that time, big data was Hadoop and, you know, data lakes, data swamps. Yes. Yes. But the ones that were successful, the ones who optimized, had the right bets, like you guys. Yeah. Now we're in an era of, okay, I gotta deploy this. So you got great downloads and uptake from developers. Now we see ops struggling to keep up, yes, with the velocity of the development cycle. Yes. And with DevOps driving the cloud native, yeah, security, data ops becomes important. Okay. Exactly. Security and data. A lot with storage going on there. Yes. How do you guys see that emerging? 'Cause that becomes a lot of the conversations now in the architecture of the ops teams. I want to be supportive in enablement of dev. Yes. Yes. Do you guys target that world too? Or >>Yeah, we do target that. So the good thing about object storage is that if you look at the architecture in itself, it's very granular in terms of the controls that it can give to the end user. Right? So you can really customize in terms of, you know, what objects need to be accessible to whom, what kind of policies you need to implement on the bucket level, what kind of access controls and provisions you need to do. And especially with ransomware attacks and whatnot, you can enable immutability and so on, so forth. So that's an important part of it. Especially, I think the ransomware threats have increased quite a bit, especially with, you know, the macro, you know, situation with war and stuff. So we see that come up quite a bit. And that's where I think, you know, the data immutability, the data governance and compliance becomes extremely, extremely important for organizations. So we are partnering very closely with a lot of big organizations just for this use case itself. >>So how's it work if I want to build some kind of multi-cloud whatever, X, right. Okay. I can use S3 APIs or Azure Blob. Okay. And they are all different. Yes. 
But if I want to use MinIO, what's the experience like? Describe how I go about doing it. >>So if you've had any experience working with AWS, you don't need to even change a single line of code with us. You can just bring your applications directly onto MinIO and it just behaves and acts the same way, transparently, as what you would've experienced in AWS. Now you can just lift and shift that application and deploy it wherever you need it to be, whether it is Azure Blob, whether it is Google Cloud, or even on edge. Like, what we are seeing is that data is getting generated outside of public cloud. And, you know, the emerging trend is that we see that data gets generated on edge quite a bit, whether it is autonomous cars, whether it is IoT, manufacturing units and so on. And you cannot push all that data back to the central cloud, it's extremely expensive for bandwidth and latency reasons. >>So you need to have an environment that looks and feels exactly like what you have experienced at the central cloud on the edge itself. So a lot of our use cases are also getting deployed with MinIO on the edge itself, whether it is on top of VMware because of the footprint that VMware has within all these organizations itself. So we see that emerging quite a bit as well. And then you can tier the data off to any cloud, whether it is MinIO cloud, whether it is AWS, Azure, Google Cloud, and so on. So you can have like a true multi-cloud environment. >>So you would follow VMware to the edge and be the object store there, or not necessarily, if it's not VMware, Kubernetes or whatever. >>Exactly. Exactly. Depending on the skill set that the organization has within their setup, if they're DevOps-savvy, Kubernetes becomes a very natural choice. If they are traditional enterprise IT, VMware is an ideal choice. So yeah. >>So you're seeing a lot of edge action, you're saying. >>We have seen it starting, increasing, yes. And >>Are customers. 
So they're persisting data at the edge. Yes. Yes, they >>Are. Okay. >>It's not just ephemeral and >>No, they are not, because the cost of putting all the data through bandwidth is extremely expensive, to push all the data to the central cloud and then process it and then store it. So we see that the data gets persisted on edge cloud as well in terms of processing, and only the data that you need for the processing through whatever application systems, whether it is Snowflake or Databricks and whatnot, you know, you choose what applications from the compute side you want to bring on top of storage. And that can just seamlessly and transparently work. Yeah. >>Garima, you were saying that multi-cloud, yeah, game's around Kubernetes. Yes. That Kubernetes is all about multi-cloud, that's the game. >>Yes. >>Yes. Can you explain what you mean by that? Why is multi-cloud a Kubernetes game? >>So multi-cloud has two foundations to it. One is the compute side. Another one is the storage side. For compute, Kubernetes makes it extremely simple to deploy any application that is containerized. Once you containerize an application, it's no longer tied to the underlying infrastructure. You can actually deploy it no matter where you go. So Kubernetes makes that task extremely easy. And from the storage standpoint, you know, the state of applications needs to be held somewhere. You know, people say it's cloud, but it's a computer somewhere. Right? So >>Exactly, it's the >>Container. It needs to be stored somewhere. So that's where, you know, storage systems like MinIO come into play, where you can just take the storage and deploy it wherever you go. So it gets tightly bound with the application itself. Just like Kubernetes is for compute, MinIO is for storage. >>I saw Scott Johnston, the CEO of Docker, in Palo Alto last week. Yeah. A spring in his step, so to speak. Docker's doing pretty well. As a result, they got, you know, starting to see certifications. Yes. 
So people are really rallying around containers in a more open way. Yes. But that's open source, but it's the Kubernetes, that's the action. Absolutely. The container's really there. Now Docker's got a great business. Yes. Right now going, yes, with how they're handling it. I thought they did a great job. Yeah. But Docker's now the lingua franca, right? Yes. That's the standard. >>It is. It is. And I think where Kubernetes really makes it easy is in terms of when the scale is involved. Right. If the scale is small, it's okay. You can work around it. But Kubernetes makes it extremely simple, if you have the right Kubernetes skill. I just need to put a disclaimer around there because not a lot of people are Kubernetes experts, at least not yet. So if you have the expertise, Kubernetes makes the task extremely simple, predictable, and automated at scale. I think that is what is the >>So take me through a use case, 'cause I've talked to a lot of enterprises, multiple versions. We're lifting and shifting to the cloud, that's kind of the, you know, get started, get your feet wet. Yes. Then there's like, okay, now we're refactoring, really doing some native development, and they're like, we don't have a staff on Kubernetes. We do a managed service. Yeah. So how do you see that evolution piece taking place? 'Cause that's a critical adoption component as they start figuring out their Kubernetes relationship, yes, to compute, yes, how they roll it out. Yes. How do you see that playing out as a big part of this growth for a customer? >>Yeah. So we see a mix. You know, we see organizations that are born within cloud. Like, they have just been in mono cloud, like AWS. Now they are thinking about two things, right. With the economy being, you know, in the state that it is, they're getting hurt on the margin. Some of the SaaS companies that were born in cloud. 
So they are now actively thinking in terms of what more they can do to bring the cost down. So they are partnering with MinIO either to, you know, be in a colocation at Equinix-like data centers, or go to other clouds to optimize for the compute modes and so on. So that's one thing that we see increasingly amongst enterprises. The second thing that we see is that because of, you know, that whole multi-cloud, and cloud does go down, and it's been evident over the last year or so that, you know, we've seen instances where Amazon was down or Google Cloud was down. So they want to make sure that the data is available across the clouds in a consistent way. So with MinIO, with the active-active replication and so on, you can make the data available across the clouds. So even if one cloud is down, for DR purposes and so on, you can, you know, transparently move the applications to another cloud and make sure that your business is not affected. So for business continuity reasons as well, the customers are partnering with us. So like I said, it's a mix. >>So the Tanzu, you know, 1.3, the application development platform that we heard in the keynotes this morning, critical, you have to have that for cross-cloud services. If you don't have a consistent experience, absolutely, forget it. I mean, it's table stakes. Absolutely. But there's a lot of chatter on Twitter. A lot of skepticism that VMware can appeal to developers. Some folks, John as well, chimed in saying, well, you know, don't forget about the ops side of the equation as well. They need security and consistency. Yes. What are you seeing in the marketplace in terms of VMware, specifically their customers, and how do you rate their chances in terms of them being able to attract the developer crowd, your peeps? >>Yeah. So VMware has a very strong hold on enterprise IT. You know, you have to give it to them. 
I don't come across any organization that does not have VMware, you know, with 500,000 customers. Right. Right. So they have done something really right for themselves. And if you have such a strong hold on the customers, it's not that hard to make the transition over to the developer mindset as well. And that is where, with VMware's partnership with partners like us, they can make that jump happen. So we partnered with them very closely for the data persistence layer, and they wanted to bring Kubernetes, VMware Tanzu, natively to the vSAN interface itself. So we partnered with them, you know, we were their design partner in, I think, 2020 or something, and we were their launch partner for that platform service. So now through vCenter itself, you can provision object storage as a service for the developers. So I think they are working in terms of bridging the gap and they have the right mindset. It's all about execution like this. Right. >>They gotta get it >>Justed >>And it's the execution and timing. Exactly. And if they overshoot and it shifts over here, you know, this comes up a lot in our conversations. I want to get your reaction to this because I think that's a really great point. You guys are a nice foundational element, yes, for VMware that plugs into them. That makes everything kind of float for them. Yes. Now, we were comparing OpenStack back in the day, how that had so much promise. Yes, it did. If you remember, storage was a big part of that conversation. It did. But the one thing that a lot of people didn't factor in on those industry discussions was Amazon was just ramping. Yes. So assuming that the hyperscalers aren't stopping innovating, yeah, how does the multi-cloud fit with the constant struggles? 'Cause AWS is not rah-rah multi-cloud, 'cause they're there for the cloud, but customers are using Azure for, yeah. 
Say office productivity, Teams or whatever, and then they have apps over here, and then obviously on private, private. Right. So hybrid's there, we get hybrid. Yeah. The clouds aren't changing. Yes. How does that change the dynamics in the market? Because it's a moving train, some say. >>You know, it is. I would not characterize it like that because, you know, AWS's strength is that it is AWS, but also that it is not outside of AWS. Right. So it comes with the strengths and weaknesses, and the same goes for Azure. And the same goes for Google Cloud. Where VMware's strength lies is the enterprise customers that it has. And I think if they can bridge the gap between the developers, enterprise customers and also the cloud, I think they have a really fair shot at, you know, making sure that the organizations and enterprises have the right experiences. In terms of, you know, everyone needs to innovate. You cannot just sit back and relax. Everyone needs to innovate. And I think the good part about VMware is the partnership ecosystem that they have developed over the years, and also making sure that their partners are successful along with them. And I think that is going to be a key determining factor in terms of how well and how fast they can execute, because nobody can do it alone in the enterprise world. So I think that would be the key. >>Well, Garima, you're a great guest. Thanks for coming on and sharing your perspective on theCUBE. And obviously you've been on this from day one, 2015. Yes. I mean, that's early, and you guys made some great moves. Thank you. In a great position with VMware. Thank you. I like how you're the connective tissue and bridge to developers without a lot of disruption. Right? Real enablement. I think the question is, can the VMware customers get there? So congratulations. No, thank you. And we got a couple minutes left. 
Take a minute to explain what's going on with the company that you co-founded, the team, what's going on. Any updates? Funding: very well funded. Yeah. How many people do you have? What's new? Are you gonna hire, and where? Take a minute to give the plug, give the commercial real quick. >>For sure. So we started in 2014, 2015, so it has been like seven, eight years now that we've been at it. And I think we've been just very focused on S3-compatible object storage, being AWS S3 for the rest of the world, as we get characterized. And over the years, now we are used by 60% of Fortune 500 companies in some shape or form. So in terms of the scale and growth, we couldn't be happier. We are about to touch a billion Docker downloads in September. So that's something that we are very excited about. And in terms of the funding, we closed our Series B sometime, I think, at the end of December last year, at a billion-dollar valuation, and we have great partners in Intel Capital and Dell Ventures and SoftBank. So we couldn't be in a happier >>spot. You're a unicorn, soon to be a decacorn. Right. >>What's next? >>Yes. I think what is exciting for us is that we could not be happier with how the market is coming together with our vision, what we saw in 2015, and how everything is coming together nicely, with the organizations realizing that multi-cloud is the core foundation and strategy of whatever they do next, and a lot has been accelerated due to COVID as well. Yeah. So in those terms, I think from market and product alignment, we just couldn't be happier. >>Yeah. We think multi-cloud, hybrid's here. Steady state, multi-cloud is gonna be a reality. Yeah. It becomes super cloud with the new dynamics. And again, David and I were talking last night: storage, networking, compute never goes away. The operating system's still gonna be out there. It's just gonna look different. >>Differently. Yes.
I mean, yeah. And like, you know, 10 years from now, Kubernetes might or might not be there as the foundation for, you know, compute, but storage is something that is always going to be there. People still need to persist the data. People still need a performant data store. People still need something that can scale to hundreds and hundreds of petabytes. So we are here. You can't bet against data. >>As Andy Grove said once, you know, let chaos reign, then rein in the chaos. There you go. Chaos cloud is gonna be simplified. Yeah. That's what innovation looks like. >>That's what it is. >>Thanks for coming on theCUBE. Appreciate it. >>Thank you for having me. >>More coverage here. I'm John Furrier with Dave Vellante. Thanks for watching. More coverage, three days, just getting started. We'll be right back.

Published Date : Aug 30 2022



Anand Babu Periasamy, MinIO | VMworld 2020

>>From around the globe, it's theCUBE, with digital coverage of VMworld 2020, brought to you by VMware and its ecosystem partners. >>Welcome back. I'm Stu Miniman, and we've actually reached the end of theCUBE's coverage of VMworld 2020. Hard to believe. In 11 years we've done lots of interviews here, and it has been great to be able to engage with the audience, talk to the executives, talk to some customers, but we're saving one more for you. So happy to welcome to the program, for the first time on theCUBE, though we've been talking to him since they came out of stealth, the co-founder and CEO of MinIO, Anand Babu Periasamy, AB. So nice to see you. Thanks so much for joining us. >>Thank you too. Thank you for having me on the show. >>All right. So we love when we get to talk to the founders of companies. We're gonna dig into your company. But before we do, just frame for us, you know, really high performance; IO is in the name of your company. Min might make me think that there's some miniaturization, but give us the VMware connection. Obviously, VMware talked a lot about cloud this week. They've talked about going deep into AI and computing. So we know this ecosystem has changed a lot in the 11 years that we've been covering it. Tell us how you and your company fit in. >>Sounds good. Yeah. So the Min in MinIO stands for minimalism, right? Somehow in the enterprise it has always been shiny, heavy, complex things: find complex solutions to simple problems and charge them a lot. That has been the trend in the past, right? That's what cloud has reset in the enterprise, and MinIO is actually about solving that data storage problem at very large scale. And the solution is to find simple solutions to complex problems. And we grew in the cloud, in both the public and private cloud, and we are now the fastest growing object storage for the private cloud.
And now we are coming into the VMware territory. We actually see VMware is set to lead the Kubernetes race. And in Kubernetes, if you look for an object storage, pretty much MinIO is the standard. And this is where we bring our ecosystem to VMware, and VMware brings the enterprise market for cloud. And this is the start of the private cloud. In the long run, I think public and private cloud will look alike. >>Yeah, absolutely. We've been writing about this for many years, AB. We saw the enterprises taking on more of the characteristics of the hyperscalers; the hyperscalers, of course, are coming more to the enterprise. A lot of discussion about hybrid and multi-cloud these days. But what I want you to explain a little bit: when your company was formed, you talk about, you know, doing these Kubernetes environments. You do partner with AWS and Azure, but a lot of what you do is on premises, and that strikes people as a little bit unconventional thinking, definitely for 2017 and even for 2020. So help us understand exactly what the technology brings and why you think it's the fit for making private cloud on par with public. >>Yeah, it's not surprising to us at all, but it made no sense to the rest of the world when we started, right? Even the investors, not our investors but the typical venture community, and the rest of the world thought that if an object storage is not useful inside AWS, there is no use for an object storage at all. And our question was very simple: the amount of data the world will produce in the next 10 years, the bulk of the data, where is it going to be? Right? And it's not going to be in the public cloud. And it didn't sound obvious back then, right?
And we saw that in the long run, public and private cloud will look alike, but if the bulk of the data is going to be generated outside AWS, while AWS S3 sets the standard, what is the rest of the world going to do? So MinIO was raised to be the S3 for the rest of the world, and the rest of the world is the biggest market. And back then there was no private cloud. There was public cloud, and public cloud meant only AWS, right? And this was not so long ago. We're talking like five, six years, right? And then soon multi-cloud came, and from multi-cloud, private cloud came. What really accelerated this is basically Kubernetes and containers, right? In fact, containers started the trend, and then Kubernetes has accelerated it further. Nowadays, if you see, it's no longer a dream or a faith-based model, right? We're talking about, like, 540,000 Docker pulls a day, right? And 400 million or so Docker pulls in aggregate. That shows that the entire industry has changed, and Kubernetes, in either public or private cloud, is the one hybrid infrastructure layer. And now private cloud is no longer in question, right? And customers are now able to move between public and private cloud. The trend is hybrid cloud. I think it's irreversible. >>All right, you talked about Docker pulls and the code there, so let's make sure our audience understands exactly what you are. Sounds like you're software; sounds like open source is a piece of it. Help us understand how you fit, because if we're talking about object storage, there's gotta be some infrastructure underneath that. What does MinIO provide, and where do you turn to the partners? >>Yeah, so just like serverless: it's not like there is no server, right? It's about a software problem. Similarly with storage, right? When object storage is containerized, we still need drives, right?
That is where VMware vSAN comes in. vSAN's job is to virtualize the physical layer up to, basically, the container layer. But at the end of the day, if you see, it is a software problem. Just like a database would solve the metadata data store problem, MinIO will solve the blob data problem. And in the public cloud, object storage is the foundational piece. It is the primary storage. But we saw this as a software problem, and when customers started building these applications, they actually containerized their applications and used Kubernetes to roll out their application infrastructure. And when they do that, they cannot possibly buy a hardware appliance on the public cloud. And even on the private cloud, when they completely orchestrate through containers, they cannot roll out hardware appliances. This is where the cloud-native community always saw this as a software problem. It was obvious to them; for enterprise IT it was not so clear. And the storage industry giants, if you see, every one of them is a hardware appliance play, and they are in for a total shock. And VMware has basically reset this with vSphere 7 Update 1, and there are a lot of interesting things to come. >>All right, so if I understand, here you sit in a VMware environment: I've got vSAN underneath, I've got Tanzu above, and you're providing that object service in between. So for our friends in the channel market, when thinking about gear, anything that vSAN can sit on, you can just come along for the ride. Do I have that right? >>Yeah. So underneath, vSAN is basically a bunch of JBODs, right? These are like Dell and HP servers with the drives in them, and this is not a hardware appliance anymore, right? You look at the storage market, it is SAN or NAS appliances. That is how enterprise IT operated, not the cloud world.
And as VMware moves into the cloud world, everything looks cloud native, and in this case the SAN and NAS appliances have no role to play. Even the object storage hardware appliance has no role to play, because VMware vSAN becomes the new block storage layer. And then they have positioned object storage, databases, everything, as a data store or a data persistence layer. So only software that is containerized gets to play on top of VMware in the new world, including the storage itself. There is no appliance here. >>All right, so your solution is listed as Kubernetes native. Now, you mentioned vSphere 7; vSphere 7 Update 1 now has full Kubernetes support. I'm assuming then you can plug into Tanzu, you can plug into Amazon, Azure, other Kubernetes options out there. Is that the case? >>Yeah. So from a customer point of view, right, if you are in the enterprise IT environment, from the IT administrator's point of view, nothing changes much, other than that from the vCenter console itself you now get to see MinIO in the persistent data services. You click and deploy entirely as software, without even learning to spell Kubernetes. You can build private cloud storage, multi-tenant, exactly like how public cloud storage operates. And that is from the private cloud point of view, right? And it's purely software. You're not waiting six months for the hardware to arrive, with long procurement cycles and provisioning; all of that is now provisioned as a software container. In just five minutes, you can actually set up private cloud infrastructure. That's for the private cloud, right? But the reason why customers want this to be a software problem is that they roll out their software on the private cloud and on the public cloud, for burst workloads and sustained workloads: sustained workloads on private cloud, burst workloads on public cloud.
Non-critical jobs, or anything that is fast-moving and convenience-based, they push to public cloud. Customers do want to have one leg here and one leg there. And nowadays even the edge is growing, decentralized, from the telco space to video and other areas. They want a pure software solution. The entire data center software is now containerized. They can roll it out on public cloud, private cloud, or the edge. And with MinIO, we solve the data side. On the compute side, VMware already has done a wonderful job. On the networking side, they have done it. On the storage side, vSAN handles the physical-to-container layer. And now the data storage part is what we solved. Now what does this do for the end user? Now they can build software and truly deploy on public, private, or edge without any modification. Entirely, it is a software problem. >>That's great. What do you find are some of the more prevalent use cases, you know, sitting on top? What applications are the key ones that people are deploying your solution for? >>Yeah. So the public cloud is actually a good place to start. If you see, in the public cloud, right, starting from even simple static website hosting to AI/ML and big data workloads, to even the modern databases like Snowflake, for example, everything is built on object storage in the public cloud. It has become a truly horizontal play. And that is how it started, right? AWS started with S3, and then came everything else. And now that trend is beginning to percolate into the enterprise. And surprisingly, we found that in the enterprise the explosion of data growth is actually not about, like, cat videos, right? What are they storing, mostly? We found that the bulk of the data that is drowning the enterprise is machine-generated data.
And these are basically some kind of log data, event data, data streams that are continuously produced, and that can actually grow from 10 terabytes to 10 petabytes in a very short time. This is where object storage has clearly become the right choice, just like in the public cloud. Customers are now adopting object storage as the primary storage for multiple applications. Whether it is the cloud-native applications, like the Tanzu Application Service, like Spring Boot and the whole Cloud Foundry stack, from there to all the ML and big data workloads, pretty much everybody has been converging on object storage as their foundation. >>Yeah, absolutely. We've seen some of those use cases very prevalent here in the VMware community. I heard you talking about it, and I was expecting to hear you talk about Splunk; data protection is something that's been a big topic of conversation in the last few years. Obviously, VMware has a number of key partners. So I'm assuming many of those are ones you are also working with. >>Look, I'm glad you brought up Splunk. Splunk itself is actually an important move, what we did recently with VMware. Finally, we can run Splunk natively on VMware at large scale, without any performance penalty, and at a price point that becomes really attractive. Now, comparing Splunk Cloud versus Splunk on-prem, we can actually show at least one third of what it would cost to run on Splunk Cloud. So I don't know whether Splunk themselves would like it, but I think Splunk as a company would like what customers like, right? And this is where Splunk can now sit on MinIO as their data store. They call it SmartStore, and underneath it is MinIO. Now, in the previous, original vSAN incarnation, we couldn't actually store huge amounts of data. But now, with vSAN Direct, we actually have access to the local drives and you can attach as many drives as you want.
Then if you want more capacity, add more servers, so you can pack thousands and thousands of drives at a price point that even public cloud cannot come anywhere close to. And this is actually an important environment for the Splunk customers, because for them it's not only the cost, right? The data is sensitive for them. They cannot really push data generated outside of the public cloud to the public cloud. If data is generated inside the public cloud, probably Amazon has their own solution, and Splunk Cloud makes sense. But when data is produced outside, this is sensitive data and it's a huge volume, and the kind of users we see produce, on average, about [inaudible] a day, and it's only growing at an accelerated pace. And this is where, with vSAN Direct and MinIO, you can now bring that workload onto VMware. Finally, IT can control the Splunk deployments. This is something important for IT, right? In the past, if you see, big data workloads always ran on bare metal and in silos, something IT hated, right? This time it is not just flexible, it actually gets better. >>Well, it sure sounds like the technology maturation has finally caught up, from the VMware standpoint, with the vision that you and the team had. So give us a little bit of a look forward. Now that you've got Kubernetes really being embraced by VMware, and we're starting to see maturation in this space, where do we go from here? >>So actually, if you see what they brought to the table this time, they didn't just catch up with others, right? Typically, the innovation in recent times happened in the open source space: startups and open source started the innovation, and the large vendors come in later. But this time around, VMware actually did the innovation part with vSAN Direct. It's actually a big step forward in the Kubernetes CSI space.
And the reason why it's a big step is that CSI traditionally is designed for the SAN and NAS vendors, and using the same CSI model, VMware was able to bring in large workloads and allowed them to use the local drives directly. Now, moving forward, what we are set to see is this: the cloud-native workloads actually ran as a silo in the enterprise, right? There were the big data workloads; there was the applications team that ran Kubernetes and containers on their own, in their own DevOps shop; and enterprise IT ran the IT infrastructure. These three were not connected. And finally, this time around, by bringing Kubernetes natively into the IT infrastructure, there is going to be a convergence. The silos will get eliminated. Big data workloads and ML workloads on bare metal will now come to the VMware and Kubernetes combination, and you will see the cloud-native application space hand over the physical layer infrastructure to VMware, with everybody coming together. I think it's a big step forward. >>Well, AB, I sure hope you're right. We love to see the breaking down of silos, things coming together. We've been a little bit concerned over the last few years that we're rebuilding the silos in the cloud, with different skill sets there, but we always love some good tech optimism here, to say that we're gonna remove these sorts of silos. Thank you so much. Great to catch up with you, and definitely look forward to hearing more from you and your customers in the future. >>Thank you too, Stu. Wonderful to be on your show. >>All right. We want to thank everybody for joining VMworld 2020. For Dave Vellante and John Furrier, a big thanks to the whole production team and, of course, VMware and our sponsors for helping us to bring this content to you. As always, I'm Stu Miniman, and thank you for joining us on theCUBE.

Published Date : Sep 30 2020



Vittorio Viarengo, VP of Cross Cloud Services, VMware | VMware Explore 2022

(gentle music intro) >> Okay, we're back. We're live here at theCUBE and at VMworld, VMware Explore, formerly VMworld. I'm John Furrier with Dave Vellante. Three days of wall to wall coverage, we've got Vittorio Viarengo, the vice president of Cross-Cloud Services at VMware. Vittorio, great to see you, and thanks for coming on theCUBE right after your keynote. I can't get that off my tongue, VMworld. 12 years of CUBE coverage. This is the first year of VMware Explore, formerly VMworld. Raghu said in his keynote, he explained the VMworld community now with multi-clouds that you're in charge of at VMworld, VMware, is now the Explore brand's going to explore the multi-cloud, that's a big part of Raghu's vision and VMware. You're driving it and you are on the stage just now. What's, what's going on? >> Yeah, what I said at my keynote is that our customers have been the explorers of IT, the new IT frontier, always challenging the status quo. And we've been, our legendary engineering team, been behind the scenes, providing them with the tools and the technology to be successful in that journey to the private cloud. And Kelsey said it. What we built was the foundation for the cloud. And now it's time to start a new journey in the multi-cloud. >> Now, one of the things that we heard today clearly was: multi-cloud's a reality. Cloud chaos, Kit Colbert was talking about that and we've been saying, you know, people are chaotic. We believe that. Andy Grove once said, "Rein in the chaos. Let chaos reign, then rein in the chaos." That's the opportunity. The complexity of cross-cloud is being solved. You guys have a vision, take us through how you see that happening. A lot of people want to see this cross-cloud abstraction happen. What's the story from your standpoint, how you see that evolving? >> I think that IT history repeats itself, right? Everything starts nice and neat. "Oh, I'm going to buy a bunch of HP servers and my life is going to be good, and oh, this store." 
>> Spin up an EC2. >> Yeah. Eventually everything goes like this in IT because every vendor do what they do, they innovate. And so that could create complexity. And in the cloud is the complexity on steroid because you have six major cloud, all the local clouds, the cloud pro- local cloud providers, and each of these cloud brings their own way of doing management security. And I think now it's time. Every customer that I talk to, they want more simplicity. You know, how do I go fast but be able to manage the complexity? So that's where cross-cloud services- Last year, we launched a vision, with a sprinkle of software behind it, of building a set of cloud-native services that allow our customers to build, run, manage, secure, and access any application consistently across any cloud. >> Yeah, so you're a year in now, it's not like, I mean, you know, when you come together in a physical event like this, it resonates more, you got the attention. When you're watching the virtual events, you get doing a lot of different things. So it's not like you just stumbled upon this last week. Okay, so what have you learned in the last year in terms of post that launch. >> What we learned is what we have been building for the last five years, right? Because we started, we saw multi-cloud happening before anybody else, I would argue. With our announcement with AWS five, six years ago, right? And then our first journey to multi-cloud was let's bring vSphere on all the clouds. And that's a great purpose to help our customers accelerate their journey of their "legacy" application. Their application actually deliver business to the cloud. But then around two, three years ago, I think Raghu realized that to add value, we needed- customers were already in the cloud, we needed to embrace the native cloud. And that's where Tanzu came in as a way to build application. Tanzu manage, way to secure manage application. 
And now with Aria, we now have more differentiated software to actually manage this application across- >> Yeah, and Aria is the management plane. That's the rebrand. It's not a new product per se. It's a collection of the VMware stuff, right? Isn't it like- >> No, it's, it's a... >> It's a new product? >> There is a new innovation there because basically they, the engineering team built this graph and Raghu compared it to the graph that Google builds up around about the web. So we go out and crawl all your assets across any cloud and we'll build you this model that now allows you to see what are your assets, how you can manage them, what are the performance and all that, so. No, it's more than a brand. It's, it's a new innovation and integration of a technology that we had. >> And that's a critical component of cross-cloud. So I want to get back to what you said about Raghu and what he's been focused on. You know, I remember interviewing him in 2016 with Andy Jassy at AWS, and that helped clear up the cloud game. But even before that Raghu and I had talked, Dave, on theCUBE, I think it was like 2014? >> Yeah. >> Pat Gelsinger was just getting on board as the CEO of VMware. Hybrid was very much on the conversation then. Even then it was early. Hybrid was early, you guys are seeing multi-cloud early. >> It was private cloud. >> Totally give you props on that. So VMware gets total props on that, being right on that. Where are we in that journey? 'Cause super cloud, as we're talking about, you were contributing to that initiative in the open with our open source project. What is multi-cloud? Where is it in the status of the customer? I think everyone will agree, multi-cloud is an outcome that's going to happen. It's happening. Everyone has multiple clouds and they configure things differently. Where are we on the progress bar in your mind? 
>> I think I want to answer that question and go back to your question, which I didn't address, you know, what we are learning from customers. I think that most customers are at the very, very beginning. They're either in the denial stage, like yesterday I talked to a customer, I said, "Are you multi-cloud, are you on your multi-cloud journey?" And he said, "Oh, we are on-prem and a little bit of Azure." I said, "Oh really? So the bus-" "Oh no, well the business unit is using AWS, right? And we acquired a company that is using-" I said, "Okay, so you are... that customer is in cloud first stage." >> Like you said, we've seen this movie before. It comes around, right? >> Yeah. >> Somebody's going to have to clean that up at some point. >> Yeah, I think a lot, a lot of- the majority of customers are either in denial or in the cloud chaos. And some customers are pushing the envelope like S&P. S&P Global, we heard this morning. Somebody who has done all the journey in the private cloud with us, and now I said, and I talked to him a few months ago, he told me, "I had to get in front of my developers. Enough of this, you know, wild west. I had to lay down the tracks and guardrails for them to build multi-cloud in a way that was, give them choice, but for me, as an operator and a security person, being able to manage it and secure it." And so I think most customers are in that chaos phase right now. Very early. >> So at our Supercloud22 event, we were riffing and I was asking you about, are you going to hide the complexity, yes. But you're also going to give access to the, to the developers if they want access to the primitives. And I said to you, "It sounds like you want to have your cake and eat it too." And you said, "And want to lose weight." And I never followed up with you, so I want to follow up now. By "lose weight," I presume you mean be essentially that platform of choice, right? 
So you're going to, you're going to simplify, but you're going to give access to the developers for those primitives, if in fact they want one. And you're going to be the super cloud, my word of choice. So my question to you is why, first of all, is that correct, your "lose weight"? And why VMware? >> When I say you, you want a cake, eat it and lose weight, I, and I'm going to sound a little arrogant, it's hard to be humble when you're good. But now I work for a company, I work for a company that does that. Has done it over and over and over again. We have done stuff, I... Sometimes when I go before customers, I say, "And our technology does this." Then the customer gets on stage and I go, "Oh my God, oh my God." And then the customers say, "Yeah, plus I realize that I could also do this." So that's, you know, that's the kind of company that we are. And I think that we were so busy being successful with on-prem and that, you know, that we kind of... the cloud happened. Under our eyes. But now with the multi-cloud, I think there is opportunity for VMware to do it all over again. And we are the right company to do it for two reasons. One, we have the right DNA. We have those engineers that know how to make stuff that was not designed to work together work together and the right partnership because everybody partners with us. >> But, you know, a lot of companies like, oh, they missed cloud, they missed mobile. They missed that, whatever it was. VMware was very much aware of this. You made an effort to do kind of your own cloud initiative, backed off from- and everybody was like, this is a disaster waiting to happen and of course it was. And so then you realize that, you learn from your mistakes, and then you embraced the AWS deal. And that changed everything, it changed... It cleared it up for your customers. I'm not hearing anybody saying that the cross-cloud services strategy, what we call multi, uh, super cloud is wrong. 
Nobody's saying that's like a failed, you know, strategy. Now the execution obviously is very important. So that's why I'm saying it's different this time around. It's not like you don't have your pulse on it. I mean, you tried before, okay, the strategy wasn't right, it backfired, okay, and then you embraced it. But now people are generally in agreement that there's either a problem or there's going to be a problem. And so you kind of just addressed why VMware, because you've always been in the catbird seat to solve those problems. >> But it is a testament to the pragmatism of the company. Right? You try- In technology, you cannot always get it right, right? When you don't get it right, say, "Okay, that didn't work. What is the next?" And I think now we're onto something. It's a very ambitious vision for sure. But I think if you look at the companies out there that have the muscles and the DNA and the resources to do it, I think VMware is one. >> One of the risks to the success, what's been, you know you watch the Twitter chatter is, "Oh, can VMware actually attract the developers?" John chimed in and said, >> Yeah. >> It's not just the devs. I mean, just devs. But also when you think of DevOps, the ops, right? When you think about securing and having that consistent platform. So when you think about the critical factors for you to execute, you have to have that pass platform, no question. Well, how do you think about, okay, where are the gaps that we really have to get right? >> I think that for us to go and get the developers on board, it's too late. And it's too late for most companies. Developers go with the open source, they go with the path of least resistance. So our way into that, and as Kelsey Hightower said, building new application, more applications, is a team sport. And part of that team is the Ops team. And there we have an entry, I think. Because that's what- >> I think it possible. I think you, I think you're hitting it. 
And my dev comment, by the way, I've been kind of snarky on Twitter about this, but I say, "Oh, devs got it easy. They're sitting on the beach with sunglasses on, you know, having focaccia." >> Doing whatever they want. >> Happy doing whatever they want. No, it's better life for the developer now. Open source is the software industry, that's going great. Shift left in CI/CD pipeline. Developers are faster than ever, they're innovating. It's all self-service, it's all DevOps. It's looking good for the developers right now. And that's why everyone's focused on that. They're driving the change. The Ops team, that was traditional IT Ops, is now DevOps with developers. So the sea change of data and security, which is core, we're hearing a lot of those. And if you look at all the big successes, Snowflake, Databricks, MinIO, who was on earlier with the S3 cloud storage anywhere, this is the new connective tissue that VMware can connect to and extend the operational platform of IT and connect developers. You don't need to win them all over. You just connect to them. >> You just have to embrace the tools that they're using. >> Exactly. >> You just got to connect to them. >> You know, you bring up an interesting point. Snowflake has to win the developers, 'cause they're basically saying, "Hey, we're building an application development platform on top of our proprietary system." You're not saying that. You're saying we're embracing the open source tools that developers are using, so use them. >> Well, we give you a single pane of glass to manage your application everywhere. And going back to your point about not hiding the underlying primitives, we manage that application, right? That application could be moving around, but nobody prevents that application from using that API underneath. I mean, that's, that can always do that. >> Right, right. 
>> And, and one of the reasons why we had Kelsey Hightower in my keynote and the main keynote was that I think he shows the template, the blueprint for our customers, our operators, if you want to have- even propel your career forward, look at what he did, right? VI admin, going up the stack storage and everything else, and then eventually embrace Kubernetes, became an expert. Really took the time to understand how modern application were- are built. And now he's a luminary in the industry. So we don't all have to become luminaries, but you can- our customers right here, doing the labs upstairs, they can propel their career forward in this. >> So summarize what you guys are announcing around cross-cloud services. Obviously Aria, another version, 1.3 of Tanzu. Lay out the sort of news. >> Yeah, so we- With Tanzu, we have one step forward with our developer experience so that, speaking of meeting where they are, with application templates, with ability to plug into their IDE of choice. So a lot of innovation there. Then on the Aria side, I think that's the name of the game in multi-cloud, is having that object model allows you to manage anything across anything. And then, we talk about cross-cloud services being a vision last year, I, when I launched it, I thought security and networking up there as a cloud, but it was still down here as ploy technology. And now with NSX, the latest version, we brought that control plane in the cloud as a cloud native cross-cloud service. So, lot of meat around the three pillars, development, the management, and security. >> And then the complementary component of vSphere 8 and vSAN 8 and the whole DPU thing, 'cause that's, 'cause that's cloud, right? I mean, we saw what AWS did with Nitro. >> Yeah. >> Five, seven years ago. >> That's the consumption model cloud. >> That's the future of computing architecture. >> And the licensing model underneath. >> Oh yeah, explain that. Right, the universal licensing model. 
>> Yeah, so basically what we did when we launch cloud universal was, okay, you can buy our software using credit that you have on AWS. And I said, okay, that's kind of hybrid cloud, it's not multi-cloud, right? But then we brought in Google and now the latest was Microsoft. Now you can buy our software for credits and investment that our customers already have with these great partners of ours and use it to consume as a subscription. >> So that kind of changes your go-to-market and you're not just chasing an ELA renewal now. You're sort of thinking, you're probably talking to different people within the organizations as well, right? So if I can use credits for whatever, Google, for Azure, for on-prem, for AWS, right? Those are different factions necessarily in the organization. >> So not just the technology's multi-cloud, but also the consumption model is truly multi-cloud. >> Okay, Vittorio, what's next? What's the game plan? What do you have going on? It's getting good traction here again, like Dave said, no one's poo-pooing cross-cloud services. It is kind of a timing market forces. We were just talking before you came on. Oh, customers don't- may not think they have a problem, whether they're the frog boiling water or not, they will have the problem coming up or they don't think they have a problem, but they have chaos reigning. So what's next? What are you doing? Is it going to be new tech, new market? What is the plan? >> So I think for, if I take my bombastic kind of marketing side of me hat off and I look at the technology, I think the customers in these scales wants to be told what to do. And so I think what we need to do going forward is articulate these cross-cloud services use cases. Like okay, what does mean to have an application that uses a service over here, a service over there, and then show the value of getting this component from one company? Because cross-cloud services at your event, how many vendors were there? 20? 30? >> Yeah. 
>> So the market is there. I mean, these are all revenue-generating companies, right, but they provide a piece of the puzzle. Our ambition is to provide a platform approach. And so we need to articulate better, what are the advantages of getting these components management, security, from- >> And Kit, Kit was saying, it's a hybrid kind of scenario. I was kind of saying, oh, putting my little business school scenario hat on, oh yeah, you go hardcore competitive, best product wins, kill or be killed, compete and win. Or you go open and you create a keiretsu, create a consortium, and get support, standardize or defacto standardize a bunch of it, and then let everyone monetize or participate. >> Yeah, we cannot do it alone. >> What's the approach? What's the approach you guys want to take? >> So I think whatever possible, first of all, we're not going to do it alone. Right, so the ecosystem is going to play a part and if the ecosystem can come together around the consortium or a standard that makes sense for customers? Absolutely. >> Well, and you say, nobody's poo-pooing it, and I stand by that. But they are saying, and I think it is true, it's hard, right? It's a very challenging, ambitious goal that you have. But yeah, you've got a track record of- >> I mean the old playbook, >> Exactly! >> The old playbooks are out. I mean, I always say, the old kill and be highly competitive strategy. Proprietary is dead. And then if you look at the old way of winning was, okay, you know, we're going to lock customers in- >> What do you mean proprietary is dead? Proprietary's not dead. >> No, I mean like, I'm talking- Okay, I'm talking about how people sell. Enterprise companies love to create, simplify, create value with chaos like okay, complexity with more complexity. So that's over, you think that's how people are marketing? >> No, no, it's true. But I mean, we see a lot of proprietary out there. >> Like what? >> It's still happening. Snowflake. 
(laughing) >> Tell that to the entire open source software industry. >> Right, well, but that's not your play. I mean, you have to have some kind of proprietary advantage. >> The enterprise playbook used to be solve complexity with complexity, lock the customers in. Cloud changed all that with open. You're a seasoned marketer, you're also an executive. You have an interesting new wave. How do you market to the enterprise in this new open way? How do you win? >> For us, I think we have that relationship with the C-level and we have delivered for them over and over again. So our challenge from a marketing perspective is to educate these executives about all that. And the fact that we didn't have this user conference in person didn't help, right? And then show that value to the operator so that they can help us just like we did in the past. I mean, our sales motion in the past was we made these people- I told them today, you were the heroes. When you virtualized, when you brought down 1000 servers to 80, you were the hero, right? So we need to empower them with the technology and the know-how to be heroes again in multi-cloud. And I think the business will take care of itself. >> Okay final question from me, and Dave might have another one of his, everybody wanted to know this year at VMworld, VMware Explore, which is the new name, what would it look like? What would the vibe be? Would people show up? Would it be vibrant? Would cross-cloud hunt? Would super cloud be relevant? I got to say looking at the floor last night, looking at the keynotes, looking at the perspective, it seems to look like, oh, people are on board. What is your take on this? You've been talking to customers, you're talking to people in the hallways. You've been briefing, talking to all the analysts. What is the vibe about this year's Explore? >> I think, you've been covering us for a long time, this is a religious following we have. And we don't take it for granted. 
I told the audience today, this to us is a family reunion and we couldn't be, so we got a sense of like, that's what it feels like, the family is back together. >> And there's a wave coming too. It's not like business is dying. It's like a whole 'nother. Another wave is coming. >> It's funny you mention about the heroes. 'Cause I go back, I don't really have my last question, but it's just the last thought is, I remember the first time I saw a demo of VMware and I went, "Holy crap, wow. This is totally game changing." I was blown away. Right, like you said, 80 servers down to just a couple of handfuls. This is going to change everything. And that's where it all started. You know, I mean, I know it started in workstations, but that's when it really became transformational. >> Yeah, so I think we have an opportunity to do it over again with the family that is here today, of which you guys consider family as well. >> All right, favorite part of the keynote and then we'll wrap up. What was your favorite part of the keynote today? >> I think the excitement from the developer people that were up there. Kelsey- >> The guy who came after Kelsey, what was his name? I didn't catch it, but he was really good. >> Yeah, I mean, it's, what it's all about, right? People that are passionate about solving hard problems and then cannot wait to share it with the community, with the family. >> Yeah. I love the one line, "You kids have it easy today. We walked to school barefoot in the snow back in the day." >> Uphill, both ways. >> Broke the ice to wash our face. >> Vittorio, great to see you, great friend of theCUBE, CUBE alumni, vice president of cross-cloud services at VMware. A critical new area that's harvesting the fruits coming off the tree as VMware invested in cloud native many years ago. It's all coming to the market, let's see how it develops. Congratulations, good luck, and we'll be back with more coverage here at VMware Explore. I'm John Furrier with Dave Vellante. 
Stay with us after the short break. (gentle music)

Published Date : Aug 30 2022


Migrating Your Vertica Cluster to the Cloud

>> Jeff: Hello everybody, and thank you for joining us today for the virtual Vertica BDC 2020. Today's break-out session has been titled, "Migrating Your Vertica Cluster to the Cloud." I'm Jeff Healey, and I'm in Vertica marketing. I'll be your host for this break-out session. Joining me here are Sumeet Keswani and Chris Daly, Vertica product technology engineers and key members of our customer success team. Before we begin, I encourage you to submit questions and comments during the virtual session. You don't have to wait, just type your question or comment in the question box below the slides and click Submit. As always, there will be a Q&A session at the end of the presentation. We'll answer as many questions as we're able to during that time. Any questions that we don't address, we'll do our best to answer them offline. And alternatively, you can visit Vertica forums at forum.vertica.com to post your questions there after the session. Our engineering team is planning to join the forums to keep the conversation going. Also as a reminder that you can maximize your screen by clicking the double arrow button in the lower right corner of the slides. And yes, this virtual session is being recorded and will be available to view on demand this week. We'll send you a notification as soon as it's ready. Now let's get started. Over to you, Sumeet. >> Sumeet: Thank you, Jeff. Hello everyone, my name is Sumeet Keswani, and I will be talking about planning to deploy or migrate your Vertica cluster to the Cloud. So you may be moving an on-prem cluster or setting up a new cluster in the Cloud. And there are several design and operational considerations that will come into play. You know, some of these are cost, which industry you are in, or which expertise you have, in which Cloud platform. And there may be a personal preference too. 
After that, you know, there will be some operational considerations like VM and cluster sizing, what Vertica mode you want to deploy, Eon or Enterprise. It depends on your use case. What are the DevOps skills available, you know, what elasticity, separation you need, you know, what is your backup and DR strategy, what do you want in terms of high availability. And you will have to think about, you know, how much data you have and where it's going to live. And in order to understand the cost and the benefit of the deployment, you will have to understand the access patterns, and how you are moving data from and to the Cloud. So things to consider before you move a deployment, a Vertica deployment to the Cloud, right, is one thing to keep in mind is, virtual CPUs, or CPUs in the Cloud, are not the same as the usual CPUs that you've been familiar with in your data center. A vCPU is half of a CPU because of hyperthreading. There is definitely the noisy neighbor effect. Depending on what other things are hosted in the Cloud environment, you may occasionally see performance issues. There are I/O limitations on the instance that you provision, so what that really means is you can't always scale up. You might have to scale out, basically, you have to add more instances rather than getting bigger or the right size instances. Finally, there is an important distinction here. Virtualization is not free. There can be significant overhead to virtualization. It could be as much as 30%, so when you size and scale your clusters, you must keep that in mind. Now the other important aspect is, you know, where you put the Vertica cluster is important. The choice of the region, how far it is from your various office locations. Where will the data live with respect to the cluster. And remember, popular locations can fill up. So if you want to scale out, additional capacity may or may not be available. 
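The two sizing caveats above (a vCPU is roughly half a physical core, and virtualization overhead can reach 30%) combine into a simple back-of-the-envelope estimate. The following is a minimal sketch under those assumptions only; the function name `vcpus_to_match` and its parameters are hypothetical and are not part of any Vertica or cloud-provider tooling.

```python
import math


def vcpus_to_match(physical_cores: int, virtualization_overhead: float = 0.30) -> int:
    """Rough count of cloud vCPUs needed to match on-prem physical cores.

    Assumptions taken from the talk, not measured values:
    - one vCPU is one hyperthread, i.e. about half a physical core;
    - virtualization may cost up to `virtualization_overhead` of capacity.
    """
    if physical_cores < 0:
        raise ValueError("physical_cores must be non-negative")
    if not 0 <= virtualization_overhead < 1:
        raise ValueError("virtualization_overhead must be in [0, 1)")
    # Two hyperthreads (vCPUs) per physical core.
    hyperthreads = physical_cores * 2
    # Inflate by the capacity lost to virtualization, rounding up.
    return math.ceil(hyperthreads / (1 - virtualization_overhead))


if __name__ == "__main__":
    # A 16-core on-prem node, assuming the worst-case 30% overhead.
    print(vcpus_to_match(16))
```

This treats the 30% figure as a worst case; a measured overhead for a specific instance type would be a better input where one is available.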
So these are things you have to keep in mind when picking or choosing your Cloud platform and your deployment. So at this point, I want to make a plug for Eon mode. Eon mode is the latest mode, is a Cloud mode from Vertica. It has been designed with Cloud economics in mind. It uses shared storage, which is durable, available, and very cheap, like S3 storage or Google Cloud storage. It has been designed for quick scaling, like scale out, and highly elastic deployments. It has also been designed for high workload isolation, where each application or user group can be isolated from the other ones, so that they'll be paid and monitored separately, without affecting each other. But there are some disadvantages, or perhaps, you know, there's a cost for using Eon mode. Storage in S3 is neither cheap nor efficient. So there is a high latency of I/O when accessing data from S3. There is API and data access cost associated with accessing your data in S3. Vertica in Eon mode has a pay as you go model, which you know, works for some people and does not work for others. And so therefore it is important to keep that in mind. And performance can be a little bit variable here, because it depends on the local depot, which is a cache, and it is not as predictable as EE mode, so that's another trade-off. So let's spend about a minute and see what a Vertica cluster in Eon mode looks like. A Vertica cluster in Eon mode has S3 as the durability layer where all the data sits. There are subclusters, which are essentially just aggregation groups, which is separated compute, which will service different workloads. So in this example, you may have two subclusters, one servicing ETL workload and the other one servicing (mic interference obscures speaking). These clusters are isolated, and they do not affect each other's performance. This allows you to scale them independently and isolate workloads. 
So this is the new Vertica Eon mode which has been specifically designed by us for use in the Cloud. But beyond this, you can use EE mode or Eon mode in the Cloud, it really depends on what your use case is. But both of these are possible, and we highly recommend Eon mode wherever possible. Okay, let's talk a little bit about what we mean by Vertica support in the Cloud. Now as you know, a Cloud is a shared data center, right. Performance in the Cloud can vary. It can vary between regions, availability zones, time of the day, choice of instance type, what concurrency you use, and of course the noisy neighbor effect. You know, we in Vertica, we performance, load, and stress test our product before every release. We have a bunch of use cases, we go through all of them, make sure that we haven't, you know, regressed any performance, and make sure that it works up to standards and gives you the high performance that you've come to expect. However, your solution or your workload is unique to you, and it is still your responsibility to make sure that it is tuned appropriately. To do this, one of the easiest things you can do is you know, pick a tested operating system, allocate the virtual machine, you know, with enough resources. It's something that we recommend, because we have tested it thoroughly. It goes a long way in giving you predictability. So after this I would like to now go into the various platforms, Cloud platforms, that Vertica has worked on. And I'll start with AWS, and my colleague Chris will speak about Azure and GCP. And our thoughts forward. So without further ado, let's start with the Amazon Web Services platform. So this is Vertica running on the Amazon Web Services platform. So as you probably are all aware, Amazon Web Services is the market leader in this space, and indeed really our biggest provider by far, and have been here for a very long time. And Vertica has a deep integration in the Amazon Web Services space. 
We provide a marketplace offering which has both pay as you go or a bring your own license model. We have many, you know, knowledge base articles, best practices, scripts, and resources that help you configure and use a Vertica database in the Cloud. We have several customers in the Cloud for many, many years now, and we have managed and console-based point and click deployments, you know, for ease of use in the Cloud. So Vertica has a deep integration in the Amazon space, and has been there for quite a bit now. So we've accumulated a lot of experience here. So let's talk about sizing on AWS. And sizing on any platform comes down to you know, these four or five different things. It comes down to picking the right instance type, picking the right disk volume and type, tuning and optimizing your networking, and finally, you know, some operational concerns like security, maintainability, and backup. So let's go into each one of these on the AWS ecosystem. So the choice of instance type is one of the important choices that you will make. In Eon mode, you know, you don't really need persistent disk. You can, you should probably choose ephemeral disk because it gives you extra speed, and speed with the instance type. We highly recommend the i3.4x instance types, which are very economical, have a big, 4 terabyte depot or cache per node. The i3.metal is similar to the i3.4, but has got significantly better performance, for those subclusters that need this extra oomph. The i3.2 is good for scale out of small ad hoc clusters. You know, they have a smaller cache and lower performance but it's cheap enough to use very indiscriminately. If you were in EE mode, well we don't use S3 as the layer of durability. Your local volumes are where we persist the data. Hence you do need an EBS volume in EE mode. In order to make sure that, you know, that the instance or the deployment is manageable, you might have to use some sort of a software RAID array over the EBS volumes. 
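The Eon mode instance guidance above amounts to a small lookup from workload profile to instance family. The sketch below is illustrative only: the profile names and the function `suggest_eon_instance` are hypothetical, and the instance labels are kept in the speaker's shorthand ("i3.4x", "i3.metal", "i3.2") rather than expanded to full AWS instance type names.

```python
# Speaker's shorthand for instance types, keyed by a hypothetical profile name.
EON_INSTANCE_HINTS = {
    "general": "i3.4x",            # economical, roughly 4 TB depot per node
    "high_performance": "i3.metal",  # for subclusters that need extra performance
    "ad_hoc": "i3.2",              # small scale-out clusters, smaller cache
}


def suggest_eon_instance(profile: str) -> str:
    """Return the instance shorthand the talk associates with a workload profile."""
    try:
        return EON_INSTANCE_HINTS[profile]
    except KeyError:
        known = ", ".join(sorted(EON_INSTANCE_HINTS))
        raise ValueError(f"unknown profile {profile!r}; expected one of: {known}")


if __name__ == "__main__":
    for name in ("general", "high_performance", "ad_hoc"):
        print(name, "->", suggest_eon_instance(name))
```

A table like this is only a starting point; current instance generations and pricing should be checked before provisioning.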
The most common instance types you see in EE mode are the r4.4x, the c4, or the m4 instance types. And then of course for temp space and depot we always recommend instance volumes. They're just much faster. Okay. So let's talk about optimizing your network or tuning your network. So the best thing you can do about tuning your network, especially in Eon mode but in other modes too, is to get a VPC S3 endpoint. This is essentially a route table that makes sure that all traffic between your cluster and S3 goes over an internal fabric. This makes it much faster, you don't pay for egress cost, especially if you're doing external tables or your communal storage, but you do need to create it. Many times people will forget doing it. So you really do have to create it. And best of all, it's free. It doesn't cost you anything extra. You just have to create it during cluster creation time, and there's a significant performance difference for using it. The next thing about tuning your network is, you know, sizing it correctly. Pick the closest geographical region to where you'll consume the data. Pick the right availability zone. We highly recommend using cluster placement groups. In fact, they are required for the stability of the cluster. A cluster placement group essentially gives you the notion of a rack. Nodes in a cluster placement group are, you know, physically closer to each other than they would otherwise be. And this allows, you know, a 10 Gbps, bidirectional, TCP/IP flow between the nodes. And this makes sure that, you know, you get high throughput between the nodes. As you probably are all aware, the Cloud does not support broadcast or UDP broadcast. Hence you must use point-to-point UDP for spread in the Cloud, or in AWS. Beyond that, you know, point-to-point UDP does not scale very well beyond 20 nodes. So you know, as your cluster sizes increase, you must switch over to large cluster mode. 
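The networking recommendations above can be read as a pre-deployment checklist. The following is a minimal sketch of such a check, assuming only what the talk states: a VPC S3 endpoint should exist, nodes should share a cluster placement group, and point-to-point UDP stops scaling well past about 20 nodes. The class `ClusterNetworkPlan`, the function `review`, and the constant name are hypothetical and do not correspond to any Vertica or AWS API.

```python
from dataclasses import dataclass
from typing import List

# Node count past which the talk advises switching to large cluster mode.
LARGE_CLUSTER_THRESHOLD = 20


@dataclass
class ClusterNetworkPlan:
    node_count: int
    has_s3_vpc_endpoint: bool
    uses_placement_group: bool
    large_cluster_mode: bool = False


def review(plan: ClusterNetworkPlan) -> List[str]:
    """Return human-readable findings; an empty list means no issues found."""
    findings = []
    if not plan.has_s3_vpc_endpoint:
        findings.append("Create a VPC S3 endpoint so S3 traffic stays on the internal fabric.")
    if not plan.uses_placement_group:
        findings.append("Place the nodes in a cluster placement group.")
    if plan.node_count > LARGE_CLUSTER_THRESHOLD and not plan.large_cluster_mode:
        findings.append("Point-to-point UDP scales poorly past 20 nodes; use large cluster mode.")
    return findings


if __name__ == "__main__":
    plan = ClusterNetworkPlan(node_count=24, has_s3_vpc_endpoint=False,
                              uses_placement_group=True)
    for finding in review(plan):
        print("-", finding)
```

The checks are deliberately independent of one another, so a plan can be reviewed before any cloud resources exist.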
And finally, use instances with enhanced networking or SR-IOV support. Again, it's free, it comes with the choice of the instance type and the operating system. We highly recommend it, it makes a big difference in terms of how your workload will perform. So let's talk a little bit about security, configuration, and orchestration. As I said, we provide CloudFormation scripts to make the ease of deployment. You can use the MC point and click. With regard to security, you know, Vertica does support instance profiles out of the box in Amazon. We recommend you use it. This is highly desirable so that you're not passing access keys and secret keys around. If you use our marketplace image, we have picked the latest operating systems, we have patched them, Amazon actually validates everything on marketplace and scans them for security vulnerabilities. So you get that for free. We do some basic configuration, like we disable root ssh access, we disallow any password access, we turn on encryption. And we run a basic set of security checks to make sure that the image is secure. Of course, it could be made more secure. But we try to balance out security, performance, and convenience. And finally, let's talk about backups. Especially in Eon mode I get the question, "Do we really need to back up our system, "since the data is in S3?" And the answer is yes, you do. Because you know, S3's not going to protect you against an accidental drop table. You know, S3 has a finite amount of reliability, durability, and availability. And you may want to be able to restore data differently. Also, backups are important if you're doing DR, or if you have additional cluster in a different region. The other cluster can be considered a backup. And finally, you know, why not create a backup or a disaster recovery cluster, you know, storage is cheap in the Cloud. So you know, we highly recommend you use it. 
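The AWS recommendations above can be collected into a small pre-deployment check. The following is only a sketch: the `AwsClusterPlan` class, its field names, and the `review_plan` function are hypothetical and are not part of Vertica or any AWS SDK. The 20-node threshold is the figure quoted in the talk.

```python
from dataclasses import dataclass
from typing import List

# From the talk: point-to-point spread does not scale well beyond 20 nodes.
LARGE_CLUSTER_THRESHOLD = 20


@dataclass
class AwsClusterPlan:
    """Hypothetical description of a planned cluster (illustrative field names)."""
    node_count: int
    has_s3_vpc_endpoint: bool
    uses_placement_group: bool
    enhanced_networking: bool
    uses_instance_profile: bool
    spread_mode: str  # "point_to_point", "large_cluster", or "broadcast"


def review_plan(plan: AwsClusterPlan) -> List[str]:
    """Return one finding per recommendation from the talk that the plan misses."""
    findings = []
    if not plan.has_s3_vpc_endpoint:
        findings.append("Create a VPC S3 endpoint so S3 traffic stays on the internal fabric.")
    if not plan.uses_placement_group:
        findings.append("Place the nodes in a cluster placement group.")
    if not plan.enhanced_networking:
        findings.append("Choose an instance type and OS with enhanced networking (SR-IOV).")
    if not plan.uses_instance_profile:
        findings.append("Use an instance profile instead of passing access keys around.")
    if plan.spread_mode == "broadcast":
        findings.append("UDP broadcast is not supported in the Cloud; use point-to-point spread.")
    if plan.node_count > LARGE_CLUSTER_THRESHOLD and plan.spread_mode != "large_cluster":
        findings.append("Switch to large cluster mode for clusters beyond 20 nodes.")
    return findings


if __name__ == "__main__":
    plan = AwsClusterPlan(24, False, True, True, True, "point_to_point")
    for finding in review_plan(plan):
        print("-", finding)
```

A plan that follows every recommendation returns an empty list, so a check like this could gate an automated deployment script.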
So with this, I would like to hand it over to my colleague Christopher Daly, who will talk about the other two platforms that we support, that is Google and Azure. Over to you, Chris, thank you. >> Chris: Thanks, Sumeet, and hi everyone. So while there's no argument that we here at Vertica have a long history of running within the Amazon Web Services space, there are other alternative Cloud service providers where we do have a presence, such as Google Cloud Platform, or GCP. For those of you who are unfamiliar with GCP, it's considered the third-largest Cloud service provider in the marketspace, and it's priced very competitively to its peers. It has a lot of similarities to AWS in the products and services that it offers, but it tends to be the go-to place for newer businesses or startups. We officially started supporting GCP a little over a year ago with our first entry into their GCP marketplace, a solution that deployed a fully-functional and ready-to-use Enterprise mode cluster. We followed up on that with the release and the support of Google storage buckets, and now I'm extremely pleased to announce that with the launch of Vertica 10, we're officially supporting Eon mode architecture in GCP as well. But that's not all, as we're adding additional offerings into the GCP marketplace. With the launch of version 10 we'll be introducing a second listing in the marketplace that allows for the deployment of an Eon mode cluster. It's all being driven by our own Management Console. This will allow customers to quickly spin up Eon-based clusters within the GCP space. And if that wasn't enough, I'm also pleased to tell you that very soon after the launch we're going to be offering Vertica by the hour in GCP as well. And while we've done a lot to automate the solutions coming out of the marketplace, we recognize the simple fact that for a lot of you, building your cluster manually is really the only option.
So with that in mind, let's talk about the things you need to understand in GCP to get that done. So wag me if you think this slide looks familiar. Well nope, it's not an erroneous duplicate slide from Sumeet's AWS section, it's merely an acknowledgement of all the things you need to consider for running Vertica in the Cloud. In Vertica, the choice of the operational mode will dictate some of the choices you'll need to make in the infrastructure, particularly around storage. Just like on-prem solutions, you'll need to understand the disk and networking capacities to get the most out of your cluster. And one of the most attractive things in GCP is the pricing, as it tends to run a little less than the others. But it does translate into less choices and options within the environment. If nothing else, I want you to take one thing away from this slide, and Sumeet said this earlier. VMs running, about AWS, Sumeet said this about AWS earlier. VMs running in the GCP space run on top of hardware that has hyperthreading enabled. And that a vCPU doesn't equate to a core, but rather a processing thread. This becomes particularly important if you're moving from an on-prem environment into the Cloud. Because a physical Vertica node with 32 cores is not the same thing as a VM with 32 vCPUs. In fact, with 32 vCPUs, you're only getting about 16 cores worth of performance. GCP does offer a handful of VM types, which they categorize by letter, but for us, most of these don't make great choices for Vertica nodes. The M series, however, does offer a good core to memory ratio, especially when you're looking at the high-mem variants. Also keep in mind, performance in I/O, such as network and disk, are partially dependent on the VM size, so customers in GCP space should be focusing on 16 vCPU VMs and above for their Vertica nodes. Disk options in GCP can be broken down into two basic types, persistent disks and local disks, which are ephemeral. 
Persistent disks come in two forms, standard or SSD. For Vertica in Eon mode, we recommend that customers use persistent SSD disks for the catalog, and either local SSD disks or persistent SSD disks for the depot and the temp space. Couple of things to think about here, though. Persistent disks are provisioned as a single device with a settable size. Local disks are provisioned as multiple disk devices with a fixed size, requiring you to use some kind of software RAIDing to create a single storage device. So while local SSD disks provide much more throughput, you're using CPU resources to maintain that RAID set. So you're giving, it's a little bit of a trade-off. Persistent disks offer redundancy, either within the zone that they exist or within the region, and if you're selecting regional redundancy, the disks are replicated across multiple zones in the region. This does have an effect in the performance to VM, so we don't recommend this. What we do recommend is the zonal redundancy when you're using persistent disks, as it gives you that redundancy level without actually affecting the performance. Remember also, in the Cloud space, all I/O is network I/O, as disks are basically block storage devices. This means that disk actions can and will slow down network traffic. And finally, the storage bucket access in GCP is based on GCP interoperability mode, which means that it's basically compliant with the AWS S3 API. In interoperability mode, access to the bucket is granted by a key pair that GCP refers to as HMAC keys. HMAC keys can be generated for individual users or for service accounts. We will recommend that when you're creating HMAC keys, choose a service account to ensure that the keys are not tied to a single employee. When thinking about storage for Enterprise mode, things change a little bit. We still recommend persistent SSD disks over standard ones. However, the use of local SSD disks for anything other than temp space is highly discouraged. 
I said it before, local SSD disks are ephemeral, meaning that the data's lost if the machine is turned off or goes down. So not really a place you want to store your data. In GCP, multiple persistent disks placed into a software RAID set do not create more throughput like you can find in other Clouds. The I/O saturation usually hits the VM limit long before it hits the disk limit. In fact, performance of a persistent disk is determined not just by the size of the disk but also by the size of the VM. So a good rule of thumb in GCP, to maximize your I/O throughput for persistent disks, is that the size tends to max out at two terabytes for SSDs and 10 terabytes for standard disks. Network performance in GCP can be thought of in two distinct ways. There's node-to-node traffic, and then there's egress traffic. Node-to-node performance in GCP is really good within the zone, with typical traffic between nodes falling in the 10-15 gigabits per second range. This might vary a little from zone to zone and region to region, but usually it's only limited by the existing traffic where the VMs exist. So kind of a noisy neighbor effect. Egress traffic from a VM, however, is subject to throughput caps, and these are based on the size of the VM. So the speed is set for the number of vCPUs in the VM at two gigabits per second per vCPU, and tops out at 32 gigabits per second. So the larger the VM, the more vCPUs you get, the larger the cap. So some things to consider in the networking space for your Vertica cluster: pick a region that's physically close to you, even if you're connecting to the GCP network from a corporate LAN as opposed to the internet. The further the packets have to travel, the longer it's going to take. Also, GCP, like most Clouds, doesn't support UDP broadcast traffic on their virtual networking, so you do have to use the point-to-point flag for spread when you're creating your cluster.
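The two GCP sizing rules quoted in this part of the talk (a vCPU is a hyperthread rather than a core, and egress is capped at two gigabits per second per vCPU up to 32) can be sketched as simple arithmetic. The function names below are made up for illustration, and the halving of vCPUs is only the rough approximation the speaker gives, not a measured figure.

```python
# Figures as quoted in the talk; treat them as a snapshot, not current GCP limits.
GBPS_PER_VCPU = 2
MAX_EGRESS_GBPS = 32


def approx_core_equivalent(vcpus: int) -> int:
    """With hyperthreading, a vCPU is a thread, so roughly two vCPUs make one core."""
    return vcpus // 2


def egress_cap_gbps(vcpus: int) -> int:
    """Egress cap: 2 Gbps per vCPU, topping out at 32 Gbps per VM."""
    return min(vcpus * GBPS_PER_VCPU, MAX_EGRESS_GBPS)


if __name__ == "__main__":
    for vcpus in (4, 8, 16, 32):
        print(vcpus, "vCPUs:",
              approx_core_equivalent(vcpus), "core-equivalents,",
              egress_cap_gbps(vcpus), "Gbps egress cap")
```

Under these numbers the cap is first reached at 16 vCPUs, which is consistent with the recommendation not to go below 16 vCPUs for a Vertica node.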
And since the network cap on VMs is set at 32 gigabits per second per VM, maximize your network egress throughput and don't use VMs that are smaller than 16 vCPUs for your Vertica nodes. And that gets us to the one question I get asked the most often. How do I get my data into and out of the Cloud? Well, GCP offers many different methods to support different speeds and different price points for data ingress and egress. There's the obvious one, right, across the internet either directly to the VMs or into the storage bucket. Or you can, you know, light up a VPN tunnel to encrypt all that traffic. But additionally, GCP offers direct network interconnect from your corporate network. These get provided either by Google or by a partner, and they vary in speed. They also offer things called direct or carrier peering, which is connecting the edges of the networks between your network and GCP, and you can use a CDN interconnect, which creates, I believe, an on-demand connection from the GCP network, your network to the GCP network provided by a large host of CDN service providers. So GCP offers a lot of ways to move your data around in and out of the GCP Cloud. It's really a matter of what price point works for you, and what technology your corporation is looking to use. So we've talked about AWS, we've talked about GCP, it really only leaves one more Cloud. So last, and by far not the least, there's the Microsoft Azure environment. Holding on strong to the number two place in the major Cloud providers, Azure offers a very robust Cloud offering that's attractive to customers that already consume services from Microsoft. But what you need to keep in mind is that the underlying foundation of their Cloud is based on the Microsoft Windows products. And this makes their Cloud offering a little bit different in the services and offerings that they have. 
The good news here, though, is that Microsoft has done a very good job of getting their virtualization drivers baked into the modern kernels of most Linux operating systems, making running Linux-based VMs in Azure fairly seamless. So here's the slide again, but now you're going to notice some slight differences. First off, in Azure we only support Enterprise mode. This is because the Azure storage product is very different from Google Cloud storage and S3 on AWS. So while we're working on getting this supported, and we're starting to focus on this, we're just not there yet. This means that since we're only supporting Enterprise mode in Azure, getting the local disk performance right is one of the keys to success of running Vertica here, with the other major key being making sure that you're getting the appropriate networking speeds. Overall, Azure's a really good platform for Vertica, and its performance and pricing are very much on par with AWS. But keep in mind that the newer versions of the Linux operating systems like RHEL and CentOS run much better here than the older versions. Okay, so first things first again, just like GCP, in Azure VMs are running on top of hardware that has hyperthreading enabled. And because of the way Hyper-V, Azure's virtualization engine works, you can actually see this, right? So if you look down into the CPU information of the VM, you'll actually see how it groups the vCPUs by core and by thread. Azure offers a lot of VM types, and is adding new ones all the time. But for us, we see three VM types that make the most sense for Vertica. For customers that are looking to run production workloads in Azure, the Es_v3 and the Ls_v2 series are the two main recommendations. While they differ slightly in the CPU to memory ratio and the I/O throughput, the Es_v3 series is probably the best recommendation for a generalized Vertica node, with the Ls_v2 series being recommended for workloads with higher I/O requirements. 
If you're just looking to deploy a sandbox environment, the Ds_v3 series is a very suitable choice that really can reduce your overall Cloud spend. VM storage in Azure is provided by a grouping of four different types of disks, all offering different levels of performance. Introduced at the end of last year, the Ultra Disk option is the highest-performing disk type for VMs in Azure. It was designed for database workloads where high throughput and low latency is very desirable. However, the Ultra Disk option is not available in all regions yet, although that's been changing slowly since their launch. The Premium SSD option, which has been around for a while and is widely available, can also offer really nice performance, especially higher capacities. And just like other Cloud providers, the I/O throughput you get on VMs is dictated not only by the size of the disk, but also by the size of the VM and its type. So a good rule of thumb here, VM types with an S will have a much better throughput rate than ones that don't, meaning, and the larger VMs will have, you know, higher I/O throughput than the smaller ones. You can expand the VM disk throughput by using multiple disks in Azure and using a software RAID. This overcomes limitations of single disk performance, but keep in mind, you're now using CPU cycles to maintain that raid, so it is a bit of a trade-off. The other nice thing in Azure is that all their managed disks are encrypted by default on the server side, so there's really nothing you need to do here to enable that. And of course I mentioned this earlier. There is no native access to Azure storage yet, but it is something we're working on. We have seen folks using third-party applications like MinIO to access Azure's storage as an S3 bucket. So it might be something you want to keep in mind and maybe even test out for yourself. Networking in Azure comes in two different flavors, standard and accelerated. 
In standard networking, the entire network stack is abstracted and virtualized. So this works really well, however, there are performance limitations. Standard networking tends to top out around four gigabits per second. Accelerated networking in Azure is based on single root I/O virtualization of the Mellanox adapter. This is basically the VM talking directly to the physical network card in the host hardware, and it can produce network speeds up to 20 gigabits per second, so much, much faster. Keep in mind, though, that not all VM types and operating systems actually support accelerated networking, and you know, just like disk throughput, network throughput is based on VM type and size. So what do you need to think about for networking in the Azure space? Again, stay close to home. Pick regions that are geographically close to your location. Yes, the backbones between the regions are very, very fast, but the more hops your packets have to make, the longer it takes. Azure offers two types of groupings of their VMs, availability sets and availability zones. Availability zones offer good redundancy across multiple zones, but this actually increases the node-to-node latency, so we recommend you avoid this. Availability sets, on the other hand, keep all your VMs grouped together within a single zone, but makes sure that no two VMs are running on the same host hardware, for redundancy. And just like the other Clouds, UDP broadcast is not supported. So you have to use the point-to-point flag when you're creating your database to ensure that the spread works properly. Spread time out, okay, this is a good one. So recently, Microsoft has started monthly rolling updates of their environment. What this looks like is VMs running on top of hardware that's receiving an update can be paused. And this becomes problematic when the pausing of the VM exceeds eight seconds, as the unpaused members of the cluster now think the paused VM is down. 
So consider adjusting the spread time out for your clusters in Azure to 30 seconds, and this will help avoid a little of that. If you're deploying a large cluster in Azure, more than 20 nodes, use large cluster mode, as point-to-point for spread doesn't really scale well with a lot of Vertica nodes. And finally, you know, pick VM types and operating systems that support accelerated networking. The difference in the node-to-node speeds can be very dramatic. So how do we move data around in Azure, right? So Microsoft views data egress a little differently than other Clouds, as it classifies any data being transmitted by a VM as egress. However, it only bills for data egress that actually leaves the Azure environment. Egress speed limits in Azure are based entirely on the VM type and size, and then they're limited by your connection to them. While not offering as many pathways to access their Cloud as GCP, Azure does offer a direct network-to-network connection called ExpressRoute. Offered by a large group of third-party partners, ExpressRoute offers multiple tiers of performance that are based on a flat charge for inbound data and a metered charge for outbound data. And of course you can still access these via the internet, and securely through a VPN gateway. So on behalf of Jeff, Sumeet, and myself, I'd like to thank you for listening to our presentation today, and we're now ready for Q&A.
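The spread time-out advice for Azure can be illustrated with a toy model. This sketch assumes, based only on the eight-second figure in the talk, that a paused VM is reported down once its pause outlasts the spread time-out; the function name and the sample pause durations are invented for illustration, and this is not how spread is actually implemented.

```python
TIMEOUT_FROM_TALK_S = 8      # pauses longer than this were described as problematic
RECOMMENDED_TIMEOUT_S = 30   # the value suggested in the talk for Azure clusters


def is_reported_down(pause_seconds: float, spread_timeout_seconds: float) -> bool:
    """Toy rule: the rest of the cluster gives up on a node paused past the time-out."""
    return pause_seconds > spread_timeout_seconds


if __name__ == "__main__":
    # Hypothetical pause durations during a rolling host update.
    for pause in (3, 12, 25, 40):
        print(f"pause {pause:>2}s:",
              "down" if is_reported_down(pause, TIMEOUT_FROM_TALK_S) else "ok",
              "at 8s,",
              "down" if is_reported_down(pause, RECOMMENDED_TIMEOUT_S) else "ok",
              "at 30s")
```

A longer time-out tolerates short maintenance pauses, at the cost of taking longer to notice a node that really has failed.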

Published Date : Mar 30 2020

Vertica in Eon Mode: Past, Present, and Future

>> Paige: Hello everybody and thank you for joining us today for the virtual Vertica BDC 2020. Today's breakout session is entitled Vertica in Eon Mode past, present and future. I'm Paige Roberts, open source relations manager at Vertica and I'll be your host for this session. Joining me is Vertica engineer, Yuanzhe Bei and Vertica Product Manager, David Sprogis. Before we begin, I encourage you to submit questions or comments during the virtual session. You don't have to wait till the end. Just type your question or comment as you think of it in the question box, below the slides and click Submit. There will be a Q&A session at the end of the presentation. We'll answer as many of your questions as we're able to during that time, and any questions that we don't address, we'll do our best to answer offline. If you wish, after the presentation, you can visit the Vertica forums to post your questions there, and our engineering team is planning to join the forums to keep the conversation going, just like a Dev Lounge at a normal in-person BDC. So, as a reminder, you can maximize your screen by clicking the double arrow button in the lower right corner of the slides, if you want to see them bigger. And yes, before you ask, this virtual session is being recorded and will be available to view on demand this week. We are supposed to send you a notification as soon as it's ready. All right, let's get started. Over to you, Dave. >> David: Thanks, Paige. Hey, everybody. Let's start with a timeline of the life of Eon Mode. About two years ago, a little bit less than two years ago, we introduced Eon Mode on AWS. Pretty specifically for the purpose of rapid scaling to meet the cloud economics promise.
It wasn't long after that we realized that workload isolation, a byproduct of the architecture was very important to our users and going to the third tick, you can see that the importance of that workload isolation was manifest in Eon Mode being made available on-premise using Pure Storage FlashBlade. Moving to the fourth tick mark, we took steps to improve workload isolation, with a new type of subcluster which Yuanzhe will go through and to the fifth tick mark, the introduction of secondary subclusters for faster scaling and other improvements which we will cover in the slides to come. Getting started with, why we created Eon Mode in the first place. Let's imagine that your database is this pie, the pecan pie and we're loading pecan data in through the ETL cutting board in the upper left hand corner. We have a couple of free floating pecans, which we might imagine to be data supporting external tables. As you know, the Vertica has a query engine capability as well which we call external tables. And so if we imagine this pie, we want to serve it with a number of servers. Well, let's say we wanted to serve it with three servers, three nodes, we would need to slice that pie into three segments and we would serve each one of those segments from one of our nodes. Now because the data is important to us and we don't want to lose it, we're going to be saving that data on some kind of raid storage or redundant storage. In case one of the drives goes bad, the data remains available because of the durability of raid. Imagine also, that we care about the availability of the overall database. Imagine that a node goes down, perhaps the second node goes down, we still want to be able to query our data and through nodes one and three, we still have all three shards covered and we can do this because of buddy projections. Each neighbor, each nodes neighbor contains a copy of the data from the node next to it. And so in this case, node one is sharing its segment with node two. 
So node two can cover node one, node three can cover node two and node one back to node three. Adding a little bit more complexity, we might store the data in different copies, each copy sorted for a different kind of query. We call this projections in Vertica and for each projection, we have another copy of the data sorted differently. Now it gets complex. What happens when we want to add a node? Well, if we wanted to add a fourth node here, what we would have to do, is figure out how to re-slice all of the data in all of the copies that we have. In effect, what we want to do is take our three slices and slice it into four, which means taking a portion of each of our existing thirds and re-segmenting into quarters. Now that looks simple in the graphic here, but when it comes to moving data around, it becomes quite complex because for each copy of each segment we need to replace it and move that data on to the new node. What's more, the fourth node can't have a copy of itself that would be problematic in case it went down. Instead, what we need is we need that buddy to be sitting on another node, a neighboring node. So we need to re-orient the buddies as well. All of this takes a lot of time, it can take 12, 24 or even 36 hours in a period when you do not want your database under high demand. In fact, you may want to stop loading data altogether in order to speed it up. This is a planned event and your applications should probably be down during this period, which makes it difficult. With the advent of cloud computing, we saw that services were coming up and down faster and we determined to re-architect Vertica in a way to accommodate that rapid scaling. Let's see how we did it. So let's start with four nodes now and we've got our four nodes database. Let's add communal storage and move each of the segments of data into communal storage. Now that's the separation that we're talking about. What happens if we run queries against it? 
Well, it turns out that the communal storage is not necessarily performant and so the IO would be slow, which would make the overall queries slow. In order to compensate for the low performance of communal storage, we need to add back local storage. Now, it doesn't have to be RAID because this is just an ephemeral copy, but with the data files local to the node, the queries will run much faster. In AWS, communal storage really does mean an S3 bucket and here's a simplified version of the diagram. Now, do we need to store all of the data from the segment in the depot? The answer is no, and the graphic inside the bucket has changed to reflect that. It looks more like a bullseye, showing just a segment of the data being copied to the cache, or to the depot, as we call it, on each one of the nodes. How much data do you store on the node? Well, it would be the active data set: the last 30 days, the last 30 minutes, or whatever period of time you're working with. The active working set is the hot data and that's how large you want to size your depot. By architecting this way, when you scale up, you're not re-segmenting the database. What you're doing is adding more compute and more subscriptions to the existing shards of the existing database. So in this case, we've added a complete set of four nodes. So we've doubled our capacity and we've doubled our subscriptions, which means that now two nodes can serve the yellow shard, two nodes can serve the red shard and so on. In this way, we're able to run twice as many queries in the same amount of time. So you're doubling the concurrency. How high can you scale? Well, can you scale to 3X, 5X? We tested this in the graphic on the right, which shows concurrent users on the X axis by the number of queries executed in a minute along the Y axis. We've grouped execution in runs of 10 users, 30 users, 50, 70, up to 150 users. Now focusing on any one of these groups, particularly up around 150.
You can see through the three bars, starting with the bright purple bar, three nodes and three segments. That as you add nodes to the middle purple bar, six nodes and three segments, you've almost doubled your throughput up to the dark purple bar which is nine nodes and three segments and our tests show that you can go to 5X with pretty linear performance increase. Beyond that, you do continue to get an increase in performance but your incremental performance begins to fall off. Eon architecture does something else for us and that is it provides high availability because each of the nodes can be thought of as ephemeral and in fact, each node has a buddy subscription in a way similar to the prior architecture. So if we lose node four, we're losing the node responsible for the red shard and now node one has to pick up responsibility for the red shard while that node is down. When a query comes in, and let's say it comes into one and one is the initiator then one will look for participants, it'll find a blue shard and a green shard but when it's looking for the red, it finds itself and so the node number one will be doing double duty. This means that your performance will be cut in half approximately, for the query. This is acceptable until you are able to restore the node. Once you restore it and once the depot becomes rehydrated, then your performance goes back to normal. So this is a much simpler way to recover nodes in the event of node failure. By comparison, Enterprise Mode the older architecture. When we lose the fourth node, node one takes over responsibility for the first shard and the yellow shard and the red shard. But it also is responsible for rehydrating the entire data segment of the red shard to node four, this can be very time consuming and imposes even more stress on the first node. So performance will go down even further. Eon Mode has another feature and that is you can scale down completely to zero. 
We call this hibernation, you shut down your database and your database will maintain full consistency in a rest state in your S3 bucket and then when you need access to your database again, you simply recreate your cluster and revive your database and you can access your database once again. That concludes the rapid scaling portion of, why we created Eon Mode. To take us through workload isolation is Yuanzhe Bei, Yuanzhe. >> Yuanzhe: Thanks Dave, for presenting how Eon works in general. In the next section, I will show you another important capability of Vertica Eon Mode, the workload isolation. Dave used a pecan pie as an example of database. Now let's say it's time for the main course. Does anyone still have a problem with food touching on their plates. Parents know that it's a common problem for kids. Well, we have a similar problem in database as well. So there could be multiple different workloads accessing your database at the same time. Say you have ETL jobs running regularly. While at the same time, there are dashboards running short queries against your data. You may also have the end of month report running and their can be ad hoc data scientists, connect to the database and do whatever the data analysis they want to do and so on. How to make these mixed workload requests not interfere with each other is a real challenge for many DBAs. Vertica Eon Mode provides you the solution. I'm very excited here to introduce to you to the important concept in Eon Mode called subclusters. In Eon Mode, nodes they belong to the predefined subclusters rather than the whole cluster. DBAs can define different subcluster for different kinds of workloads and it redirects those workloads to the specific subclusters. For example, you can have an ETL subcluster, dashboard subcluster, report subcluster and the analytic machine learning subcluster. Vertica Eon subcluster is designed to achieve the three main goals. First of all, strong workload isolation. 
That means any operation in one subcluster should not affect or be affected by other subclusters. For example, say the subcluster running the report is quite overloaded, and there can also be data scientists running crazy analytic jobs, machine learning jobs, on the analytics subcluster and making it very slow, even stuck or crashed or whatever. In such a scenario, your ETL and dashboard subclusters should not be, or should only very minimally be, impacted by this crisis, which means your ETL jobs should not lag behind and your dashboards should respond in a timely manner. We have done a lot of improvements as of the 10.0 release and will continue to deliver improvements in this category. Secondly, fully customized subcluster settings. That means any subcluster can be set up and tuned for very different workloads without affecting other subclusters. Users should be able to tune up or tune down certain parameters based on the actual needs of the individual subcluster workload requirements. As of today, Vertica already supports a few settings that can be done at the subcluster level, for example, the depot pinning policy, and we will continue extending that to more settings, like resource pools (mumbles), in the near future. Lastly, Vertica subclusters should be easy to operate and cost efficient. What it means is that subclusters should be able to be turned on, turned off, added or removed, and be available for use according to rapidly changing workloads. Let's say in this case, you want to spin up more dashboard subclusters because we need higher scores report, we can do that. You might need to run several report subclusters because you might want to run multiple reports at the same time. While on the other hand, you can shut down your analytic machine learning subcluster because no data scientists need to use it at this moment.
So we made a lot of changes and improvements in this category, which I'll explain in detail later, and one of the ultimate goals is to support auto scaling. To sum up, what we really want to deliver for subclusters is very simple. You just need to remember that accessing subclusters should be just like accessing individual clusters. But these subclusters do share the same catalog, so you don't have to worry about stale data and don't need to worry about data synchronization. That would be a nice goal, and Vertica's upcoming 10.0 release is certainly a milestone towards it: it will deliver a large part of the capability in this direction, and we will continue to improve it after the 10.0 release. In the next couple of slides, I will highlight some issues with workload isolation in the initial Eon release and show you how we resolved them. The first issue: when we initially released our first so-called subcluster mode, it was implemented using fault groups. Fault groups and subclusters do have something in common: they are both defined as a set of nodes. However, they are very different in all other ways, so that was very confusing in the first place when we implemented it. As of version 9.3.0, we decided to detach the subcluster definition from fault groups, which enabled us to further extend the capability of subclusters. Fault groups in pre-9.3.0 versions are converted into subclusters during the upgrade. This was a very important step that enabled us to provide all the improvements on subclusters that follow. The second issue in the past was that it was hard to control the execution groups for different types of workloads. There are two types of problems here, and I will use some examples to explain. The first issue is about controlling group size.
Say you allocate six nodes for your dashboard subcluster. What you really want is what's on the left: the three pairs of nodes as three execution groups, where each pair of nodes subscribes to all four shards. However, that's not really what you get. What you really get is on the right side: the first four nodes subscribe to one shard each, and the remaining two nodes subscribe to two dangling shards. So you don't really get three execution groups; instead you get only one, and the two extra nodes have no value at all. The solution is to use subclusters. Instead of having one subcluster with six nodes, you can split it up into three smaller ones. Each subcluster is guaranteed to subscribe to all the shards, and you can then put a load balancer across these three subclusters. In this way you achieve three real execution groups. The second issue is that session participation is non-deterministic. Any session will just pick four random nodes from the subcluster as long as they cover one shard each. In other words, you don't really know which set of nodes will make up your execution group. What's the problem? In this case, the fourth node will be double-booked by two concurrent sessions. You can imagine that the resource usage will be imbalanced and both queries' performance will suffer. What is even worse is when the queries of the two concurrent sessions target different tables. That reduces depot efficiency, because both sessions will try to fetch the files for the two tables into the same depot, and if your depot is not large enough, they will evict each other, which is very bad. You can solve this the same way, by declaring subclusters: in this case, two subclusters and a load balancer group across them. The reason this solves the problem is that session participation does not cross the subcluster boundary.
So there won't be a case where any node is double-booked. In terms of the depot, if you use subclusters without a load balancer group, and carefully send the first workload to the first subcluster and the second to the second subcluster, then depot isolation is achieved. The first subcluster will maintain the data files for the first query, and you don't need to worry about the files being evicted by the second kind of session. Here comes the next issue: scaling down. In the old way of defining subclusters, you may have several execution groups in the subcluster, and you want to shut down one or two execution groups to save cost. Well, here comes the pain: because you don't know which nodes may be used by which session at any point, it is hard to find the right time to hit the shutdown button on any of the instances. If you do and get unlucky, say in this case you pull the first four nodes, one of the sessions will fail because it is participating on node two and node four at that point. The user of that session will notice because their query fails, and we know that for many businesses this is a critical problem and not acceptable. Again, with subclusters this problem is resolved, for the same reason: a session cannot cross the subcluster boundary. So all you need to do is first prevent queries from being sent to the first subcluster, and then you can shut down the instances in that subcluster. You are guaranteed not to break any running sessions. Now you're happy, and you want to shut down more subclusters. Then you hit issue four: the whole cluster will go down. Why? Because the cluster loses quorum. As a distributed system, you need more than half of the nodes to be up in order to commit and keep the cluster up. This is to prevent catalog divergence from happening, which is important. But you do still want to shut down those nodes.
Because what's the point of keeping those nodes up if you are not using them and just letting them cost you money, right? So Vertica has a solution: you can define a subcluster as secondary, which allows it to shut down without worrying about quorum. In this case, you can define the first three subclusters as secondary and the fourth one as primary. By doing so, the secondary subclusters will not be counted towards the quorum, because we changed the rule. Now, instead of requiring more than half of all nodes to be up, it only requires more than half of the primary nodes to be up. Now you can shut down your second subcluster, and even shut down your third subcluster as well, and the remaining primary subcluster keeps running healthily. There are actually more benefits to defining secondary subclusters beyond the quorum concern. Because the secondary subclusters no longer have voting power, they don't need to persist the catalog anymore. This means those nodes are faster to deploy and can be dropped and re-added without worrying about catalog persistence. For most subclusters that only need to run read-only queries, it's best practice to define them as secondary. Commits will be faster on secondary subclusters as well, so queries running on a secondary subcluster will see fewer spikes. The primary subcluster, as usual, handles everything: it is responsible for consistency, and the background tasks run there. So DBAs should make sure that the primary subcluster is stable and assume it is running all the time. Of course, you need at least one primary subcluster in your database. Now, with secondary subclusters, users can start and stop them as they need, which is very convenient. This brings up another issue: what if there's an ETL transaction running and, in the middle of it, a subcluster starts and comes up? In older versions, there was no catalog resync mechanism to keep the new subcluster up to date.
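Going back to the quorum rule described a moment ago, it can be sketched in a few lines of Python. This is only an illustration, not Vertica code: the `Node` class, the helper names, and the layout of four three-node subclusters (`sc1` to `sc3` secondary, `sc4` primary) are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    subcluster: str
    is_primary: bool  # True when the node belongs to a primary subcluster
    up: bool = True


def build_cluster():
    """Hypothetical layout: three secondary subclusters and one primary, three nodes each."""
    layout = [("sc1", False), ("sc2", False), ("sc3", False), ("sc4", True)]
    return [
        Node(f"{sc}-n{i}", sc, is_primary)
        for sc, is_primary in layout
        for i in range(1, 4)
    ]


def stop_subcluster(nodes, subcluster):
    """Mark every node of the given subcluster as down."""
    for node in nodes:
        if node.subcluster == subcluster:
            node.up = False


def quorum_all_nodes(nodes):
    """Old rule: more than half of all nodes must be up."""
    return 2 * sum(n.up for n in nodes) > len(nodes)


def quorum_primary_only(nodes):
    """New rule: more than half of the primary nodes must be up."""
    primaries = [n for n in nodes if n.is_primary]
    return 2 * sum(n.up for n in primaries) > len(primaries)


cluster = build_cluster()
for name in ("sc1", "sc2", "sc3"):
    stop_subcluster(cluster, name)
    print(name, "stopped:",
          "all-node rule ->", quorum_all_nodes(cluster),
          "| primary-only rule ->", quorum_primary_only(cluster))
```

In this toy layout, the all-node rule already fails once the second secondary subcluster is stopped, while the primary-only rule stays satisfied as long as the primary subcluster keeps a majority of its own nodes up.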
So Vertica rolled back the ETL session to keep the data consistent. This is actually quite disruptive, because real-world ETL workloads can sometimes take hours, and rolling back at the end means a large waste of resources. We resolved this issue in version 9.3.1 by introducing a catalog resync mechanism for this situation. ETL transactions no longer roll back; instead they take some time to resync the catalog and then commit, and the problem is resolved. The last issue I would like to talk about is subscription. Especially for a large subcluster, the startup time is quite long, because the subscription commits used to be serialized. In one of our internal tests with large catalogs, committing a subscription, you can imagine, takes five minutes. A secondary subcluster is better, because it doesn't need to persist the catalog during the commit, but it still takes about two seconds to commit. So what's the problem here? Let's do the math and look at this chart. The X axis is the time in minutes and the Y axis is the number of nodes to be subscribed. The dark blue represents your primary subcluster and the light blue represents the secondary subcluster. Let's say the subcluster has 16 nodes in total. If you start a secondary subcluster, it will spend about 30 seconds in total, because 2 seconds times 16 is 32. That's not actually that long. But when you start a secondary subcluster, you expect it to be super fast so it can react to the fast-changing workload, and 30 seconds is no longer trivial. What is even worse is the primary subcluster side. Because the commit is much longer, five minutes let's assume, by the point you are committing the sixth node's subscription, all the other nodes have already waited 30 minutes for the GCLX, or as we know it, the global catalog lock, and Vertica will crash any node that cannot get the GCLX for 30 minutes. So the end result is that your whole database crashes.
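The arithmetic in this passage is easy to check with a short Python sketch. The 2-second and 5-minute commit times, the 16 nodes, and the 30-minute GCLX limit come from the talk; the function names are hypothetical, and the concurrent case is an idealized assumption in which every node's commit overlaps completely.

```python
GCLX_TIMEOUT_S = 30 * 60  # 30-minute lock wait limit quoted in the talk


def serialized_total_s(node_count, commit_s):
    """Total time when subscription commits run one after another."""
    return node_count * commit_s


def commits_done_before_timeout(commit_s, timeout_s=GCLX_TIMEOUT_S):
    """How many serialized commits complete before waiting nodes hit the timeout."""
    return timeout_s // commit_s


def concurrent_total_s(node_count, commit_s):
    """Idealized batched case: all commits overlap, so the total is one commit."""
    return commit_s if node_count > 0 else 0


NODES = 16
SECONDARY_COMMIT_S = 2       # secondary subcluster, per the talk
PRIMARY_COMMIT_S = 5 * 60    # primary subcluster with a large catalog, per the talk

print("secondary, serialized:", serialized_total_s(NODES, SECONDARY_COMMIT_S), "s")
print("primary, serialized:", serialized_total_s(NODES, PRIMARY_COMMIT_S) // 60, "min")
print("primary commits done when the timeout hits:",
      commits_done_before_timeout(PRIMARY_COMMIT_S), "of", NODES)
print("primary, concurrent (idealized):",
      concurrent_total_s(NODES, PRIMARY_COMMIT_S) // 60, "min")
```

With serialized commits, a 16-node primary subcluster would need 80 minutes in total under these numbers, so only six of the sixteen commits fit inside the 30-minute lock limit.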
That's a serious problem, and we know that. That's why we are already planning the fix for 10.0, so that all the subscriptions will be batched up and all the nodes will commit at the same time, concurrently. By doing that, you can imagine the primary subcluster can finish committing in five minutes instead of crashing, and the secondary subcluster can finish in seconds. That summarizes the highlights of the improvements we have made as of 10.0, and I hope you are already excited about the emerging Eon deployment pattern that's shown here. A primary subcluster that handles data loading, ETL jobs and tuple mover jobs is the backbone of the database, and you keep it running all the time. At the same time, you define different secondary subclusters for different workloads, provision them when the workload requirement arrives, and then de-provision them when the workload is done to save operational cost. So, can't wait to play with subclusters? Here are some Admin Tools commands you can start using. For more details, check out our Eon subcluster documentation. Thanks, everyone, for listening, and I'll hand back to Dave to talk about Eon on-prem. >> David: Thanks, Yuanzhe. At the same time that Yuanzhe and the rest of the dev team were working on the improvements that Yuanzhe described, and other improvements, this guy, John Yovanovich, stood on stage and told us about his deployment at AT&T, where he was running Eon Mode on-prem. Now, this was only six months after we had launched Eon Mode on AWS. So when he told us that he was putting it into production on-prem, we nearly fell out of our chairs. How is this possible? We took a look back at Eon and determined that the workload isolation and the improvements to operations for restoring nodes and other things had sufficient value that John wanted to run it on-prem. And he was running it on the Pure Storage FlashBlade.
Taking a second look at the FlashBlade, we thought, all right, well, does it have the performance? Yes, it does. The FlashBlade is a collection of individual blades, each one of them with NVMe storage on it, which is not only performant but also scalable. So we then asked, is it durable? The answer is yes. Data safety is implemented with N+2 redundancy, which means that up to two blades can fail and the data remains available. With this we realized DBAs can sleep well at night, knowing that their data is safe; after all, Eon Mode outsources durability to the communal storage data store. Does FlashBlade have the capacity for growth? Well, yes, it does. You can start as low as 120 terabytes and grow as high as about eight petabytes, so it certainly covers the range for most enterprise usages. And operationally, it couldn't be easier to use. When you want to grow your database, you can simply pop new blades into the FlashBlade unit, and you can do that hot. If one goes bad, you can pull it out and replace it hot. So you don't have to take your data store down, and therefore you don't have to take Vertica down. Knowing all of these things, we got behind Pure Storage and partnered with them to implement the first version of Eon on-premise. That changed our roadmap a little bit. We were imagining it would start with Amazon and then go to Google and then to Azure and at some point to Alibaba Cloud, but as you can see from the left column, we started with Amazon and went to Pure Storage. And then from Pure Storage we went to MinIO, and we launched Eon Mode on MinIO at the end of last year. MinIO is a little bit different from Pure Storage. It's software only, so you can run it on pretty much any x86 servers, and you can cluster them with storage to serve up an S3 bucket.
It's a great solution for up to about 120 terabytes. Beyond that, we're not sure about the performance implications because we haven't tested it, but for your dev environments or small production environments, we think it's great. With Vertica 10, we're introducing Eon Mode on Google Cloud. This means not only running Eon Mode in the cloud, but also being able to launch it from the marketplace. We're also offering Eon Mode on HDFS with version 10. If you have a Hadoop environment and you want to breathe fresh life into it with the high performance of Vertica, you can do that starting with version 10. Looking forward, we'll be moving Eon Mode to Microsoft Azure. We expect to have something breathing in the fall, offer it to select customers for beta testing, and then release it sometime in 2021. Following that, further on the horizon is Alibaba Cloud. Now, to be clear, we will be putting Vertica in Enterprise Mode on Alibaba Cloud in 2020, but Eon Mode is going to trail behind; whether it lands in 2021 or not, we're not quite sure at this point. Our goal is to deliver Eon Mode anywhere you want to run it, on-prem or in the cloud, or both, because one of the great value propositions of Vertica is the hybrid capability, the ability to run both in your on-prem environment and in the cloud. What's next? I've got three priority and roadmap slides. This is the first of the three. We're going to start with improvements to the core of Vertica, starting with query crunching, which allows you to run long-running queries faster by getting nodes to collaborate. You'll see that coming very soon. We'll be making improvements to large clusters and specifically large cluster mode. The management of large clusters of over 60 nodes can be tedious. We intend to improve that, in part by creating a third network channel to offload some of the communication that we're now loading onto our spread, or agreement, protocol. We'll be improving depot efficiency.
We'll be pushing down more controls to the subcluster level, allowing you to control your resource pools at the subcluster level, and we'll be pairing tuple moving with data loading. From an operational flexibility perspective, we want to make it very easy to shut down and revive primaries and secondaries, on-prem and in the cloud. Right now it's a little bit tedious, though very doable. We want to make it as easy as a walk in the park. We also want to allow you to revive into a different size subcluster and, last but not least, in fact probably the most important, to change shard count. This has been a sticking point for a lot of people, and it puts a lot of pressure on the early decision of how many shards your database should have. Whether it comes in 2020 or 2021, we know it's important to you, so it's important to us. Ease of use is also important to us, and we're making big investments in the management console to improve managing subclusters, as well as to help you manage your load balancer groups. We also intend to grow and extend Eon Mode to new environments. Now we'll take questions and answers.

Published Date : Mar 30 2020

ENTITIES

Entity	Category	Confidence
David Sprogis	PERSON	0.99+
David	PERSON	0.99+
one	QUANTITY	0.99+
Dave	PERSON	0.99+
John Yovanovich	PERSON	0.99+
10 users	QUANTITY	0.99+
Paige Roberts	PERSON	0.99+
Vertica	ORGANIZATION	0.99+
Yuanzhe Bei	PERSON	0.99+
John	PERSON	0.99+
five minutes	QUANTITY	0.99+
2020	DATE	0.99+
Amazon	ORGANIZATION	0.99+
30 seconds	QUANTITY	0.99+
50	QUANTITY	0.99+
second issue	QUANTITY	0.99+
12	QUANTITY	0.99+
Yuanzhe	PERSON	0.99+
120 terabytes	QUANTITY	0.99+
30 users	QUANTITY	0.99+
two types	QUANTITY	0.99+
2021	DATE	0.99+
Paige	PERSON	0.99+
30 minutes	QUANTITY	0.99+
three pairs	QUANTITY	0.99+
second	QUANTITY	0.99+
first	QUANTITY	0.99+
nine nodes	QUANTITY	0.99+
first subcluster	QUANTITY	0.99+
two tables	QUANTITY	0.99+
two nodes	QUANTITY	0.99+
first issue	QUANTITY	0.99+
each copy	QUANTITY	0.99+
2 seconds	QUANTITY	0.99+
36 hours	QUANTITY	0.99+
second subcluster	QUANTITY	0.99+
fourth node	QUANTITY	0.99+
each	QUANTITY	0.99+
six nodes	QUANTITY	0.99+
third subcluster	QUANTITY	0.99+
both	QUANTITY	0.99+
twice	QUANTITY	0.99+
First issue	QUANTITY	0.99+
three segments	QUANTITY	0.99+
today	DATE	0.99+
three bars	QUANTITY	0.99+
24	QUANTITY	0.99+
5X	QUANTITY	0.99+
Today	DATE	0.99+
16 nodes	QUANTITY	0.99+
Alibaba	ORGANIZATION	0.99+
each segment	QUANTITY	0.99+
first node	QUANTITY	0.99+
three slices	QUANTITY	0.99+
Each subcluster	QUANTITY	0.99+
each nodes	QUANTITY	0.99+
three nodes	QUANTITY	0.99+
AWS	ORGANIZATION	0.99+
two subclusters	QUANTITY	0.98+
three servers	QUANTITY	0.98+
four shards	QUANTITY	0.98+
3X	QUANTITY	0.98+
three	QUANTITY	0.98+
two concurrent sessions	QUANTITY	0.98+

Vertica Big Data Conference Keynote

>> Joy: Welcome to the Virtual Big Data Conference. Vertica is so excited to host this event. I'm Joy King, and I'll be your host for today's Big Data Conference Keynote Session. It's my honor and my genuine pleasure to lead Vertica's product and go-to-market strategy. And I'm so lucky to have a passionate and committed team who turned our Vertica BDC event, into a virtual event in a very short amount of time. I want to thank the thousands of people, and yes, that's our true number who have registered to attend this virtual event. We were determined to balance your health, safety and your peace of mind with the excitement of the Vertica BDC. This is a very unique event. Because as I hope you all know, we focus on engineering and architecture, best practice sharing and customer stories that will educate and inspire everyone. I also want to thank our top sponsors for the virtual BDC, Arrow, and Pure Storage. Our partnerships are so important to us and to everyone in the audience. Because together, we get things done faster and better. Now for today's keynote, you'll hear from three very important and energizing speakers. First, Colin Mahony, our SVP and General Manager for Vertica, will talk about the market trends that Vertica is betting on to win for our customers. And he'll share the exciting news about our Vertica 10 announcement and how this will benefit our customers. Then you'll hear from Amy Fowler, VP of strategy and solutions for FlashBlade at Pure Storage. Our partnership with Pure Storage is truly unique in the industry, because together modern infrastructure from Pure powers modern analytics from Vertica. And then you'll hear from John Yovanovich, Director of IT at AT&T, who will tell you about the Pure Vertica Symphony that plays live every day at AT&T. Here we go, Colin, over to you. >> Colin: Well, thanks a lot joy. And, I want to echo Joy's thanks to our sponsors, and so many of you who have helped make this happen. 
This is not an easy time for anyone. We were certainly looking forward to getting together in person in Boston during the Vertica Big Data Conference and Winning with Data. But I think all of you and our team have done a great job, scrambling and putting together a terrific virtual event. So really appreciate your time. I also want to remind people that we will make both the slides and the full recording available after this. So for any of those who weren't able to join live, that is still going to be available. Well, things have been pretty exciting here. And in the analytic space in general, certainly for Vertica, there's a lot happening. There are a lot of problems to solve, a lot of opportunities to make things better, and a lot of data that can really make every business stronger, more efficient, and frankly, more differentiated. For Vertica, though, we know that focusing on the challenges that we can directly address with our platform, and our people, and where we can actually make the biggest difference is where we ought to be putting our energy and our resources. I think one of the things that has made Vertica so strong over the years is our ability to focus on those areas where we can make a great difference. So for us as we look at the market, and we look at where we play, there are really three recent and some not so recent, but certainly picking up a lot of the market trends that have become critical for every industry that wants to Win Big With Data. We've heard this loud and clear from our customers and from the analysts that cover the market. If I were to summarize these three areas, this really is the core focus for us right now. We know that there's massive data growth. And if we can unify the data silos so that people can really take advantage of that data, we can make a huge difference. We know that public clouds offer tremendous advantages, but we also know that balance and flexibility is critical. 
And we all need the benefits that machine learning and end-to-end data science can bring to every single use case, but only if they can really be operationalized at scale, accurately and in real time. And the power of Vertica is, of course, how we're able to bring so many of these things together. Let me talk a little bit more about some of these trends. So one of the first industry trends that we've all been following, probably for over a decade now, is Hadoop and specifically HDFS. So many companies have invested time, money and, more importantly, people in leveraging the opportunity that HDFS brought to the market. HDFS is really part of a much broader storage disruption that we'll talk a little bit more about. But HDFS itself was really designed for petabytes of data, leveraging low cost commodity hardware and the ability to capture a wide variety of data formats from a wide variety of data sources and applications. And I think what people really wanted was to store that data before having to define exactly what structures it should go into. So over the last decade or so, the focus for most organizations has been figuring out how to capture, store and, frankly, manage that data. And as a platform to do that, I think Hadoop was pretty good. It certainly changed the way that a lot of enterprises think about their data and where it's locked up. In parallel with Hadoop, particularly over the last five years, cloud object storage has also given every organization another option for collecting, storing and managing even more data. That has led to huge growth in data storage, obviously, up on public clouds like Amazon and their S3, Google Cloud Storage and Azure Blob Storage, just to name a few. And then when you consider regional and local object storage offered by cloud vendors all over the world, the explosion of data leveraging this type of object storage is very real.
And I think, as I mentioned, it's just part of this broader storage disruption that's been going on. But with all this growth in the data, and all these new places to put this data, every organization we talk to is facing even more challenges now around the data silo. Sure, the data silos are certainly getting bigger. And hopefully they're getting cheaper per bit. But as I said, the focus has really been on collecting, storing and managing the data. Between the new data lakes and the many different cloud object stores, combined with all sorts of data types and the complexity of managing all this, getting business value out of the data has been very limited. This actually takes me to big bet number one for Team Vertica, which is to unify the data. Our goal, reflected in some of the announcements we have made today plus roadmap announcements I'll share with you throughout this presentation, is to ensure that all the time, money and effort that has gone into storing that data turns into business value. So how are we going to do that? With a unified analytics platform that analyzes the data wherever it is: HDFS, cloud object storage, external tables in any format, ORC, Parquet, JSON, and of course our own native ROS Vertica format. Analyze the data in the right place, in the right format, using a single unified tool. This is something that Vertica has always been committed to, and you'll see in some of our announcements today that we're just doubling down on that commitment. Let's talk a little bit more about the public cloud. This is certainly the second trend. It's the second wave, maybe, of data disruption with object storage. And there are a lot of advantages when it comes to public cloud. There's no question that the public clouds give rapid access to compute and storage, with the added benefit of eliminating the data center maintenance that so many companies want to get out of themselves. But maybe the biggest advantage that I see is the architectural innovation.
The public clouds have introduced so many methodologies around how to provision quickly, separating compute and storage and really dialing-in the exact needs on demand, as you change workloads. When public clouds began, it made a lot of sense for the cloud providers and their customers to charge and pay for compute and storage in the ratio that each use case demanded. And I think you're seeing that trend, proliferate all over the place, not just up in public cloud. That architecture itself is really becoming the next generation architecture for on-premise data centers, as well. But there are a lot of concerns. I think we're all aware of them. They're out there many times for different workloads, there are higher costs. Especially if some of the workloads that are being run through analytics, which tend to run all the time. Just like some of the silo challenges that companies are facing with HDFS, data lakes and cloud storage, the public clouds have similar types of siloed challenges as well. Initially, there was a belief that they were cheaper than data centers, and when you added in all the costs, it looked that way. And again, for certain elastic workloads, that is the case. I don't think that's true across the board overall. Even to the point where a lot of the cloud vendors aren't just charging lower costs anymore. We hear from a lot of customers that they don't really want to tether themselves to any one cloud because of some of those uncertainties. Of course, security and privacy are a concern. We hear a lot of concerns with regards to cloud and even some SaaS vendors around shared data catalogs, across all the customers and not enough separation. But security concerns are out there, you can read about them. I'm not going to jump into that bandwagon. But we hear about them. 
And then, of course, I think one of the things we hear the most from our customers, is that each cloud stack is starting to feel even a lot more locked in than the traditional data warehouse appliance. And as everybody knows, the industry has been running away from appliances as fast as it can. And so they're not eager to get locked into another, quote, unquote, virtual appliance, if you will, up in the cloud. They really want to make sure they have flexibility in which clouds, they're going to today, tomorrow and in the future. And frankly, we hear from a lot of our customers that they're very interested in eventually mixing and matching, compute from one cloud with, say storage from another cloud, which I think is something that we'll hear a lot more about. And so for us, that's why we've got our big bet number two. we love the cloud. We love the public cloud. We love the private clouds on-premise, and other hosting providers. But our passion and commitment is for Vertica to be able to run in any of the clouds that our customers choose, and make it portable across those clouds. We have supported on-premises and all public clouds for years. And today, we have announced even more support for Vertica in Eon Mode, the deployment option that leverages the separation of compute from storage, with even more deployment choices, which I'm going to also touch more on as we go. So super excited about our big bet number two. And finally as I mentioned, for all the hype that there is around machine learning, I actually think that most importantly, this third trend that team Vertica is determined to address is the need to bring business critical, analytics, machine learning, data science projects into production. For so many years, there just wasn't enough data available to justify the investment in machine learning. Also, processing power was expensive, and storage was prohibitively expensive. 
So training, scoring and evaluating all the different models to unlock the full power of predictive analytics was tough. Today you have those massive data volumes. You have the relatively cheap processing power and storage to make that dream a reality. And if you think about this, I mean with all the data that's available to every company, the real need is to operationalize the speed and the scale of machine learning so that these organizations can actually take advantage of it where they need to. I mean, we've seen this for years with Vertica. Going back to some of the most advanced gaming companies in the early days, they were incorporating this with live data directly into their gaming experiences. Well, every organization wants to do that now. And the accuracy for clickability and real time actions are all key to separating the leaders from the rest of the pack in every industry when it comes to machine learning. But if you look at a lot of these projects, the reality is that there's a ton of buzz, there's a ton of hype spanning every acronym that you can imagine. But most companies are struggling due to separate teams, different tools, silos and the limitations that many platforms are facing, which drive downsampling to get a small subset of the data to try to create a model that then doesn't apply, or compromise accuracy and make it virtually impossible to replicate models and understand decisions. And if there's one thing that we've learned when it comes to data, it's the value of prescriptive data at the atomic level, being able to show N of one, as we refer to it, meaning individually tailored data. No matter what it is, healthcare, entertainment experiences like gaming, or other, being able to get at the granular data, make these decisions and do that scoring applies to machine learning just as much as it applies to giving somebody a next-best-offer. But the opportunity has never been greater.
The need to integrate this end-to-end workflow and support the right tools without compromising on that accuracy. Think about it as no downsampling, using all the data, it really is key to machine learning success. Which should be no surprise then why the third big bet from Vertica is one that we've actually been working on for years. And we're so proud to be where we are today, helping the data disruptors across the world operationalize machine learning. This big bet has the potential to truly unlock, really the potential of machine learning. And today, we're announcing some very important new capabilities specifically focused on unifying the work being done by the data science community, with their preferred tools and platforms, and the volume of data and performance at scale, available in Vertica. Our strategy has been very consistent over the last several years. As I said in the beginning, we haven't deviated from our strategy. Of course, there's always things that we add. Most of the time, it's customer driven, it's based on what our customers are asking us to do. But I think we've also done a great job, not trying to be all things to all people. Especially as these hype cycles flare up around us, we absolutely love participating in these different areas without getting completely distracted. I mean, there's a variety of query tools and data warehouses and analytics platforms in the market. We all know that. There are tools and platforms that are offered by the public cloud vendors, by other vendors that support one or two specific clouds. There are appliance vendors, who I was referring to earlier who can deliver package data warehouse offerings for private data centers. And there's a ton of popular machine learning tools, languages and other kits. But Vertica is the only advanced analytic platform that can do all this, that can bring it together. We can analyze the data wherever it is, in HDFS, S3 Object Storage, or Vertica itself. 
Natively we support multiple clouds and on-premise deployments. And maybe most importantly, we offer that choice of deployment modes to allow our customers to choose the architecture that works for them right now. It still also gives them the option to change, move, and evolve over time. And Vertica is the only analytics database with end-to-end machine learning that can truly operationalize ML at scale. And I know it's a mouthful. But it is not easy to do all these things. It is one of the things that highly differentiates Vertica from the rest of the pack. It is also why our customers, all of you, continue to bet on us and see the value that we are delivering and will continue to deliver. Here's a couple of examples of some of our customers who are powered by Vertica. It's the scale of data. It's the millisecond response times. Performance and scale have always been a huge part of what we have been about, but not the only thing. I think the functionality, all the capabilities that we add to the platform, the ease of use, the flexibility, obviously with the deployment. But if you look at some of the numbers that are under these customers on this slide. And I've shared a lot of different stories about these customers. Which, by the way, still amaze me every time I talk to one and I get the updates, you can see the power and the difference that Vertica is making. Equally important, if you look at a lot of these customers, they are the epitome of being able to deploy Vertica in a lot of different environments. Many of the customers on this slide are not using Vertica just on-premise or just in the cloud. They're using it in a hybrid way. They're using it in multiple different clouds. And again, we've been with them on that journey throughout, which is what has made this product and frankly, our roadmap and our vision exactly what it is. It's been quite a journey. And that journey continues now with the Vertica 10 release.
The Vertica 10 release is obviously a massive release for us. But if you look back, you can see that we were building on that native columnar architecture that started a long time ago, obviously, with the C-Store paper. We built it to leverage commodity hardware, because it was an architecture that was never tightly integrated with any specific underlying infrastructure. I still remember hearing the initial pitch from Mike Stonebraker about the vision of Vertica as a software only solution and the importance of separating the company from hardware innovation. And at the time, Mike basically said to me, "There's so much R&D and innovation that's going to happen in hardware, we shouldn't bake hardware into our solution. We should do it in software, and we'll be able to take advantage of that hardware." And that is exactly what has happened. But one of the most recent innovations that we embraced with hardware is certainly that separation of compute and storage. As I said previously, the public cloud providers offered this next generation architecture, really to ensure that they can provide the customers exactly what they needed, more compute or more storage, and charge for each, respectively. The separation of compute from storage is a major milestone in data center architectures. If you think about it, it's really not only a public cloud innovation, though. It fundamentally redefines the next generation data architecture for on-premise and for pretty much every way people are thinking about computing today. And that goes for software too. Object storage is an example of a cost effective means for storing data. And even more importantly, separating compute from storage for analytic workloads has a lot of advantages, including the opportunity to manage much more dynamic, flexible workloads. And more importantly, truly isolate those workloads from others.
And by the way, once you start having something that can truly isolate workloads, then you can have the conversations around autonomic computing, around setting up some nodes, some compute resources on the data that won't affect any of the other data to do some things on their own, maybe some self analytics, by the system, etc. A lot of things that many of you know we've already been exploring in terms of our own system data in the product. But it was May 2018, believe it or not, it seems like a long time ago, when we first announced Eon Mode, and I want to make something very clear, actually, about Eon Mode. It's a mode, it's a deployment option for Vertica customers. And I think this is another huge benefit that we don't talk about enough. But unlike a lot of vendors in the market who will ding you and charge you for every single add-on like hit-buy, you name it. You get this with the Vertica product. If you continue to pay support and maintenance, this comes with the upgrade. This comes as part of the new release. So any customer who owns or buys Vertica has the ability to set up either an Enterprise Mode or Eon Mode, which is a question I know that comes up sometimes. Our first announcement of Eon was obviously for AWS customers, including The Trade Desk and AT&T, most of whom will be speaking here later at the Virtual Big Data Conference. They saw a huge opportunity. Eon Mode not only allowed Vertica to scale elastically with that specific compute and storage that was needed, but it really dramatically simplified database operations, including things like workload balancing, node recovery, compute provisioning, etc. So one of the most popular functions is that ability to isolate the workloads and really allocate those resources without negatively affecting others. And even though traditional data warehouses, including Vertica Enterprise Mode, have been able to do lots of different workload isolation, it's never been as strong as Eon Mode.
Well, it certainly didn't take long for our customers to see that value across the board with Eon Mode. And not just up in the cloud. In partnership with one of our most valued partners and a platinum sponsor here, as Joy mentioned at the beginning, we announced Vertica Eon Mode for Pure Storage FlashBlade in September 2019. And again, just to be clear, this is not a new product, it's one Vertica with yet more deployment options. With Pure Storage, Vertica in Eon Mode is not limited in any way by variable cloud network latency. The performance is actually amazing when you take the benefits of separating compute from storage and you run it with a Pure environment on-premise. Vertica in Eon Mode has a super smart cache layer that we call the depot. It's a big part of our secret sauce around Eon Mode. And combined with the power and performance of Pure's FlashBlade, Vertica became the industry's first advanced analytics platform that actually separates compute and storage for on-premises data centers. Something that a lot of our customers are already benefiting from, and we're super excited about it. But as I said, this is a journey. We don't stop, we're not going to stop. Our customers need the flexibility of multiple public clouds. So today with Vertica 10, we're super proud and excited to announce support for Vertica in Eon Mode on Google Cloud. This gives our customers the ability to use their Vertica licenses on Amazon AWS, on-premise with Pure Storage and on Google Cloud. Now, we were talking about HDFS, and a lot of our customers who have invested quite a bit in HDFS as a place, especially to store data, have been pushing us to support Eon Mode with HDFS. So as part of Vertica 10, we are also announcing support for Vertica in Eon Mode using HDFS as the communal storage.
Vertica's own ROS format data can be stored in HDFS, and actually the full functionality of Vertica's complete analytics, geospatial, pattern matching, time series, machine learning, everything that we have in there, can be applied to this data. And on the same HDFS nodes, Vertica can actually also analyze data in ORC or Parquet format, using external tables. We can also execute joins between the ROS data and the data the external tables hold, which powers a much more comprehensive view. So again, it's that flexibility to be able to support our customers wherever they need us to support them, on whatever platform they have. Vertica 10 gives us a lot more ways that we can deploy Eon Mode in various environments for our customers. It allows them to take advantage of Vertica in Eon Mode and the power that it brings with that separation, with that workload isolation, on whichever platform they are most comfortable with. Now, there's a lot that has come in Vertica 10. I'm definitely not going to be able to cover everything. But we also introduced complex types as an example. And complex data types fit very well into Eon as well in this separation. They significantly reduce the data pipeline, the cost of moving data between those, give much better support for unstructured data, which a lot of our customers have mixed with structured data, of course, and they leverage a lot of the columnar execution that Vertica provides. So you get complex data types in Vertica now, a lot more data, stronger performance. It goes great with the announcement that we made with the broader Eon Mode. Let's talk a little bit more about machine learning. We've been actually doing work in and around machine learning with various regressions and a whole bunch of other algorithms for several years. We saw the huge advantage that MPP offered, not just as a SQL engine, as a database, but for ML as well.
It didn't take us long to realize that there's a lot more to operationalizing machine learning than just those algorithms. It's data preparation, it's that model training. It's the scoring, the shaping, the evaluation. That is so much of what machine learning and, frankly, data science is about. You know, everybody always wants to jump to the sexy algorithm, and we handle those tasks very, very well. It makes Vertica a terrific platform to do that. A lot of work in data science and machine learning is done in other tools. I had mentioned that there's just so many tools out there. We want people to be able to take advantage of all that. We never believed we were going to be the best algorithm company or come up with the best models for people to use. So with Vertica 10, we support PMML. We can now import and export PMML models. It's a huge step for us around operationalizing machine learning projects for our customers, allowing the models to get built outside of Vertica yet be imported in and then applied to that full scale of data with all the performance that you would expect from Vertica. We also are more tightly integrating with Python. As many of you know, we've been doing a lot of open source projects with the community, driven by many of our customers, like Uber. And so now with Python we've integrated with TensorFlow, allowing data scientists to build models in their preferred language, to take advantage of TensorFlow. But again, to store and deploy those models at scale with Vertica. I think both these announcements are proof of our big bet number three, and really our commitment to supporting innovation throughout the community by operationalizing ML with that accuracy, performance and scale of Vertica for our customers. Again, there's a lot of steps when it comes to the workflow of machine learning. These are some of them that you can see on the slide, and it's definitely not linear either. We see this as a circle.
And companies that do it well just continue to learn, they continue to rescore, they continue to redeploy, and they want to operationalize all that within a single platform that can take advantage of all those capabilities. And that is the platform, with a very robust ecosystem, that Vertica has always been committed to as an organization and will continue to be. This graphic, many of you have seen it evolve over the years. Frankly, if we put everything and everyone on here, it wouldn't fit on a slide. But it will absolutely continue to evolve and grow as we support our customers where they need the support most. So, again, being able to deploy everywhere, being able to take advantage of Vertica, not just as a business analyst or a business user, but as a data scientist or as an operational or BI person. We want Vertica to be leveraged and used by the broader organization. So I think it's fair to say, and I encourage everybody to learn more about Vertica 10, because I'm just highlighting some of the bigger aspects of it. But we talked about those three market trends. The need to unify the silos, the need for hybrid multiple cloud deployment options, the need to operationalize business critical machine learning projects. Vertica 10 has absolutely delivered on those. But again, we are not going to stop. It is our job not to, and this is how Team Vertica thrives. I always joke that the next release is the best release. And, of course, even after Vertica 10, that is also true, although Vertica 10 is pretty awesome. But, you know, from the first line of code, we've always been focused on performance and scale, right. And like any really strong data platform, the optimizer and the execution engine are the two core pieces of that. Beyond Vertica 10, one of the big things that we're already working on is a next generation execution engine. We're already actually seeing incredible early performance from this.
And this is just one example of how important it is for an organization like Vertica to constantly go back and re-innovate. Every single release, we do the sit ups and crunches on our performance and scale. How do we improve? And there's so many parts of the core server, there's so many parts of our broader ecosystem. We are constantly looking at coverages of how we can go back to all the code lines that we have, and make them better in the current environment. And it's not an easy thing to do when you're doing that, and you're also expanding in the environment that we are expanding into to take advantage of the different deployments, which is a great segue to this slide. Because if you think about today, we're obviously already available with Eon Mode on Amazon AWS and Pure, and actually MinIO as well. As I talked about, in Vertica 10 we're adding Google and HDFS. And coming next, obviously, Microsoft Azure and Alibaba Cloud. So being able to expand into more of these environments is really important for the Vertica team and how we go forward. And it's not just running in these clouds. For us, we want it to be a SaaS like experience in all these clouds. We want you to be able to deploy Vertica in 15 minutes or less on these clouds. You can also consume Vertica in a lot of different ways on these clouds. As an example, in Amazon, Vertica by the Hour. So for us, it's not just about running, it's about taking advantage of the ecosystems that all these cloud providers offer, and really optimizing the Vertica experience as part of them. Optimization around automation, around self service capabilities, extending our management console. We now have products like the Vertica Advisor Tool that our Customer Success Team has created to actually use our own smarts in Vertica, to take data from customers that give it to us and help them automatically tune their environment.
You can imagine that we're taking that to the next level, in a lot of different endeavors that we're doing around how Vertica as a product can actually be smarter because we all know that simplicity is key. There just aren't enough people in the world who are good at managing data and taking it to the next level. And of course, other things that we all hear about, whether it's Kubernetes and containerization. You can imagine that that probably works very well with the Eon Mode and separating compute and storage. But innovation happens everywhere. We innovate around our community documentation. Many of you have taken advantage of the Vertica Academy. The numbers there are through the roof in terms of the number of people coming in and certifying on it. So there's a lot of things that are within the core products. There's a lot of activity and action beyond the core products that we're taking advantage of. And let's not forget why we're here, right? It's easy to talk about a platform, a data platform, it's easy to jump into all the functionality, the analytics, the flexibility, how we can offer it. But at the end of the day, somebody, a person, she's got to take advantage of this data, she's got to be able to take this data and use this information to make a critical business decision. And that doesn't happen unless we explore lots of different and frankly, new ways to get that predictive analytics UI and interface beyond just the standard BI tools in front of her at the right time. And so there's a lot of activity, I'll tease you with that going on in this organization right now about how we can do that and deliver that for our customers. We're in a great position to be able to see exactly how this data is consumed and used and start with this core platform that we have to go out. Look, I know, the plan wasn't to do this as a virtual BDC. But I really appreciate you tuning in. Really appreciate your support. 
I think if there's any silver lining to us, maybe not being able to do this in person, it's the fact that the reach has actually gone significantly higher than what we would have been able to do in person in Boston. We're certainly looking forward to doing a Big Data Conference in the future. But if I could leave you with anything, know this, since that first release for Vertica, and our very first customers, we have been very consistent. We respect all the innovation around us, whether it's open source or not. We understand the market trends. We embrace those new ideas and technologies and for us true north, and the most important thing is what does our customer need to do? What problem are they trying to solve? And how do we use the advantages that we have without disrupting our customers? But knowing that you depend on us to deliver that unified analytics strategy, it will deliver that performance of scale, not only today, but tomorrow and for years to come. We've added a lot of great features to Vertica. I think we've said no to a lot of things, frankly, that we just knew we wouldn't be the best company to deliver. When we say we're going to do things we do them. Vertica 10 is a perfect example of so many of those things that we from you, our customers have heard loud and clear, and we have delivered. I am incredibly proud of this team across the board. I think the culture of Vertica, a customer first culture, jumping in to help our customers win no matter what is also something that sets us massively apart. I hear horror stories about support experiences with other organizations. And people always seem to be amazed at Team Vertica's willingness to jump in or their aptitude for certain technical capabilities or understanding the business. And I think sometimes we take that for granted. But that is the team that we have as Team Vertica. We are incredibly excited about Vertica 10. I think you're going to love the Virtual Big Data Conference this year. 
I encourage you to tune in. Maybe one other benefit is I know some people were worried about not being able to see different sessions because they were going to overlap with each other. Well now, even if you can't do it live, you'll be able to do those sessions on demand. Please enjoy the Vertica Big Data Conference here in 2020. Please, you and your families and your co-workers, be safe during these times. I know we will get through it. And analytics is probably going to help with a lot of that, and we already know it is helping in many different ways. So believe in the data, believe in data's ability to change the world for the better. And thank you for your time. And with that, I am delighted to now introduce Micro Focus CEO Stephen Murdoch to the Vertica Big Data Virtual Conference. Thank you Stephen. >> Stephen: Hi, everyone, my name is Stephen Murdoch. I have the pleasure and privilege of being the Chief Executive Officer here at Micro Focus. Please let me add my welcome to the Big Data Conference. And also my thanks for your support, as we've had to pivot to this being a virtual rather than a physical conference. It's amazing how quickly we all reset to a new normal. I certainly didn't expect to be addressing you from my study. Vertica is an incredibly important part of the Micro Focus family. It is key to our goal of trying to enable and help customers become much more data driven across all of their IT operations. Vertica 10 is a huge step forward, we believe. It allows for multi-cloud innovation and genuinely hybrid deployments, begins to leverage machine learning properly in the enterprise, and also allows the opportunity to unify currently siloed lakes of information. We operate in a very noisy, very competitive market, and there are people who are in that market who can do some of those things. The reason we are so excited about Vertica is we genuinely believe that we are the best at doing all of those things.
And that's why we've announced publicly, and are now executing internally, incremental investment into Vertica. That investment's targeted at accelerating the roadmaps that already exist, and getting that innovation into your hands faster. The idea is speed is key. It's not a question of if companies have to become data driven organizations, it's a question of when. So that speed now is really important. And that's why we believe that the Big Data Conference gives a great opportunity for you to accelerate your own plans. You will have the opportunity to talk to some of our best architects, some of the best development brains that we have. But more importantly, you'll also get to hear from some of our phenomenal customers. You'll hear from Uber, from The Trade Desk, from Philips, and from AT&T, as well as many, many others. And just hearing how those customers are using the power of Vertica to accelerate their own plans, I think, is the highlight. And I encourage you to use this opportunity to its full. Let me close by again saying thank you. We genuinely hope that you get as much from this virtual conference as you could have from a physical conference. And we look forward to your engagement, and we look forward to hearing your feedback. With that, thank you very much. >> Joy: Thank you so much, Stephen, for joining us for the Vertica Big Data Conference. Your support and enthusiasm for Vertica is so clear, and it makes a big difference. Now, I'm delighted to introduce Amy Fowler, the VP of Strategy and Solutions for FlashBlade at Pure Storage, who is one of our BDC Platinum Sponsors, and one of our most valued partners. It was a proud moment for me when we announced Vertica in Eon Mode for Pure Storage FlashBlade and we became the first analytics data warehouse that separates compute from storage for on-premise data centers. Thank you so much, Amy, for joining us. Let's get started. >> Amy: Well, thank you, Joy, so much for having us.
And thank you all for joining us today, virtually, as we may all be. So, as we just heard from Colin Mahony, there are some really interesting trends that are happening right now in the big data analytics market. From the end of the Hadoop hype cycle, to the new cloud reality, and even the opportunity to help the many data science and machine learning projects move from labs to production. So let's talk about these trends in the context of infrastructure. And in particular, look at why a modern storage platform is relevant as organizations take on the challenges and opportunities associated with these trends. The answer is the Hadoop hype cycle left a lot of data in HDFS data lakes, or reservoirs or swamps depending upon the level of the data hygiene, but without the ability to get the value that was promised from Hadoop as a platform rather than a distributed file store. And when we combine that data with the massive volume of data in cloud object storage, we find ourselves with a lot of data and a lot of silos, but without a way to unify that data and find value in it. Now when you look at the infrastructure data lakes are traditionally built on, it is often direct attached storage, or DAS. The approach that Hadoop took when it entered the market was primarily bound by the limits of networking and storage technologies: one gig ethernet and slower spinning disk. But today, those barriers do not exist. And all-flash storage has fundamentally transformed how data is accessed, managed and leveraged. The need for local data storage for significant volumes of data has been largely mitigated by the performance increases afforded by all-flash. At the same time, organizations can achieve superior economies of scale with that segregation of compute and storage. With compute and storage, you don't always scale in lockstep. Would you want to add an engine to the train every time you add another boxcar? Probably not.
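[Editor's illustration, not part of the talk.] The engine-and-boxcar point can be sketched as simple sizing arithmetic. This is a minimal sketch under stated assumptions: the function names and every capacity figure below are hypothetical and are not taken from Pure Storage, Vertica, or the speakers. It compares a direct-attached design, where each node adds compute and storage together, with a disaggregated design, where each is sized on its own.

```python
import math


def nodes_needed_coupled(cores_required, tb_required, cores_per_node, tb_per_node):
    """Direct-attached sizing: every node brings both cores and disks,
    so the node count is set by whichever resource runs out first."""
    by_compute = math.ceil(cores_required / cores_per_node)
    by_storage = math.ceil(tb_required / tb_per_node)
    return max(by_compute, by_storage)


def units_needed_decoupled(cores_required, tb_required, cores_per_node, tb_per_shelf):
    """Disaggregated sizing: compute nodes and storage shelves are
    counted independently. Returns (compute_nodes, storage_shelves)."""
    compute_nodes = math.ceil(cores_required / cores_per_node)
    storage_shelves = math.ceil(tb_required / tb_per_shelf)
    return compute_nodes, storage_shelves


# Hypothetical workload: modest compute, a lot of data.
CORES_REQUIRED, TB_REQUIRED = 64, 500
CORES_PER_NODE, TB_PER_UNIT = 32, 50

coupled_nodes = nodes_needed_coupled(CORES_REQUIRED, TB_REQUIRED, CORES_PER_NODE, TB_PER_UNIT)
compute_nodes, storage_shelves = units_needed_decoupled(
    CORES_REQUIRED, TB_REQUIRED, CORES_PER_NODE, TB_PER_UNIT
)

# Cores purchased but not needed in the coupled design.
idle_cores = coupled_nodes * CORES_PER_NODE - CORES_REQUIRED

print(f"coupled: {coupled_nodes} nodes, {idle_cores} idle cores")
print(f"decoupled: {compute_nodes} compute nodes + {storage_shelves} storage shelves")
```

With these made-up figures, the coupled design buys ten full nodes to reach the storage target even though two nodes' worth of compute would do, which is the "extra engine per boxcar" effect described above.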
But from a Pure Storage perspective, FlashBlade is uniquely architected to allow customers to achieve better resource utilization for compute and storage, while at the same time reducing the complexity that has arisen from the siloed nature of the original big data solutions. The second and equally important recent trend we see is something I'll call cloud reality. The public clouds made a lot of promises, and some of those promises were delivered. But cloud economics, especially usage based and elastic scaling, without the control that many companies need to manage the financial impact, is causing a lot of issues. In addition, the risk of vendor lock-in, from data egress charges to integrated software stacks that can't be moved or deployed on-premise, is causing a lot of organizations to back off the all-in-on-cloud strategy and move toward hybrid deployments. Which is kind of funny in a way, because it wasn't that long ago that there was a lot of talk about no more data centers. And for example, one large retailer, I won't name them, but I'll admit they are my favorite. Several years ago they told us they were completely done with on-prem storage infrastructure, because they were going 100% to the cloud. But they just deployed FlashBlade for their data pipelines, because they need predictable performance at scale. And the all cloud TCO just didn't add up. Now, that being said, while there are certainly challenges with the public cloud, it has also brought some things to the table that we see most organizations wanting. First of all, in a lot of cases applications have been built to leverage object storage platforms like S3. So they need that object protocol, but they may also need it to be fast. And fast object may have been an oxymoron only a few years ago, and this is an area of the market where Pure and FlashBlade have really taken a leadership position.
Second, regardless of where the data is physically stored, organizations want the best elements of a cloud experience. And for us, that means two main things. Number one is simplicity and ease of use. If you need a bunch of storage experts to run the system, that should be considered a bug. The other big one is the consumption model. The ability to pay for what you need when you need it, and seamlessly grow your environment over time, totally nondisruptively. This is actually pretty huge and something that a lot of vendors try to solve for with finance programs. But no finance program can address the pain of a forklift upgrade when you need to move to next gen hardware. To scale nondisruptively over long periods of time, five to 10 years plus, crucial architectural decisions need to be made at the outset. Plus, you need the ability to pay as you use it. And we offer something for FlashBlade called Pure as a Service, which delivers exactly that. The third cloud characteristic that many organizations want is the option for hybrid, even if that is just a DR site in the cloud. In our case, that means supporting replication to S3 at AWS. And the final trend, which to me represents the biggest opportunity for all of us, is the need to help the many data science and machine learning projects move from labs to production. This means bringing all the machine learning functions and model training to the data, rather than moving samples or segments of data to separate platforms. As we all know, machine learning needs a ton of data for accuracy. And there is just too much data to retrieve from the cloud for every training job. At the same time, predictive analytics without accuracy is not going to deliver the business advantage that everyone is seeking. You can kind of visualize data analytics as it is traditionally deployed as being on a continuum, with the thing we've been doing the longest, data warehousing, on one end, and AI on the other end.
But the way this manifests in most environments is a series of silos that get built up. So data is duplicated across all kinds of bespoke analytics and AI environments and infrastructure. This creates an expensive and complex environment. Historically, there was no other way to do it, because some level of performance is always table stakes, and each of these parts of the data pipeline has a different workload profile. A single platform to deliver the multidimensional performance this diverse set of applications requires didn't exist three years ago. And that's why the application vendors pointed you towards bespoke things like the DAS environments that we talked about earlier. And the fact that better options exist today is why we're seeing them move towards supporting this disaggregation of compute and storage. And when it comes to a platform that is a better option, one with a modern architecture that can address the diverse performance requirements of this continuum, and allow organizations to bring a model to the data instead of creating separate silos, that's exactly what FlashBlade is built for. Small files, large files, high throughput, low latency and scale to petabytes in a single namespace. And this, importantly, a single namespace, is what we're focused on delivering for our customers. At Pure, we talk about it in the context of the modern data experience, because at the end of the day, that's what it's really all about. The experience for your teams in your organization. And together Pure Storage and Vertica have delivered that experience to a wide range of customers.
From a SaaS analytics company, which uses Vertica on FlashBlade to authenticate the quality of digital media in real time, to a multinational car company, which uses Vertica on FlashBlade to make thousands of decisions per second for autonomous cars, or a healthcare organization, which uses Vertica on FlashBlade to enable healthcare providers to make real time decisions that impact lives. And I'm sure you're all looking forward to hearing from John Yovanovich from AT&T, to hear how he's been doing this with Vertica and FlashBlade as well. He's coming up soon. We have been really excited to build this partnership with Vertica. And we're proud to provide the only on-premise storage platform validated with Vertica Eon Mode, and to deliver this modern data experience to our customers together. Thank you all so much for joining us today. >> Joy: Amy, thank you so much for your time and your insights. Modern infrastructure is key to modern analytics, especially as organizations leverage next generation data center architectures and object storage for their on-premise data centers. Now, I'm delighted to introduce our last speaker in our Vertica Big Data Conference Keynote, John Yovanovich, Director of IT for AT&T. Vertica is so proud to serve AT&T, and especially proud of the harmonious impact we are having in partnership with Pure Storage. John, welcome to the Virtual Vertica BDC. >> John: Thank you, Joy. It's a pleasure to be here. And I'm excited to go through this presentation today, and in a unique fashion, 'cause as I was thinking through how I wanted to present the partnership that we have formed together between Pure Storage, Vertica and AT&T, I wanted to emphasize how well we all work together and how these three components have really driven home my desire for a harmonious, to use your word, relationship. So, I'm going to move forward here. The theme of today's presentation is the Pure Vertica Symphony, live at AT&T.
And if anybody is a Westworld fan, you can appreciate the sheet music on the right hand side. What I'm going to highlight here, in a musical fashion, is how we at AT&T leverage these technologies to save money, to deliver a more efficient platform, and to actually just make our customers happier overall. So as we look back, as early as just maybe a few years ago here at AT&T, I realized that we had many musicians to help the company. Or maybe you might want to call them data scientists, or data analysts. For the theme we'll stay with musicians. None of them were singing or playing from the same hymn book or sheet music. And so what we had was many organizations chasing a similar dream, but not exactly the same dream. And the best way to describe that, and I think with a lot of people this might resonate in your organizations: how many organizations are chasing a customer 360 view in your company? Well, I can tell you that I have at least four in my company. And I'm sure there are many that I don't know of. That is our problem, because what we see is a repetitive sourcing of data. We see a repetitive copying of data. And there's just so much money to be spent. This is where I asked Pure Storage and Vertica to help me solve that problem with their technologies. What I also noticed was that there was no coordination between these departments. In fact, if you look here, nobody really wants to play with finance. Sales, marketing and care, sure, they all copied each other's data. But they actually didn't communicate with each other as they were copying the data. So the data became replicated and out of sync. This is a challenge throughout, not just my company, but all companies across the world. And that is, the more we replicate the data, the more problems we have at chasing or conquering the goal of a single version of truth. 
In fact, I kid that at AT&T, we actually have adopted the multiple versions of truth theory, which is not where we want to be, but this is where we are. But we are conquering that with the synergies between Pure Storage and Vertica. This is what it leaves us with. And this is where we are challenged, in that each one of our siloed business units had their own storage, their own dedicated storage, and some of them had more money than others so they bought more storage. Some of them anticipated storing more data than they really did. Others are running out of space, but can't put any more in because their budgets haven't been replenished. So if you look at it from this side view here, we have a limited amount of compute, or fixed compute, dedicated to each one of these silos. And that's because of wanting to own your own. And the other part is that you are limited or wasting space, depending on where you are in the organization. So the synergies aren't just about the data, but actually the compute and the storage. And I wanted to tackle that challenge as well. So I was tackling the data, I was tackling the storage, and I was tackling the compute, all at the same time. So my ask across the company was, can we just please play together, okay? And to do that, I knew that I wasn't going to tackle this by getting everybody in the same room and getting them to agree that we needed one account table, because they will argue about whose account table is the best account table. But I knew that if I brought the account tables together, they would soon see that they had so much redundancy that I can now start retiring data sources. I also knew that if I brought all the compute together, that they would all be happy. But I didn't want them to tackle each other. And in fact that was one of the things that all business units really enjoy. 
They enjoy the silo of having their own compute, and more or less being able to control their own destiny. Well, Vertica's subclustering allows just that. And this is exactly what I was hoping for, and I'm glad they've come through. And finally, how did I solve the problem of the single account table? Well, you don't have dedicated storage, and you can separate compute and storage, as Vertica in Eon Mode does. And we store the data on FlashBlades, which you see on the left and right hand side of our container, which I can describe in a moment. Okay, so what we have here is a container full of compute, with all the Vertica nodes sitting in the middle. Two, we'll call them loader subclusters, sit on the sides, dedicated to just putting data onto the FlashBlades, which are sitting on both ends of the container. Now today, I have two dedicated storage, or dedicated might not be the right word, but two storage racks, one on the left, one on the right. And I treat them as separate storage racks. They could be one, but I created them separately for disaster recovery purposes, in case one rack were to go down. That being said, I'm probably going to add a couple more of them here in the future, so I can just have, say, a five to 10 petabyte storage setup, and I'll have my DR in another, 'cause the DR shouldn't be in the same container. Okay, so I'll do DR outside of this container. So I got them all together, I leveraged subclustering, I leveraged separation of storage and compute. I was able to convince many of my clients that they didn't need their own account table, that they were better off having one. I reduced latency, I reduced our ticketing, I reduced our data quality issues, AKA ticketing, okay. I was able to expand. I was able to leverage elasticity within this cluster. As you can see, there are racks and racks of compute. 
We set up what we'll call the fixed capacity that each of the business units needed. And then I'm able to ramp up and release the compute that's necessary for each one of my clients based on their workloads throughout the day. And so while the compute before you shows the instruments that have already, more or less, dedicated themselves, all those towards the right are free for anybody to use. So in essence, what I have is a concert hall with a lot of seats available. So if I want to run a 10-chair symphony or an 80-chair symphony, I'm able to do that. And all the while, I can also do the same with my loader nodes. I can expand my loader nodes to actually have their own symphony, all to themselves, and not compete with any other workloads of the other clusters. What does that change for our organization? Well, it really changes the way our database administrators actually do their jobs. This has been a big transformation for them. They have actually become data conductors. Maybe you might even call them composers, which is interesting, because what I've asked them to do is morph into less technology and more workload analysis. And in doing so we're able to write auto-detect scripts that watch the queues and watch the workloads, so that we can help ramp up and trim down the cluster and subclusters as necessary. This has been an exciting transformation for our DBAs, who I need to now classify as something maybe like DCAs. I don't know, I have to work with HR on that. But I think it's an exciting future for their careers. And if we bring it all together, then our clusters start looking like this, where everything is moving in harmony, we have lots of seats open for extra musicians, and we are able to emulate a cloud experience on-prem. And so, I want you to sit back and enjoy the Pure Vertica Symphony live at AT&T. 
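The talk does not show the auto-detect scripts themselves, so what follows is only a minimal sketch of the sizing decision such a script might make: every subcluster keeps its fixed capacity, and spare nodes go to whichever subcluster has the deepest unmet queue. The names `Subcluster` and `plan_nodes`, the `queries_per_node` threshold, and the sample numbers are all hypothetical. No Vertica API is called here; actually adding or removing nodes would be done with whatever tooling the site uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subcluster:
    """Hypothetical snapshot of one business unit's subcluster."""
    name: str
    fixed_nodes: int      # baseline capacity reserved for this business unit
    queued_queries: int   # queue depth observed by the monitoring script


def plan_nodes(subclusters, total_nodes, queries_per_node=10):
    """Return {name: node_count} for the next scaling step.

    Assumption: one node can absorb about `queries_per_node` queued queries.
    Each subcluster keeps its fixed baseline; spare nodes are handed out one
    at a time to the subcluster with the largest unmet need.
    """
    plan = {s.name: s.fixed_nodes for s in subclusters}
    spare = total_nodes - sum(plan.values())
    if spare < 0:
        raise ValueError("fixed capacity exceeds total nodes")

    # Extra nodes each subcluster would like beyond its baseline.
    wanted = {}
    for s in subclusters:
        needed = -(-s.queued_queries // queries_per_node)  # ceiling division
        wanted[s.name] = max(0, needed - s.fixed_nodes)

    while spare > 0 and wanted:
        name = max(wanted, key=wanted.get)
        if wanted[name] == 0:
            break  # nobody needs more; remaining nodes stay free
        plan[name] += 1
        wanted[name] -= 1
        spare -= 1
    return plan


if __name__ == "__main__":
    units = [
        Subcluster("sales", fixed_nodes=4, queued_queries=95),
        Subcluster("marketing", fixed_nodes=4, queued_queries=20),
        Subcluster("care", fixed_nodes=4, queued_queries=50),
    ]
    print(plan_nodes(units, total_nodes=20))
```

Keeping the decision as a pure function, separate from whatever collects queue depths and resizes subclusters, is a design choice made for this sketch so the policy can be tested without a running cluster.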
(soft music) >> Joy: Thank you so much, John, for an informative and very creative look at the benefits that AT&T is getting from its Pure Vertica symphony. I do really like the idea of engaging HR to change the title to Data Conductor. That's fantastic. I've always believed that music brings people together. And now it's clear that analytics at AT&T is part of that musical advantage. So, now it's time for a short break. And we'll be back for our breakout sessions, beginning at 12 pm Eastern Daylight Time. We have some really exciting sessions planned later today. And then again, as you can see, on Wednesday. Now because all of you are already logged in and listening to this keynote, you already know the steps to continue to participate in the sessions that are listed here and on the previous slide. In addition, everyone received an email yesterday and today, and you'll get another one tomorrow, outlining the simple steps to register, log in and choose your session. If you have any questions, check out the emails or go to www.vertica.com/bdc2020 for the logistics information. There are a lot of choices and that's always a good thing. Don't worry if you want to attend one or more or can't listen to these live sessions due to your timezone. All the sessions, including the Q&A sections, will be available on demand and everyone will have access to the recordings as well as even more pre-recorded sessions that we'll post to the BDC website. Now I do want to leave you with two other important sites. First, our Vertica Academy. Vertica Academy is available to everyone. And there's a variety of very technical, self-paced, on-demand training, virtual instructor-led workshops, and Vertica Essentials Certification. And it's all free. Because we believe that Vertica expertise helps everyone accelerate their Vertica projects and the advantage that those projects deliver. 
Now, if you have questions or want to engage with our Vertica engineering team now, we're waiting for you on the Vertica forum. We'll answer any questions or discuss any ideas that you might have. Thank you again for joining the Vertica Big Data Conference Keynote Session. Enjoy the rest of the BDC because there's a lot more to come.

Published Date : Mar 30 2020

SUMMARY :

And he'll share the exciting news And that is the platform, with a very robust ecosystem some of the best development brains that we have. the VP of Strategy and Solutions is causing a lot of organizations to back off the and especially proud of the harmonious impact And that is, the more we replicate the data, Enjoy the rest of the BDC because there's a lot more to come

ENTITIES

Entity	Category	Confidence
Stephen	PERSON	0.99+
Amy Fowler	PERSON	0.99+
Mike	PERSON	0.99+
John Yavanovich	PERSON	0.99+
Amy	PERSON	0.99+
Colin Mahony	PERSON	0.99+
AT&T	ORGANIZATION	0.99+
Boston	LOCATION	0.99+
John Yovanovich	PERSON	0.99+
Vertica	ORGANIZATION	0.99+
Joy King	PERSON	0.99+
Mike Stonebreaker	PERSON	0.99+
John	PERSON	0.99+
May 2018	DATE	0.99+
100%	QUANTITY	0.99+
Wednesday	DATE	0.99+
Colin	PERSON	0.99+
AWS	ORGANIZATION	0.99+
Vertica Academy	ORGANIZATION	0.99+
five	QUANTITY	0.99+
Joy	PERSON	0.99+
2020	DATE	0.99+
two	QUANTITY	0.99+
Uber	ORGANIZATION	0.99+
Stephen Murdoch	PERSON	0.99+
Vertica 10	TITLE	0.99+
Pure Storage	ORGANIZATION	0.99+
one	QUANTITY	0.99+
today	DATE	0.99+
Philips	ORGANIZATION	0.99+
tomorrow	DATE	0.99+
AT&T.	ORGANIZATION	0.99+
September 2019	DATE	0.99+
Python	TITLE	0.99+
www.vertica.com/bdc2020	OTHER	0.99+
One gig	QUANTITY	0.99+
Amazon	ORGANIZATION	0.99+
Second	QUANTITY	0.99+
First	QUANTITY	0.99+
15 minutes	QUANTITY	0.99+
yesterday	DATE	0.99+

