Josh Berkus, Red Hat | Postgres Vision 2021

(upbeat music) >> From around the globe, it's theCUBE with digital coverage of Postgres Vision 2021, brought to you by EDB. >> Hello everybody. Welcome back to Postgres Vision 21. My name is Dave Vellante and we're super excited to have Josh Berkus on. He's a leader in the Kubernetes community, extremely well-versed in containerized applications, application development, containerizing databases, all things open source. CUBE alum Josh Berkus, welcome back to theCUBE. Great to see you again. >> Thank you. I'm glad to be here. >> You're just coming off KubeCon, and we heard some of the themes from that event. There was a lot of focus on inclusion and diversity, which of course, you know, that's the open source ethos, and a lot of discussion around designing security in, the whole conversation about shift left. It's great to see larger companies giving back. Obviously there's been a lot of pressure over the years on the big companies, that it's a one-way street, and they're actually giving back, making some investments. So we love to see that. And open source just continues to be the mainspring of innovation. I've got to call out a recent Red Hat survey, the state of enterprise open source in 2021: 90% of technology leaders said that they're adopting open source, and I made a joke that the other 10%, they're doing it, they just don't know it. So what were some of your takeaways from the event, and some of the trends you're seeing, specifically as it relates to containers? >> So, I mean, you're right, one thing is this sort of return to the security topic, because we've had like a couple of things happen. One was, when we initially started doing containers as a platform, with Docker and with early Kubernetes and that sort of thing, we got a lot of container image scanning, right? So you have like Clair, and Docker has a scanning thing, and Amazon and Azure have their own scanning things.
And people felt that was kind of good enough for a while, but then we had the SolarWinds hack. And the thing is, in the meantime, we've gone from a stage where people were mostly using Kubernetes in dev to people using Kubernetes in production. And there are a lot of extra security issues and vulnerabilities that come up in an actual production environment that people just didn't necessarily think about before. And so now we're looking at adding more pieces to the security stack and making those more standard for everyone who uses Kubernetes. And I've had the chance to work with the StackRox folks since they became part of Red Hat. So it's been very exciting to look at the whole thing and look at things like the container supply chain, because SolarWinds showed us, obviously, it's not enough to necessarily just trust the vendor. You need to trust their whole supply chain. And it helps to be able to examine that supply chain. >> Yeah, it's very scary when you look at that, you're absolutely right. Multiple components of malware coming into an organization through the supply chain, self-forming, with different signatures. And so it's great to see the community spending time on that and putting an emphasis on that. Now I've got to cut right to the chase here. In 2018, you wrote a two-part blog series called "Should I run Postgres in Kubernetes?" Obviously it's highly relevant for this community. So I want to talk about your perspective. Well, first of all, the thing I love about you is you're technical and you can go deep, but at the same time, you can speak to a business audience. >> Thanks. >> You're welcome, and thank you for writing this and communicating the way you do. But talk about when it makes sense and when it doesn't, I mean, that's kind of...
My big three takeaways on the pros were simplify, simplify, simplify, especially if you're running application components and other services on Kubernetes. But give us the update three years later: why should you, why shouldn't you? >> You know, why don't we zoom out to an even bigger picture? Which is, honestly, this is like every new platform that we've got, right? So when virtualization and VMware became a thing, we had the same sort of decisions about when do I move my database to this, and again when AWS and the public cloud became a thing. If I had written that 12 years ago, I could have written it about AWS, and it would have had a lot of the same decision tree, 'cause what it really comes down to is: the more commodifiable a particular database instance is, the better a candidate it is to move to an advanced infrastructure platform, the most advanced currently being Kubernetes. To the extent that you can describe this particular database, what it does, who needs to use it, what's in it, in a simple one-pager, then that's probably a really good candidate for hosting on Kubernetes. Whereas if you have a database where it's like, hey, the entire company uses it and it's so complicated we can't describe its inputs and outputs, that's possibly the last thing in your company that you're going to migrate to Kubernetes, because there's less gain to be made there, because the real advantage of moving stuff to Kubernetes is your ability to automate things. The whole way I got into Kubernetes in the first place was I started out way down the line, not using containers at all. I was just looking to solve the problem of how do we automate Postgres high availability. That's what I was looking for. And it started out with something I built using SaltStack called HandyRep, that Casey and I built. And mostly that was a problem discovery exercise; we discovered what the hard problems were there.
And then we moved from that to Docker, because containers offered an encapsulation strategy, because one of the problems you run into when automating high availability is: is the database actually down or not? And so the first thing that containers offered us was not packaging, which is what people usually talk about, but instead encapsulation, right? Because it's a lot easier to determine "is the container running or not" than "is the database down or not". An actual Postgres database has multiple components and multiple processes that make it up. And some of those can be down without the others being down, which can then make you think a database is down that's not actually shut down. And being able to put that in a container gives me more of a binary up or down. And then from there, I got into, okay, but I need to automate a lot of other components. I need to automate the storage and everything else. And that led to Kubernetes. And so if you look at it in terms of deciding when you're going to migrate the database to Kubernetes, you look at: can I take advantage of that automation? Is this something that my application workflow and my team organization allow me to do? And if the answer is yes, particularly if you're in a company that's doing the full DevOps thing, where you have a unified development and infra team that owns the entire stack, then those people are going to be a really good candidate for moving that stack to Kubernetes. >> Got it. Okay, so let me ask you: in databases, especially in critical apps, recovery is everything. When something goes wrong, you've got to recover. So if I understand it correctly, just in reading and listening to you, if you've got Kubernetes expertise and you're building applications in that environment, then the application components are in there. And am I inferring correctly that you're going to be able to automate and facilitate high quality recovery with certainty?
>> Yeah, there's a bunch of infrastructure involved, and this is why what enterprises do is they move things like the web front-end to Kubernetes first, and that is what they should do, right? That is absolutely the right order of things to do, because the minute that you're looking at bringing databases in, you're now looking at your whole storage infrastructure. So that direct-attached storage that was attached physically to one machine is not going to work once you've moved to a container-based cloud. You suddenly need a way to be able to attach that storage to any of the nodes in your cluster so that you can move the database around and you can have failover. But once you build those things up, you can. I mean, some of the stuff that I've done: I work in the office of the CTO now at Red Hat, so I'm not in production support. So the only Postgres instances I'm supporting are ones for some open source projects we support, like the Python project. And in those cases, it's not a high criticality database, but I'm not support, I'm not on call on the weekend. I want something that doesn't require me to be on call in order for it to stay up. And so putting that on OpenShift with the Patroni failover driver was the answer for that. And it has failed over. The Red Hat IT team contacts me and says, "Hey, we need to move those servers," and then we'll just add a node to the cluster and delete the old node, and it'll do the right thing. And I don't have to worry about it, which is really what you're going for there. >> The other thing I took away from your writing was that you suggested that a lot of the successes were in areas where the Postgres databases were rather small and there were lots of them. And so to the extent that you can automate that, you're going to save yourself a lot of problems. Whereas on the flip side, if you're running extremely large databases, or where there may be performance constraints, that might be an area to be a little bit more circumspect.
>> Yeah, and that's absolutely true, because on the one side, I've worked with the DevOps people and the people who are on Heroku and that sort of thing, who have one database per application, right? And those people are great candidates for migrating. But then I've also worked with the people who have one big database for the company, where the database is three terabytes in size and it powers their reporting system and their customer system and the web portal and everything else in one database. That's the one that's really going to be a hard call, and that you might, in fact, never physically migrate to Kubernetes, because even if it's on Kubernetes you are going to mess with the hardware policy to give it its own dedicated machine. So in that case, what I would honestly tend to do is: there's a feature in Kubernetes called Service Catalog that allows you to expose an external service within Kubernetes as if it were a Kubernetes service. And that's what I tend to do with those kinds of databases, because there's not a huge advantage in actually physically moving the database to a container. There's a bunch of steps involved, and going via Service Catalog is a lot easier. >> But essentially you're speaking the same language in that example that you just gave. >> Yeah. >> Now, the other thing you pointed out at the time that you wrote this article is there's a lot of pre-1.0, kind of alpha, in the Kubernetes stack, and it might be prudent not to put your HIPAA-compliant data there. How has that evolved since? >> Yeah, if I was to update two things in the article, I guess that would be one of them. The other one I'll get to in a minute. So the first one is that Kubernetes has progressed along that maturity timeline. Like, we recently added the production readiness reviews as part of our feature review process. We've really improved test adherence, so that we're not releasing with known broken tests, and a bunch of other things to make it more stable.
But part of it depends on who I'm talking to, because there are still degrees here. So if I'm talking in the context of the world of software, then Kubernetes has reached the point of maturity where it is as stable as anything else. And if you use a release, you can assume that any sort of major issues have been worked out. The one difference between it and some other platforms people may have used is it's still young enough that backwards compatibility can be an issue. As in, Kubernetes releases now three times a year, we've stepped down from four, and within three releases you can find yourself needing to change API calls, which means needing to refactor parts of your application. So if you compare that with some other things, like a JVM platform, when's the last time you had a major API change with a JVM platform? But you know, Kubernetes is only six years old, so that's part of that. The other thing is if I'm talking to the Postgres community, right? Within Postgres, people run the daily Postgres snapshot in production. I would not do that with Kubernetes, I would wait for a release. So there's still kind of a difference there if people are coming from the Postgres community, right? Which is, we're used to this really extreme level of stability that we have with Postgres, and Kubernetes, as a much younger project, isn't quite there yet. >> So that's a process change that you would have to be aware of. If you want to take the benefits of containers with Postgres, you just have to really understand that and make that process part of your change management. >> The other thing I would say has changed is there are new opportunities in running your data warehouse, your big data databases on Kubernetes.
A number of platforms, the one I'm most familiar with is Citus, because I worked with those folks, have taken advantage of Kubernetes as a deployment and management platform for their big data database infrastructure. Which makes sense, because if you look at a lot of modern data analysis and data mining platforms that are built on top of Postgres, part of how they do their work is they actually run a bunch of little Postgres instances that they federate together. And then Kubernetes becomes the tool that allows you to manage all of those little Postgres instances. So that's the sort of exception to the "should I migrate this really big database?" question. That can be a yes: if you are migrating it to a big data platform that supports Kubernetes, then it can be a huge advantage. >> Obviously you've got the practitioner knowledge and you're working in the community. I'm wondering, just thinking about the motivation to move to a container environment, if you're one of the Postgres folks in the audience: could you share any anecdotal or other data on business impact, benchmarks that you've seen, some of the positives there? >> Well, performance is one, right? And if you actually look at my history, I, and for that matter some of the folks from Percona and some other folks in the database field, did a bunch of benchmarks of running Postgres and MySQL on Kubernetes versus running them not on Kubernetes. And one of the advantages of containers over VMs is that there's not any intrinsic sort of layer gap or virtualization that modifies your performance. In other words, if a container is using storage that's present on the node where the container is running, it is using that storage through Linux.
And therefore the performance is, with some caveats, going to be identical to if you were running that on the host system. Now, where performance differences creep in is that you might not be able to use the same kind of storage. And Kubernetes and container systems in general are organized around the idea that no service is using a majority of the resources on the system, so again, if you're planning on running a larger Postgres database that really needs all the RAM that a system has, you're going to have to do a lot of tinkering with Kubernetes configuration to get the same performance you would have running it on dedicated hardware. >> Okay, but fundamentally you're saying that overhead is less, with caveats, like you just mentioned, right? >> Yeah, well, the overhead is not any different from if you were running on the host system. So a really good example of that was, if you go back to my lightning talk in, (indistinct) Austin, I think, I showed running a benchmark with Postgres on an AWS instance using EBS storage, both not in Kubernetes and in Kubernetes. And there was no perceptible performance difference between the two of them, because it was all metered by how fast EBS was for me. >> Right, and I said less, but I should've been more specific: less than, say, you would expect with virtualization. >> Right, and then it just comes down to a business decision, which is that if you're already on some sort of cloud storage or network storage, and again you have databases that can share hardware systems, then you shouldn't really expect substantial performance differences by moving to Kubernetes. That's something that you can eliminate, in other words. But if, in the process, you're going to be migrating from direct-attached storage to network storage, then you are going to see a performance difference, but that's caused by the change in storage.
Or if you're going to be moving from systems that are not shared to systems that are shared, again you're going to see a difference from that, but it wouldn't be any different than if you did that without Kubernetes and containers being involved. >> If you're using any world-class shared storage device from whatever name of big vendor, you're going to be accommodated. If you're racking and stacking your own flash drives, or worse yet spinning disk drives, direct-attached, that's maybe a different story. So, okay. That's good. Where would you advise people to get started with Postgres and Kubernetes? >> The nice thing is there are a number of advanced systems now, and advanced systems that are supported by the various Postgres vendors. And that can actually be a great place to get started, because the systems are open source, so you can try them out. As far as I know, they're open source, so you can try them out, but then if you decide you like them, you can get support. And so that would include Crunchy Data. EnterpriseDB has a system, which honestly, I have to admit, I'm less familiar with than the ones that Crunchy runs. StackGres is another one, out of Europe, that has their own system for running cloud native Postgres. And there's one I'm forgetting. And what a lot of these have to do with is taking advantage of the automation. 'Cause you can obviously put Postgres in a container and play around, right? But your whole point of moving to Kubernetes in general is going to be to take advantage of the automation, so you want to look at the various automation platforms. And you can go ahead and do that yourself, and the one I'm most familiar with, because I develop it, is Patroni, which is the component for automating Postgres. You do Patroni plus you do operators, that's another word that comes in here.
But if you're looking at this as a business, you're probably going to want something that's supported, or where at least there's a potential to buy support, and a bunch of the different companies in the Postgres space package up these components for you into a platform. Like, I know the Crunchy platform uses Patroni plus some proxy stuff, plus pgBackRest, plus a couple of other things, to give you a sort of full automation platform for running Postgres on Kubernetes. >> Awesome, last question. Where are we in the whole container adoption? As you've mentioned, we started out kind of stateless, and now you're building stateful applications. But still, we look at spending data with our data partner ETR, and containers and container orchestration, it's right up there with RPA, with cloud, with AI, just in terms of the attention and resources going in. So it's exploding. It feels like it's still early days. There's a lot of legs left. What do you see? >> Yeah, well, a lot of it is, I mean, you're talking about migrating IT infrastructure, right? So where we are with Kubernetes is we have the early adopters, right? We have all the people who were at the point of building their new infrastructure when Kubernetes came out, right? And people who had major unsolved problems, which is a big reason for adopting a new platform: there just was no old platform for you. And so we sort of have those people, and those people are already on Kubernetes and running their stuff there. And so now we're looking at the really long path of people who are not in one of those camps moving, right? And in a lot of cases, that's a matter of coinciding with other reasons why they have to look at an upgrade, whether it's the gradual replacement of old applications by new ones, where gradually all the legacy applications go offline and the new applications run in Kubernetes, or sometimes it's a, "Hey, we're waiting for a replacement cycle."
We're waiting for, we already had plans to move from on-prem to public cloud, and so we're going to move from on-prem to public cloud on Kubernetes, to make it part of the migration. And that'll be years. I still have fingers in other areas, like I still know a lot of people in the nonprofit space, and a lot of nonprofits just got around to adopting virtualization, right? Like, they're not even at public cloud yet. I don't even talk to them about Kubernetes. There's this huge long tail in terms of adoption. The nice thing is we don't show any signs of stopping. One of the things that we kind of learned from earlier stuff, and particularly learned from our friends at OpenStack, was to really, really focus on the APIs, to look at Kubernetes more as the hub of a system, of an infrastructure idea with potentially unbounded growth. If you have a new concept that comes in, like service mesh, service mesh is not a successor to Kubernetes. It's not an alternative to Kubernetes. It is a thing you layer on top of Kubernetes, because we didn't make it exclusive. >> Right. Great example going back to OpenStack, and thank you for bringing that in, because there are lessons learned. And so Josh, we've got to leave it there. Thanks so much for coming back on theCUBE. Great conversation, you're awesome. >> Okay, good to talk to you. >> All right, and thank you for watching, everybody. Keep it right there for more content from Postgres Vision 21. My name is Dave Vellante, you're watching theCUBE. (upbeat music)
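
Editor's sketch (not part of the interview): Josh's encapsulation point, that "is the container running?" is an easier question than "is the database down?", can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions. The `ProcessStatus` type and both functions are hypothetical names invented for this illustration, and the process names are used only as labels; nothing here is Patroni's or Kubernetes' actual health-check logic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProcessStatus:
    """Liveness of one database background process (hypothetical type)."""
    name: str
    running: bool


def process_level_verdict(processes: list[ProcessStatus]) -> str:
    """Judge database health by inspecting individual processes.

    Returns "up", "down", or "unclear". The "unclear" case is the
    ambiguity described in the interview: some processes are down
    while others are still running.
    """
    if not processes:
        return "down"
    alive = [p.running for p in processes]
    if all(alive):
        return "up"
    if not any(alive):
        return "down"
    return "unclear"


def container_level_verdict(container_running: bool) -> str:
    """Judge health from the container alone: always a binary answer."""
    return "up" if container_running else "down"


if __name__ == "__main__":
    # Illustrative labels only; one process is down, the others are not.
    partial_outage = [
        ProcessStatus("postmaster", True),
        ProcessStatus("checkpointer", True),
        ProcessStatus("walwriter", False),
    ]
    print(process_level_verdict(partial_outage))
    print(container_level_verdict(True))
```

The design point is only that the second function has two possible answers while the first has three; an automation layer that must decide whether to fail over has a much simpler job when the middle answer cannot occur.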

Published Date : Jun 25 2021
