Jasmine James and Ricardo Rocha | KubeCon + CloudNativeCon EU 2022

>>Welcome to theCUBE's coverage of KubeCon + CloudNativeCon EU in Valencia, Spain. I'm John Furrier. This is a preview interview with the co-chairs. We have Jasmine James, senior engineering manager of developer experience and KubeCon + CloudNativeCon EU co-chair, and Ricardo Rocha, computing engineer at CERN and KubeCon EU co-chair as well. Great to have you both on. Great to see you both.

>>Happy to be here.

>>Thanks for having us.

>>CUBE alumni. So, you know, KubeCon just continues to roll and get bigger and bigger. Watching all the end-user action, watching the corporations and enterprises come in, all the open source projects being green-lit, and all the developer onboarding has been amazing. So it should be a great EU in Valencia, Spain, a great venue. A lot of people I'm talking to are very excited, so let's get into it. As co-chairs, take us through the upcoming schedule at a very high level. Then I want to dig into some of the new insights into the selection and programming that you had to go through. I know every year it's hard. So let's start with the overall upcoming schedule for KubeCon.

>>Yeah, so I'll dive into that. The schedule represents quite a diverse set of topics, I would say. I personally am a fan of the more personal talks from an end-user perspective. There's also a lot of representation from a community perspective, and on how folks can get involved. As most of you know, the types of tracks have evolved over the years as well, so we now have a community track and a student track. It's going to be very exciting to hear the content within those tracks in Valencia. So, a very exciting schedule.

>>And just real quick for the folks watching: it's virtual and physical, a hybrid event, May 4th through seventh. Ricardo, what's your take on the schedule? How do you see it breaking down from a high-level standpoint?

>>Yeah, so I'm pretty excited.
I think the fact that it's hybrid will help build on the experiences we had during the pandemic, to give a better experience to people not making it to Valencia. I'm also pretty excited about the number of co-located events. The two days before the conference will include a large number of co-located events focusing on security, and some new stuff like batch and HPC workloads that I'm pretty close to as well. And then some really good consolidation in some tracks, like business value, which I think will be quite interesting as well.

>>So you mentioned this is going to be like watch parties, people creating kind of satellite events. Is that what you're referring to? In terms of the physical space, there's going to be an event, obviously, but what's going on around and outside the event, either online or as part of the program?

>>So, yeah, all the sessions from the co-located events will be available virtually as well. I don't know if people will actually be setting up parties everywhere. <laugh> I'm sure some people will.

>>There'll definitely be some.

>>And then for the conference itself, there will be dedicated rooms for the virtual talks, where people can just join in, sit for a while and watch the virtual talks, and then go back to the in-person ones.

>>Yeah, it's always a good event. Jasmine, we talked about this last time, and Ricardo, we always get under the hood as well. What's the vibe on the programming? Honestly, people want to give talks, and there's a virtual component, which opens up more aperture for more community and more action, as Ricardo pointed out. What was the process this year? Because we're seeing a lot of big trends emerge: obviously security is front and center, end-user projects are growing, and data engineering is a new persona.
That's really emerged out of the growth of data and the role that data plays with containers and with Kubernetes. Just a lot of action. What was it like this year in the selection process for the program?

>>Yeah, I mean, the selection process is always lots of fun for the co-chairs. Shout out to the program committee and the track chairs: you put in a lot of great work reviewing talks, and it's a very, very thorough process. So kudos to all of us for getting through it this year. I think lots of things emerged, but I still feel like security is top of mind for a lot of folks. Security provided one of the biggest sets of submissions from a quantity perspective; there were tons of talks submitted for the security track, and that kind of speaks for itself, right? This is something the cloud native community cares about, there's still a lot of innovation, and people want to voice what they're doing and share it.

>>Ricardo, what's your take? We've had a lot of chats around not only some of the hardcore tech, but some of the new waves emerging out of the growth and maturation of the segment. What are you seeing in terms of the key things that came out during the process?

>>Yeah, exactly. I would highlight something that Jasmine said, which is the emergence of some new tracks as well. She mentioned the student track, but we also added a research track, which is actually the first time we'll have one, so I'm pretty excited about that. Then for the trends, of course, security and observability are clearly massive tracks; app dev, operations, and extending Kubernetes also had a lot of submissions. I think the main thing I saw gain a bit more consistency is the business value part.
And the fact that people are now looking more at the second step, like managing cloud costs, how to optimize spot usage, and the usage of GPUs for machine learning, things like this. So I'm pretty excited. All these hybrid deployments are also something that keeps coming back. So those are the ones that I think came out of the submissions this time.

>>You know, it's interesting: as the growth comes in, you see these cool new things happen, but there are also signs of problems that need to be solved, which create opportunities. Jasmine, you mentioned security. There are a lot of big trends. Ricardo's kind of hinting at the scale piece of it. But there are all these new things now: the security posture changes as you shift left. It's not over when you shift security left into the pipeline; there are audits, there's the size of the security elements, there's the bill of materials. Now people have got supply chains, and these are huge conversations right now in the industry: supply chain security, scale, data optimization, management, notifications. All of this is being taken to a whole other level. What do you see as the key trends in the cloud native ecosystem?

>>I would say that a lot of the key trends, like you said, right, are not going anywhere. They're actually coming to a point of maturation. I see more of a focus on how companies go about consuming these different capabilities. What is that experience like? There's a talk being offered as a keynote just about that: security, and leveraging developers to scale security within your environment. It's not only a tool problem; it's a mindset you have to get over, partnering and bridging gaps between teams in order to make this a reality within certain organizations.
So I see the experience part of it becoming a big thing. There are multiple talks about that.

>>Ricardo, what's your take on these trends? Because I look at the paragraph of projects now, and it's this big; it used to be a couple of sentences. Now you've got more projects coming on, you've got the rookies in there and you've got the veteran projects in there. This speaks volumes: things like Notary are new, right? So this is cool. Wait, what does that mean? Okay, security auditing, all this is happening. What are the big trends you're excited about, that you see people digging into at the event?

>>Yeah, we talked about supply chain just before. I think that's a big one. We saw a keynote back in North America already introducing this, and we're seeing a lot of consolidation happening now in projects, but also companies supporting these projects. I'm also quite interested in the evolution of Kubernetes, in the sense that it's not just for what it was traditionally used for, like traditional IT services and scaling. There will be a very cool keynote about deploying Kubernetes at the edge, but really at the edge: low Earth orbit satellites running Kubernetes in, basically, space. So those things are very cool. We're seeing a lot of consolidation, but also people looking at Kubernetes for pretty crazy things, which is very exciting.

>>Yeah, you mentioned space. That really takes us to a whole other level of edge thinking. I've had many conversations about how you do break-fix in space with some folks in the space industry and in the public sector. Software is key in all this. And again, back to open source: open source has to be secured, and it has to be able to be managed effectively.
It needs to be optimized into the new workflows, and space is one of them. You see 5G edge is huge, with new kinds of apps being built there. So open source plays a big role in all this. The question I want to ask you is: as open source continues to grow, and it is growing, we're seeing startups emerge with a playbook where you play in open source, or you actually create a project, and then you get funding behind it. I know at least three or four VCs here in Silicon Valley who look at the projects; they're looking for deals, and they're saying, keep it open. It's a whole other level. Can you share your insights on how the ecosystem's evolving with entrepreneurship and startups?

>>Oh, I guess I'll start. I think it's such a healthy thing to have such innovation occurring. It's really a testament to how the cloud native community nurtures and cultivates these ideas and provides a great framework for them to develop over time, going from sandbox to incubating to graduated. Having the support of a solid framework is, I think, a lot of the reason why these projects grow so quickly and reach these high levels of adoption. So it's a really fantastic thing to see. I think VCs see an opportunity, and there's a lot of great innovation that can be operationalized and scaled, right, and applied to a lot of industries. So I feel like it's a very healthy thing. It also creates a lot of opportunities for something I'm passionate about, which is people getting involved in open source as a step into the world of tech. All of these projects coming about provide an opportunity for folks to get involved in a particular component they're interested in and then grow their career in open source. So a really great thing, in my opinion.
>>And you mentioned the student track, by the way; I have to point that out. That's huge. There are going to be a lot of people in computer science programs or self-learning. The ability to get up to speed from a development standpoint, as a coder: you can be in comp sci, or just a practitioner coding. Data's everywhere. So data engineering, coding... Ricardo, this is huge: students, and every sector opening up. The color codes on the calendar are more numerous than ever before.

>>Yeah, I think the diversity of the usage and of the communities is really important, and it's still growing. I don't think this one will stop. I'm pretty excited to see how we'll handle this growth, because, as you mentioned, everything is increasing in numbers: the number of projects, the number of startups around these projects. One thing I'm particularly interested in as an end user is understanding how to help other end users who are jumping in; not only the developers, or the people wanting to support these projects, but also the end users. How do they choose their stack? What should it look like for their use cases? It's much more than selecting individual projects; it's understanding how they work together. So I think this is a challenge for the next couple of years.

>>Yeah. Roll your own, building blocks, whatever you want to call it, you're starting to see people build their own stacks. And that's not a bad thing. It might be a feature, not a bug.

>>Yeah, I would agree. I think it's something we have to work on together, to help especially the people starting in the ecosystem, but also the experienced ones who start looking at other use cases as well.

>>Okay.
Jasmine, we talked about this last time: you've got to pick a favorite child in the agenda. What's your favorite session? You can pick one, or three, or maybe a handful. As you look through this year, what's the theme? You can kind of sense what's happening when you look at the agenda. Obviously observability is in there, all this great stuff is in there, but what's your favorite project or topic this year that you're jazzed about?

>>For me, I'd say there are such diverse topics being presented, both on the keynote stage and throughout the various tracks. I'll just reference the talk I alluded to earlier about leveraging developers to scale Kubernetes. It's a talk given by Red Hat on the keynote stage. I just think the abstract stood out to me, because it talks about bridging two different roles together and scaling what we all know to be so important within the cloud native space: security and Kubernetes. So it's something that's very real for me, in my current role and in previous roles. That's the one that spoke to me.

>>Awesome. Ricardo, what's your favorite this year? If you had to put a little gold star on something you're interested in, what would it be?

>>I think I hinted at it just before: I'm kind of a space enthusiast. So this whole idea of running Kubernetes in space makes me very excited, and I'm really looking forward to that one. But as an end user, I'm also very interested in talks like the one Mercedes will be doing, which is about the transition from a more traditional company to this more modern world of cloud native. I'm quite interested to hear what their experience has been like in the last few years.

>>Well, you guys do a great job.
I love chatting with you, and I love the CNCF. We've been following it from the beginning; we were there when it was created and watched it grow from an insider perspective: the hyperscalers, the people really kind of eating glass and building scale, you know, SREs. Now the SRE concept is going kind of global and mainstream, and we're seeing enterprises and end users contributing and participating, connecting those two worlds, Jasmine, as you said. As you look at that, you're starting to see the scale piece become huge. You mentioned it a little earlier, Jasmine. The SRE role was specific to servers and cloud; now you're seeing that kind of role needed for this cloud native layer. We're seeing it with data engineering. It's not for the faint of heart. It may not be a persona with zillions of people, but it scales. It's like an SRE role. You're seeing that with this kind of monitoring, and with containers and Kubernetes, where it's got to get easier and scale. How do you see that? Do you see this new scale role emerging in the community? What is this trend? Maybe I'm misrepresenting it, or sensing it wrong, but what do you think about the scale piece? How is that falling into place?

>>Yeah, I think that as adoption, or saturation, of cloud native technologies grows within any environment, most companies realize that you have to have that represented within the role that is managing it, if you want it to be reliable. So I think a lot of roles are adopting those behaviors, right, in order to sustain this within their environment, and learning as they start to implement these things. I see that as something that just happens. We saw it with DevOps, right?
Engineers started to adopt working on the systems versus just working on software. So it's sort of encompassing all the things, right? We're seeing a shift in the role, and in the behaviors within it, in order to maintain these cloud native services.

>>Ricardo, what's your take? We've been seeing engineers get to the front lines more and more. You mentioned business value as one of the tracks and focus topics this year, and it's happening: engineers and developers are getting on the front lines, because as you move up the stack, whether it's a headless system for retail or deploying something in another sector, they've got to be on the front lines. If you're going to be doing machine learning and have data, you've got to have domain skills about what the business is, right?

>>Yeah, I agree very much with what Jasmine said. And if we add to this the business value, and this opportunistic usage of all types of resources that can come from basically anywhere these days, I think this is really becoming a real role: understanding how best to use all of this and making the best of all these available resources. When we talk about CPUs, it's already important. When we start talking about GPUs, which are scarcer, or some sort of specialized accelerators, then you really need people who know where to go and fish for those, because you can't just build your own data center and scale that anymore. So you really need to understand what's out there.

>>Applications have got to have the security posture nailed down. They've got to have automation built in. You've got to have the observability, you've got to have the business value. It sounds like a mature industry developing here, finally. It's happening. Good job, guys. Thanks for coming on theCUBE.
I really appreciate it.

>>Thank you.

>>Thank you for having us.

>>We'll see you at KubeCon + CloudNativeCon, May 16th through the 20th in Valencia, Spain. theCUBE will be there, and we'll have some online coverage as well. Look for the virtual event from CNCF. theCUBE will bring all the action. I'm John Furrier, your host. See you in Spain, and see you on the 16th.

Published Date : May 10 2022

SUMMARY :

Ahead of KubeCon + CloudNativeCon EU 2022 in Valencia, Spain, John Furrier talks with co-chairs Jasmine James and Ricardo Rocha (CERN) about the hybrid schedule, the co-located events, the new student and research tracks, and the trends that stood out during talk selection: security and supply chain security, observability, business value and cloud cost optimization, Kubernetes at the edge (including in space), and the growth of the startup and end-user ecosystem.

ENTITIES

Entity	Category	Confidence
Ricardo	PERSON	0.99+
Jasmine	PERSON	0.99+
CERN	ORGANIZATION	0.99+
Spain	LOCATION	0.99+
May 4th	DATE	0.99+
Jasmine James	PERSON	0.99+
Valencia	LOCATION	0.99+
Ricardo Rocha	PERSON	0.99+
Mercedes	ORGANIZATION	0.99+
North America	LOCATION	0.99+
both	QUANTITY	0.99+
May 16th	DATE	0.99+
second step	QUANTITY	0.99+
Valencia, Spain	LOCATION	0.99+
Silicon Valley	LOCATION	0.99+
John Furrier	PERSON	0.99+
two	QUANTITY	0.98+
this year	DATE	0.98+
three	QUANTITY	0.98+
20th	DATE	0.98+
CloudNativeCon	EVENT	0.97+
one	QUANTITY	0.96+
CNCF	ORGANIZATION	0.95+
zillions of people	QUANTITY	0.94+
KubeCon	EVENT	0.94+
four VCs	QUANTITY	0.93+
one thing	QUANTITY	0.93+
first time	QUANTITY	0.92+
Kubernetes	TITLE	0.92+
two worlds	QUANTITY	0.91+
16th	DATE	0.9+
EU	LOCATION	0.89+
One	QUANTITY	0.85+
Kubernetes	ORGANIZATION	0.82+
two days	DATE	0.81+
seventh	QUANTITY	0.79+
roles	QUANTITY	0.77+
EU	ORGANIZATION	0.76+
pandemic	EVENT	0.76+
next couple of years	DATE	0.71+
tons of talks	QUANTITY	0.7+
SRE	TITLE	0.61+
at least three	QUANTITY	0.58+
last	DATE	0.57+
cloud native	ORGANIZATION	0.46+
5G	TITLE	0.42+
EU	EVENT	0.42+
2022	DATE	0.31+

Ricardo Rocha, CERN | KubeCon + CloudNativeCon Europe 2021 - Virtual

>>From around the globe, it's theCUBE, with coverage of KubeCon + CloudNativeCon Europe 2021 Virtual, brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners.

>>Hello, welcome back to theCUBE's coverage of KubeCon + CloudNativeCon 2021, part of the CNCF's continuing CUBE partnership. We're virtual here because we're not in person; soon we'll be out of the pandemic and, hopefully, in person for the next event. I'm John Furrier, your host of theCUBE. We're here with Ricardo Rocha, computing engineer at CERN and CUBE alumni. Great to see you, Ricardo. Thanks for remoting in all the way across the world.

>>Hello, pleasure. Happy to be here.

>>I saw your talk with Priyanka on LinkedIn and all around the web. Great stuff as always; you do great work over there at CERN. Talk about what's going on with you and the two speaking sessions you have at KubeCon. Pretty exciting news and exciting sessions happening here, so take us through the sessions.

>>Yeah. So actually the two sessions kind of show the two types of things we do with Kubernetes. We have a lot of services moving to Kubernetes, but the first one is more about the services we have in-house. CERN is known for having a lot of data and for requiring a lot of computing capacity to analyze all this data. But we also have a very large community, and a lot of users and people interested in the stuff we do. So the first session will show how we've been migrating our web infrastructure into Kubernetes, and in this case actually OpenShift. The challenge there is to run a very large number of websites on Kubernetes. We run more than 1,000 websites, and there will be a demonstration of how we do all the management of the website lifecycle, including upgrading and deploying new websites, and of an operator that was developed for this purpose.
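The website operator mentioned here follows the usual Kubernetes controller pattern: compare the state each custom resource declares with the state actually running, then act to close the gap. The minimal, library-free sketch below illustrates only that comparison step; the `WebsiteSpec`/`WebsiteStatus` fields, the version strings and the action names are hypothetical and are not taken from CERN's operator or from any Kubernetes client library.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class WebsiteSpec:
    """Desired state of one site, as a hypothetical Website custom resource would declare it."""
    version: str        # hypothetical field: software release the site should run
    replicas: int = 1   # hypothetical field: number of serving pods


@dataclass(frozen=True)
class WebsiteStatus:
    """State actually observed for one deployed site."""
    version: str
    replicas: int


def reconcile(desired: Dict[str, WebsiteSpec],
              observed: Dict[str, WebsiteStatus]) -> List[str]:
    """Return the actions needed to move `observed` towards `desired`.

    Pure function: it only compares the two states, so calling it again
    after the actions have been applied yields an empty list.
    """
    actions: List[str] = []
    for name in sorted(desired):
        spec = desired[name]
        status: Optional[WebsiteStatus] = observed.get(name)
        if status is None:
            actions.append(f"deploy {name} version={spec.version} replicas={spec.replicas}")
            continue
        if status.version != spec.version:
            actions.append(f"upgrade {name} {status.version}->{spec.version}")
        if status.replicas != spec.replicas:
            actions.append(f"scale {name} {status.replicas}->{spec.replicas}")
    # Sites that are still running but are no longer declared get removed.
    for name in sorted(set(observed) - set(desired)):
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    desired = {"home": WebsiteSpec("2.0", 2), "news": WebsiteSpec("2.0")}
    observed = {"home": WebsiteStatus("1.9", 2), "legacy": WebsiteStatus("1.5", 1)}
    for action in reconcile(desired, observed):
        print(action)
```

A real operator would run this comparison whenever the watched resources change and apply each action through the Kubernetes API; because `reconcile` only compares states, running it again once everything has converged returns an empty list.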
And then, on the other side, a colleague and I will also give a talk about machine learning. Machine learning has been a big topic for us; a lot of our workloads are migrating to accelerators and can benefit a lot from machine learning. So we're giving a talk about a new service we've deployed on top of Kubernetes, where we try to manage the lifecycle of machine learning workloads, from data preparation all the way to serving the models, also exploring the Kubernetes features and integrating a lot of accelerators.

>>So one session is a large-scale deployment, with Kubernetes key there, and the other is machine learning, essentially as a service for other people to use. Right? Take me through the first, the large-scale deployment. What's the key innovation there, in your opinion?

>>Yeah, I think, compared to the infrastructure we had before, it's this notion that we can develop an operator that will manage a resource, in this case a website. This is something that is not always obvious when people start with Kubernetes: it's not just an orchestrator, it's really the API and the capability of managing a huge number of resources, including custom resources. So there's the possibility to develop this operator and then manage the lifecycle of something that was defined in-house and fits our needs. There are challenges there, because we have a large number of websites and they can be pretty active. We also had some scaling issues on the storage that serves these websites, and we'll give some details during the talk as well.

>>So Kubernetes storage: this is all kind of under the covers, making this easier. And machine learning plays nicely in that. Take us through the machine learning use case. What's going on there? What was the discovery? How did you put that together? What are the key elements there?
>>Right, so the main challenge there has been that machine learning is quite popular, but it's also quite spread out. We have multiple groups focusing on this, but there's no obvious way to centralize not only the resource usage, to make it more efficient, but also the knowledge of how these procedures can be done. So what we are trying to do is offer a service to all our users where we help them with the infrastructure, so that they don't have to focus on that and can focus just on their workloads. We do everything from exposing the data systems we have in-house, so that users can access the data and do data preparation, to iterating using notebooks, to distributed training with potentially a large number of GPUs, and then storing and serving the models. All of this is managed with the Kubernetes cluster underneath. We had a lot of knowledge of how to handle Kubernetes, and all the features everyone likes, scalability, reliability, autoscaling, are very important for this type of workload. This is key.

>>Yeah, it's interesting to see how Kubernetes is maturing. Congratulations on the projects; they're probably going to continue to scale. This reminds me of when I was coming into the business in the late eighties and early nineties with TCP/IP and the OSI model: you saw the standards evolve and get settled in, and then boom, innovation everywhere. That took about a year to digest and scale up. It's happening much faster now with Kubernetes. I have to ask you the question people are looking to get answered: as Kubernetes goes to the next generation, the next step, people want to integrate. So how is Kubernetes exposing APIs, say integration points, for tools and other things? Can you share your experience, what's happening now, and where it's going?
Because we know there's no debate: people like the Kubernetes aspect of it, but now integration is the conversation. Can you share your thoughts on that?

>>I can try. I would say it's a moving target, but the fact that there's such a rich ecosystem around Kubernetes, with all the cloud native projects, is real proof of the popularity of the API. And after we had the first step of deploying and understanding Kubernetes, we started seeing its potential: it's not reaching only the infrastructure itself, it's reaching all the layers, the whole stack that we support in-house and on premises. It's also opening doors to easily scale into external resources. So what we've been telling our users is to rely on these integrations as much as possible. This means the application lifecycle being managed with things like Helm and GitOps, but also monitoring being managed with Prometheus; and once you're happy with your deployment in-house, we have ways to scale out to external resources, including public clouds. This is really proof that all these APIs are not only popular but incredibly useful, because there's such a rich ecosystem around them.

>>So talk about the role of data in this. Obviously the machine learning piece is something everyone is interested in, as you get infrastructure as code, DevOps and DevSecOps, with everything shifting left. I love that narrative. All of this is proving maturation. Data is critical, right? So now you get real-time information, real-time data, and the expectation for the apps is to integrate the data. What's your view on how this is progressing from your standpoint? Because with machine learning, you mentioned acceleration being part of another system. Caching has always done that for databases, I would say, right?
So now databases are getting slower, caches are getting faster, and now they're all one, so it's all changing. What are your thoughts on this next-level data equation in Kubernetes? Because stateless is cool, but now you've got state issues.

>>Yeah, so we've always had huge needs for data storage. I think we have over half an exabyte of data available on premises, but we have our own storage systems, which are external, for the physics data, the raw data. And one particularity of our workloads until recently is that we call them embarrassingly parallel, in the sense that they don't really need very tight connectivity between the different workloads. So if we have, say, tens of thousands of jobs doing some analysis, they're actually quite independent; they will produce a lot more data, but we can store it independently. Machine learning is posing a challenge in the sense that training tends to be a lot more interconnected, so it can benefit from systems we are not so familiar with. So for us it's maybe not so much the caching layers themselves; it's really understanding how our infrastructure needs to evolve on premises to support this kind of workload. We had some smallish, more high-performance computing clusters with things like InfiniBand for low latency, but this is not the bulk of our workloads, and not what we are experts on these days. This is the transition we are making towards supporting these machine learning workloads.

>>Just as a reference for the folks watching: you mentioned embarrassingly parallel, and that's a phrase I read on your CERN tech blog.
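"Embarrassingly parallel" here means that each analysis job needs nothing from any other job while it runs. A minimal sketch of that shape, assuming a hypothetical histogram analysis over made-up event energies: split the input, process each chunk independently, and merge the partial results only at the end. The function names and the binning are illustrative, not CERN code.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def analyze_chunk(energies: Sequence[float], bin_width: float = 10.0) -> Counter:
    """Histogram one chunk of (hypothetical) event energies into fixed-width bins.

    The function reads only its own chunk and shares no state with any
    other call, which is what makes the overall job embarrassingly parallel.
    """
    return Counter(int(e // bin_width) for e in energies)


def split(items: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Split items into at most n_chunks contiguous, non-overlapping slices."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def run_analysis(energies: Sequence[float], n_jobs: int = 4) -> Counter:
    """Run every chunk independently, then merge the partial histograms."""
    chunks = split(energies, n_jobs)
    # Threads keep this sketch self-contained; on a real cluster each chunk
    # would be a separate batch job, since no chunk needs data from another.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        partials = list(pool.map(analyze_chunk, chunks))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # the only coordination step is this final merge
    return total


if __name__ == "__main__":
    energies = [1.0, 12.0, 15.0, 27.0, 3.0, 45.0]
    # The result does not depend on how the work is partitioned.
    print(run_analysis(energies, n_jobs=1) == run_analysis(energies, n_jobs=3))
```

Distributed machine learning training does not fit this shape, since workers typically exchange updates at every step; that tighter coupling is why Ricardo points to low-latency interconnects such as InfiniBand.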
So if you go to techblog.web.cern.ch, or just search for "CERN tech blog," you'll see the post there, and there's good stuff in it. In there you lay out a bunch of other things too, where you start to see deployment services and custom resource definitions being part of this. Is it going to get to the point where automation is a bigger part of cluster management, setting stuff up quicker? As you look at some of the innovations you're doing with machines, Kubernetes, databases and thousands of other point things you're working on, and I know you've got a lot going on there, it's in the post, we don't want the problem of it being so hard to stand up and manage. This is what people want to make simpler. How do you answer that when people say they want to make it easier? >>Yeah. So for us it's really "automate everything." Up to now it has been automating the deployment in the Kubernetes clusters; right now we are looking at automating the Kubernetes clusters themselves. There are some really interesting projects. People are used to using things like Terraform to manage the deployment of clusters, but there are projects like Crossplane, for example, that allow us to have the clusters themselves be resources within Kubernetes, and this is something we are exploring quite a bit. It also allows us to abstract the Kubernetes clusters themselves as Kubernetes resources, so there's this idea of having a central cluster that manages a much larger infrastructure. That's something we're exploring. The GitOps part is really key for us too: it eases the transition for people who are already used to managing large-scale systems but are not necessarily experts on Kubernetes. They see that there's an easier path if they can be introduced slowly through the centralized configuration.
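The "central cluster that manages a much larger infrastructure" idea rests on declarative reconciliation: the desired state is declared (in a GitOps setup, in a Git repository) and a controller repeatedly converges the actual state towards it. The toy Python sketch below illustrates only that loop, under the assumption that a cluster can be summarized by a name and a node count; it is not Crossplane's API, and the functions `plan` and `reconcile` and the cluster names are hypothetical.

```python
def plan(desired: dict[str, int], actual: dict[str, int]) -> list[tuple[str, str, int]]:
    """Diff desired vs. actual node counts per cluster into a list of actions.

    Each action is (verb, cluster_name, node_count). Toy model only: a real
    controller would also handle errors, retries and partial failures.
    """
    actions: list[tuple[str, str, int]] = []
    for name, nodes in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, nodes))
        elif actual[name] != nodes:
            actions.append(("resize", name, nodes))
    # Anything running that is no longer declared gets removed.
    for name in sorted(set(actual) - set(desired)):
        actions.append(("delete", name, actual[name]))
    return actions


def reconcile(desired: dict[str, int], actual: dict[str, int]) -> dict[str, int]:
    """Apply one reconciliation pass and return the new (simulated) state."""
    state = dict(actual)  # do not mutate the caller's view of the world
    for verb, name, nodes in plan(desired, state):
        if verb == "delete":
            del state[name]
        else:  # "create" and "resize" both set the declared size
            state[name] = nodes
    return state


if __name__ == "__main__":
    # Hypothetical desired state, as it might be declared in a Git repository.
    desired = {"onprem-batch": 100, "cloud-burst": 20}
    actual = {"onprem-batch": 80, "old-test": 5}
    print(plan(desired, actual))
    new_state = reconcile(desired, actual)
    # A second pass finds nothing to do: the loop is idempotent.
    print(plan(desired, new_state))  # []
```

Because every pass is computed from the declared state, rerunning it is harmless, which is what lets operators who are new to Kubernetes work mainly by editing the central configuration.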
>>You know, you mentioned Crossplane. I had someone on earlier, he's an awesome dude, great guy, and I was smiling because I still have flashbacks and trigger episodes from the Hadoop world, when that technology was so promising but it was just so hard to stand up and manage; you had to be a real expert to do that. And Crossplane brings up the whole operator notion of operating the clusters, right? This comes back down to provisioning and managing the infrastructure, which we all know is key. But when you start getting into multi-cloud and multiple environments, that's where it becomes challenging. I like what they're doing. Is that something that's on your mind too, around hybrid and multi-cloud? Can you share your thoughts on that whole trajectory? >>Absolutely. I actually gave an internal seminar just last week describing what we've been playing with in this area, and I showed a demo of using Crossplane to manage clusters on premises, but also clusters running on public clouds: AWS, Google Cloud and Azure. That's really the goal there. There are many reasons we want to explore external resources. We are kind of used to this, because we have a lot of sites around the world that collaborate with us, but for public clouds specifically there are some motivations. The first one is that we have periodic load spikes: when we have international conferences, the number of analyses and job requests goes up quite a bit, so we need to be able to scale on demand for short periods instead of over-provisioning in house. The second one, coming back to machine learning, is accelerators. We have a lot of CPUs and a lot fewer GPUs, so it would be nice to go and fish for those in the public clouds.
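The first motivation, bursting during periodic load spikes instead of over-provisioning in house, reduces to a simple split: use on-premises capacity first and send only the overflow to a public cloud. A minimal Python sketch, assuming demand and capacity are both counted in cores and that on-premises resources are always preferred; the function `split_demand` and the numbers are illustrative, not CERN's actual policy.

```python
def split_demand(demand_cores: int, onprem_cores: int) -> tuple[int, int]:
    """Return (cores served on premises, cores burst to a public cloud).

    Policy assumed here: fill on-premises capacity first and burst only the
    overflow, so cloud resources are rented just for the spike.
    """
    if demand_cores < 0 or onprem_cores < 0:
        raise ValueError("demand and capacity must be non-negative")
    onprem = min(demand_cores, onprem_cores)
    return onprem, demand_cores - onprem


if __name__ == "__main__":
    # Illustrative numbers only: a quiet period, then a pre-conference spike.
    for demand in (250_000, 420_000):
        print(demand, split_demand(demand, onprem_cores=300_000))
```

Sizing the on-premises farm for the peak would leave much of it idle between spikes; with this split, only the excess during the spike is rented.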
And then there are other accelerators that are quite interesting, like TPUs and IPUs, that will definitely play a role and that we will probably, or maybe, never have on premises; we will only be able to use them externally. In that respect, coming back to your previous question, storage becomes quite important. What we've been playing with is not only managing these external clusters centrally, but managing the whole infrastructure from a central place. That means making all the clusters, whatever they are, look very much the same, including the monitoring and the central aggregation of the monitoring. And then, as we talked about storage, there's the idea of having local storage that will allow us to do really quick software distribution, but also access to the data. >>What you guys are doing is, as we say, cool and relevant. You've got the large-scale deployments and the machine learning that will really accelerate things, which will drive a lot of adoption in terms of automation. And as that kicks in, you've got to get the foundational work done; I see that as clearly the right trajectory. It reminds me, Ricardo, and not to do a history lesson here, but back when network protocols were moving from proprietary SNA for IBM and DECnet for Digital, back in the old days, the OSI (Open Systems Interconnection) standard stack was evolving, and when TCP/IP came around, that really opened up interoperability, right? Sam and I were talking about this kind of cross-cloud connection, or intercloud, which Lew Tucker and I talked about at OpenStack in 2013: internetworking, or interconnection, and it's about integration and interoperability. This is like the next-gen conversation that Kubernetes is having.
So as you scale up, which is happening very fast, and as you get machine learning that can handle data and enable modern applications, it's really about connecting networks and connecting systems together. This is a huge architectural innovation direction. Could you share your reaction to that? >>Yeah. Actually, we are starting the easy way, I would say: with the workloads that are loosely coupled, where we don't necessarily need tight interconnectivity between the different deployments. This is already giving us a lot, because the bulk of our workloads are this kind of batch, embarrassingly parallel work. We are also doing co-location: when we have large workloads that need this kind of close interconnectivity, we co-locate them in the same deployment, same cloud and region. I think what you describe, cross-cloud interconnectivity, will be a huge topic. It already is, I would say, so we started investigating a lot of service mesh options to learn what we can gain from them. There is clearly a benefit for managing services, but there is definitely also potential to let us scale out more easily across regions. We've seen this using the public cloud. One thing we found, for example, is this idea of infinite capacity: for CPUs, even at the scale we have, it sometimes does feel like that. But when you start using accelerators, you start negotiating, maybe using multiple regions because there's not enough capacity in a single region, and you have to talk to the cloud providers to negotiate this. That makes the deployments more complicated, of course. So this interconnectivity between regions and clouds will be a big thing.
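When a single region cannot supply enough accelerators, a request has to be spread over several regions. A minimal greedy sketch in Python, assuming the free GPU count per region is known up front; in practice, as described above, capacity is negotiated with the provider. The function `allocate_gpus` and the region names are hypothetical.

```python
def allocate_gpus(requested: int, free_by_region: dict[str, int]) -> dict[str, int]:
    """Greedily spread a GPU request over regions, largest free pool first.

    Returns {region: gpus_taken}. Draining the biggest pools first tends to
    keep the number of regions (and cross-region traffic) small. Raises
    RuntimeError if the regions together cannot satisfy the request.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    if requested > sum(free_by_region.values()):
        raise RuntimeError("not enough free GPUs across all regions")
    allocation: dict[str, int] = {}
    remaining = requested
    # Sort by free capacity (descending), then by name so results are deterministic.
    for region, free in sorted(free_by_region.items(), key=lambda kv: (-kv[1], kv[0])):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            allocation[region] = take
            remaining -= take
    return allocation


if __name__ == "__main__":
    # Hypothetical free-GPU counts per region.
    free = {"region-a": 64, "region-b": 48, "region-c": 16}
    print(allocate_gpus(100, free))  # {'region-a': 64, 'region-b': 36}
```

Greedy largest-first is just one reasonable heuristic; a real scheduler would also weigh price, data locality and the interconnect between the chosen regions.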
>>And again, the low-hanging fruit is just the existing market, but I threw the vision out there mainly to talk about what we're seeing, which is that the world's a distributed computer. If you have the standards, good things happen. Open systems, innovating in the open, could really make a big difference; it's going to be the difference between real value for global society and getting into a siloed world. So I think the choice is the industry's, and with CERN, the CNCF, the Linux Foundation and all the companies that are investing in open, this really is a key inflection point for us right now. So congratulations. Thanks for coming on theCUBE. >>Yeah, appreciate it. Thank you. >>Okay, Ricardo Rocha, computing engineer at CERN, here in theCUBE coverage of CNCF KubeCon + CloudNativeCon Europe. I'm John Furrier, host of theCUBE. Thanks for watching.

Published Date : May 5 2021


ENTITIES

Entity	Category	Confidence
Priyanka	PERSON	0.99+
Ricardo Rocha	PERSON	0.99+
2013	DATE	0.99+
David	PERSON	0.99+
IBM	ORGANIZATION	0.99+
two sessions	QUANTITY	0.99+
first question	QUANTITY	0.99+
CERN	ORGANIZATION	0.99+
two types	QUANTITY	0.99+
Ricardo	PERSON	0.99+
more than 1000 websites	QUANTITY	0.99+
last week	DATE	0.99+
CUBA	LOCATION	0.99+
98 late eighties	DATE	0.99+
NATO	ORGANIZATION	0.99+
Lennox Foundation	ORGANIZATION	0.98+
two speaking sessions	QUANTITY	0.98+
first one	QUANTITY	0.98+
thousands	QUANTITY	0.98+
Cloud Native Con	EVENT	0.98+
second one	QUANTITY	0.97+
Cloud Native Con 2021	EVENT	0.97+
first step	QUANTITY	0.97+
one session	QUANTITY	0.96+
C. F	ORGANIZATION	0.96+
KubeCon	EVENT	0.95+
C	ORGANIZATION	0.95+
ricardo	PERSON	0.95+
linkedin	ORGANIZATION	0.95+
tens of thousands of jobs	QUANTITY	0.95+
john	PERSON	0.95+
Prometheus	TITLE	0.95+
one part	QUANTITY	0.94+
europe	LOCATION	0.94+
about a year	QUANTITY	0.93+
cloud Native	ORGANIZATION	0.9+
2021	EVENT	0.89+
one particular charity	QUANTITY	0.88+
pandemic	EVENT	0.81+
red hat	ORGANIZATION	0.81+
single region	QUANTITY	0.81+
Helmand	TITLE	0.81+
Kublai khan	PERSON	0.8+
first large	QUANTITY	0.8+
Cuban	LOCATION	0.8+
Cern and	ORGANIZATION	0.79+
Europe	LOCATION	0.78+
P.	OTHER	0.77+
Coubertin	ORGANIZATION	0.75+
early nineties	DATE	0.7+
CloudNativeCon Europe 2021	EVENT	0.7+
over half	QUANTITY	0.68+
form	TITLE	0.68+
con	COMMERCIAL_ITEM	0.67+
S. I. Model	OTHER	0.67+
Kublai khan	PERSON	0.65+
TCP I.	OTHER	0.65+
Cf	COMMERCIAL_ITEM	0.64+
deployment	QUANTITY	0.56+
services	QUANTITY	0.53+
google	ORGANIZATION	0.48+
SAM	ORGANIZATION	0.46+
P. I.	OTHER	0.4+
native con	COMMERCIAL_ITEM	0.37+

Ricardo Rocha, CERN | KubeCon + CloudNativeCon NA 2020

From around the globe, it's theCUBE, with coverage of KubeCon and CloudNativeCon North America 2020 Virtual, brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners. >>Hey, welcome back, everybody. Jeff Frick here with theCUBE, coming to you from our Palo Alto studios for the continuing coverage of KubeCon + CloudNativeCon 2020 North America. There was the European version earlier in the summer. It's all virtual, so the good news is we don't have to get on planes, and we can get guests from all over the world. We're excited to welcome back, for his return to theCUBE, Ricardo Rocha. He is a staff member and computing engineer at CERN. Ricardo, great to see you. >>Hello, thanks for having me. >>Absolutely. And you're coming in from Geneva, so you already had a good Thursday, I bet. >>Yeah, we're just finishing right now. >>Right. So in getting ready for this interview, I was looking at the interview that you did, I think it was two KubeCons ago, in May of 2019, and it just strikes me that a lot of people know what CERN is, but a lot of people don't know what CERN is. So I wonder if you can give the 101 of what CERN's mission is and some of the work that you guys do there. >>Yeah, sure. CERN is the European Organization for Nuclear Research. We are the largest particle physics laboratory in the world, and our main mission is fundamental research. So we try to answer big questions: why don't we see antimatter, what is dark matter or dark energy, and other questions about the origin of the universe. To answer these questions, we build very large machines, particle accelerators, where we try to recreate some of the moments just after the universe was created, the Big Bang, to understand better what the state of matter was at that time. The result of all of this is very often a lot of data that has to be analyzed, and that's why we have traditionally had huge requirements for computing resources
since the start of CERN. We have always had these large requirements. >>Right. And so you have these large particle accelerators, as you said, large machines. The one that you've got now, the latest one, how long has that one been operational? >>Yeah, so it started maybe around 10 years ago; the first launch was a bit before that. It's the largest one ever built: it's 27 kilometers in perimeter. We inject protons in different directions and then we make them collide where we build these huge detectors that can see what's happening in these collisions. The main particle accelerator is this one, but we do have other experiments: we have an antimatter factory that is just down from my office, and we have other types of experiments going as well. >>Right, 27 kilometers, that's a big number. And then again, just so people get some sense of scale: you speed up the particles, you smash them together, you see what happens, you collect all the data. What types of data sets are generated off just one kind of event? I don't even know if that's a valid measure. How do you measure quantities of data around an event, just for orders of magnitude? >>Right. So the way it works is, as you said, we accelerate the particles to very close to the speed of light, and we increase the energy by having the beams well controlled, and then at specific points we make them collide. We have these gigantic detectors underground; all of this is 100 meters in the ground. These detectors are pretty much very large cameras that take something like 40 million pictures a second, and the result is a huge amount of data: each of these detectors can generate up to one petabyte a second. This is not something we can record, so what we do is use hardware filters that bring this down to something we can manage, which
is on the order of a few tens of gigabytes per second. >>Wow. So you've got a very serious computing challenge ahead of you, because you're the one that's on the hook for grabbing the data, recording the data, and making the data available for people to use in their experiments. So we're here at KubeCon + CloudNativeCon: where did containers come into the story, and Kubernetes specifically? What was the real challenge that you were trying to overcome? >>Yeah, so this is a long story of using distributed computing at CERN, and other types of computing. As I mentioned, we generate a lot of data: something like 70 petabytes of data every year, and we have accumulated over half an exabyte of data by now. Traditionally we've had to build this software ourselves, because there were not many people around with these kinds of needs, but this revolution, with containers and the clouds appearing, allowed us to join other communities, benefit from their work, and not have to do everything ourselves. So this is the main driver for us to start doing this. The other point is containerization itself. We have a lot of needs to share information, but also to share resources, between physicists and engineers, so this idea of containerizing the work, including all the code and all the data, and then sharing it with our colleagues is very appealing. The fact that we can also take this unit of work and just deploy it in any infrastructure that has a standardized API, like Kubernetes, and scale and monitor it the same way is also very appealing. So all of these things connect with our natural way of working, I would say. >>Right. So you've talked about this upgrade that's coming to the particle accelerator in four or five years, whatever that timeline is, relatively soon. This, as you've said before, is a huge step function in the data that's
going to come off these experiments. How are you keeping up on the compute side with the fundamental shift on the physics side, and the data that's going to be generated? I think you said in a prior interview that you don't want to be the bottleneck: there's all this great work being done, but if it's not captured and made available for people to do stuff with the data, then it's not the greatest experiment. So how are you keeping up, and what's the relative scale of what you've got to do on the compute side to keep up with the guys on the physics side? >>Yeah, so what we will have to deal with is an increase of 10 times more data than we have today. We already have a lot, and very soon we'll have a lot more. But this is not, I would say, the first time this kind of step happens in our computing; we have always found a new technology or a new way to do things. In this case we do what we always do, which is look for all sorts of new technologies and new resources that we could make use of. A lot of this involves improving our own software: replacing what we currently do with hardware triggers with software-based triggers using accelerators, GPUs and other types of accelerators, which will play a big role, and also making our software more efficient. The second thing we are doing is trying to make our infrastructure more agile, and this is where cloud native and Kubernetes play a huge role, so that we can benefit from external resources. We can always think of expanding our on-premises resources, but it's also very good to be able to just go and fish around for something available externally, and Kubernetes plays a very big role in that respect as well. >>Yeah, I'd love to dig into that a little deeper
because the Cloud Native Computing Foundation is a super active foundation, with obviously a ton of activity around Kubernetes. So what does it mean to you, as an infrastructure provider to your own company, to now have an open source community that's supporting you indirectly via ongoing developments and ongoing projects, and, as you said, this broader group of brain power to pull from to help you move your own infrastructure along? >>Yeah, I think this is great. We've had really good experiences in the past. We've been heavy users of Linux for a very long time, we've used OpenStack for our private cloud, and we've been heavily involved in that community as well. We not only contribute as end users, but we also offer some manpower for development and help with the community, and we are doing the same with Kubernetes. We really end up getting a lot more than we put into the community. We are quite involved, but it's so large, and with such big players that have very similar needs to ours, that we end up getting a lot more back than we put in. We try to help as much as possible, but we have limited resources as well. >>Yeah, open source is just an amazing innovation machine, and obviously it's proven its value over a lot of things, from Linux to Kubernetes, one of the most recent. I want to shift gears a little bit and ask you your take on public cloud. One of the huge benefits of public cloud is the flexibility to add capacity and shrink capacity as you need it, and you talked, again in a prior interview I was looking at, about how you definitely have spikes in demand, whether from a high frequency of experiments, and I don't know how frequently you run those things, or maybe a conference or something, where you said people want to get access to the data and run experiments prior
to your conference. Where does public cloud play in your thoughts? Maybe you're there today, maybe you're not. How do you think about public cloud generically, but more specifically that ability to add a little more flex in your compute horsepower? Or are you just going up and to the right and not really flexing down very much? >>Yeah, so this is something we've been working on for a few years now. I would say it's ongoing work; it's a situation that will not be very clear for the next few years. But again, what we try to do is explore as much as possible all kinds of resources that can help us. What we did at KubeCon last year was a demonstration that we can actually scale out and burst: for these spiky workloads we have, we can burst to the public cloud quite easily using the kind of cloud native technologies we have today. This is extremely important, because it changes our mindset: instead of thinking only about investing on premises, we can think that maybe we cover the majority of use cases there, and then explore and burst to the public cloud. This has to be easy in terms of infrastructure, and we are at that point right now with Kubernetes. We also have a kind of workload for which these things are maybe easier than in traditional IT, where services are very interconnected. In our case we are thinking more of batch workloads, where we can just submit jobs and then fetch the data back. This also has a few challenges, but I would say it's easier than traditional IT service deployments. The other aspect where the public cloud is also very interesting is resources that we don't have in large quantities. We have a very large farm with CPUs, we have some GPUs, and it's very good to be able to explore these new accelerator technologies and maybe expand our available
pool of accelerators by going to the public cloud, maybe to use them, but also to validate which ones are best for our use cases and explore that option as well. It's not only general capacity; it's really dedicated hardware that we might never even have, like TPUs or IPUs. It's very interesting that we can scale and just go use them in the public cloud. >>Yeah, that's a really interesting point, because the cloud providers are big enough now that they're building all kinds of specialized servers, specialized CPUs, specialized GPUs. DPUs is a new one I've heard, the data processing unit, and as you said there are FPGAs and all kinds of accelerators. So it is a really rich environment, as you said, to do your experiments and find the optimal solution for whatever that particular workload is. But Ricardo, I want to shift gears a little bit as we come to the end of 2020, thankfully, for a whole bunch of reasons. As you look forward to 2021, clearly anticipating and starting to plan for your upgrade is a priority. I'm just curious what your other priorities are, and how the compute infrastructure, in terms of investment within CERN, ranks with the investment around the physical things that you're building, the big machines, because without the compute those other things really don't provide much data. We always talk about how expensive the particle accelerator is; it's an interesting number and it's big, but you guys are a big piece of that as well. So what are your priorities looking forward to 2021? >>Yeah, from the compute side, I think we are keeping priorities similar to what we've been doing the last few years, which is to make sure that we improve all our automation and improve efficiency, to prepare for these upgrades. But there's also a lot of activity in this new area, with machine learning popping up. We
have a ton of services appearing where people want to start doing machine learning in many, many use cases. In some cases they want to do the filtering in the detectors; in other cases they want to generate simulation data a lot faster using machine learning. So I think this will be a huge topic for next year, and even for the next couple of years: seeing how we can offer our users and physicists the best service, so that they don't have to care about the infrastructure or know the details of how they scale their model training and the serving of their models. I think this will be a very big topic. It's really becoming a big part of the computing world for high-energy physics and for CERN as well. >>That's great. We see that a lot, just applied machine learning on very specific problems. You talked about how you still can't even record all the information that comes off those things; you have to do some compression technology and other things. So there are real opportunities; we've barely scratched the surface of machine learning and AI, but I'm sure you're going to be using it a ton. Well, Ricardo, I'll give you the last word. We're at CNCF's KubeCon + CloudNativeCon. What do you get out of these types of shows, and why is it such an important piece of the way you get your job done? >>Yeah, honestly, with this whole situation right now, I really miss these kinds of conferences in person. It's really a huge opportunity to connect with the other end users, but also with the community, and to talk to the developers and discuss things over coffee or beer. It's really useful to have these kinds of meetings every year. What I always try to say is that this whole infrastructure is truly making a big impact on the way we do things, so we can only thank the community.
It allows us to shift to a higher level and focus more on our use cases, instead of having to focus so much on the infrastructure. We start taking it as a given that the infrastructure scales, and we can just use it and focus on optimizing our own software. So this is a huge contribution; we can only thank the CNCF projects and everyone involved. >>Great. Well, thank you for that; that's a terrific summary. So Ricardo, thank you so much for all your hard work helping answer really big questions, and for joining us today and sharing your insight. >>Thank you very much. >>All right, he's Ricardo, I'm Jeff. You're watching theCUBE from our Palo Alto studios for continuing coverage of KubeCon + CloudNativeCon 2020. Thanks for watching, see you next time.
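Ricardo's point that CERN's workloads are mostly batch jobs that can simply be submitted, with the data fetched back afterwards, is the classic embarrassingly parallel pattern: each job reads only its own input and writes only its own output. A minimal Python sketch, assuming each job is a self-contained analysis of its own slice of data; `analyse_chunk`, `run_batch` and the toy numbers are hypothetical, not CERN code. A thread pool keeps the sketch portable; for real CPU-bound work a process pool, or a cluster batch system, would be the natural swap.

```python
from concurrent.futures import ThreadPoolExecutor


def analyse_chunk(chunk_id: int, energies: list[float]) -> tuple[int, float]:
    """Hypothetical analysis job: total the 'energy' values in one chunk.

    The job reads only its own input and returns only its own output,
    so it never needs to talk to any other job.
    """
    return chunk_id, sum(energies)


def run_batch(chunks: list[list[float]], workers: int = 4) -> dict[int, float]:
    """Submit one independent job per chunk and collect results by chunk id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(analyse_chunk, i, c) for i, c in enumerate(chunks)]
        # Each result is keyed by its own chunk id, so outputs can be stored
        # independently, in any order.
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    toy_chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
    print(run_batch(toy_chunks))  # {0: 3.0, 1: 3.0, 2: 15.0}
```

Tightly coupled work such as distributed model training does not fit this pattern, because its workers must exchange intermediate results while they run.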

Published Date : Nov 19 2020


ENTITIES

Entity	Category	Confidence
Ricardo Rocha	PERSON	0.99+
100 meters	QUANTITY	0.99+
10 times	QUANTITY	0.99+
2021	DATE	0.99+
27 kilometers	QUANTITY	0.99+
jeff frick	PERSON	0.99+
last year	DATE	0.99+
CERN	ORGANIZATION	0.99+
today	DATE	0.99+
second thing	QUANTITY	0.99+
five years	QUANTITY	0.99+
ricardo	PERSON	0.98+
palo alto	ORGANIZATION	0.98+
40 million pictures	QUANTITY	0.98+
KubeCon	EVENT	0.98+
first launch	QUANTITY	0.98+
first time	QUANTITY	0.98+
next year	DATE	0.98+
CloudNativeCon	EVENT	0.97+
jeff	PERSON	0.96+
ricardo rocha	PERSON	0.96+
north america	LOCATION	0.95+
around 10 years ago	DATE	0.95+
geneva	LOCATION	0.95+
four	QUANTITY	0.95+
101	QUANTITY	0.94+
over one half an exabyte of data	QUANTITY	0.93+
70 petabytes of data	QUANTITY	0.93+
kubecon	ORGANIZATION	0.92+
next couple of years	DATE	0.92+
7	QUANTITY	0.92+
every year	QUANTITY	0.91+
linux	TITLE	0.9+
last few years	DATE	0.89+
up to one petabyte	QUANTITY	0.89+
may of 2019	DATE	0.87+
end of 2020	DATE	0.87+
2020	DATE	0.87+
next few years	DATE	0.86+
a ton of services	QUANTITY	0.84+
nancy meta factory	ORGANIZATION	0.82+
NA 2020	EVENT	0.8+
each	QUANTITY	0.8+
cloudnativecon	ORGANIZATION	0.8+
a lot of people	QUANTITY	0.79+
a lot of data	QUANTITY	0.79+
one	QUANTITY	0.78+
few tens of gigabytes per second	QUANTITY	0.78+
so many people	QUANTITY	0.76+
kubecon	EVENT	0.75+
openstack	TITLE	0.74+
challenges	QUANTITY	0.7+
kubecon cloud	ORGANIZATION	0.66+
thursday	DATE	0.66+
second	QUANTITY	0.66+
a second	QUANTITY	0.64+
lot of people	QUANTITY	0.63+
a few years	QUANTITY	0.62+
hat	ORGANIZATION	0.61+
cern	ORGANIZATION	0.61+
european	OTHER	0.58+
lot of data	QUANTITY	0.58+
foundation	ORGANIZATION	0.57+
in the summer	DATE	0.55+
red	PERSON	0.54+
cloud nativecon 2020	EVENT	0.54+
lot of activity	QUANTITY	0.53+
two cube	QUANTITY	0.49+
con	EVENT	0.4+

Lukas Heinrich & Ricardo Rocha, CERN | KubeCon + CloudNativeCon EU 2019

>> Live from Barcelona, Spain, it's theCUBE, covering KubeCon + CloudNativeCon Europe 2019. Brought to you by Red Hat, the Cloud Native Computing Foundation and ecosystem partners. >> Welcome back to theCUBE, here at KubeCon + CloudNativeCon 2019 in Barcelona, Spain. I'm Stu Miniman. My co-host is Corey Quinn, and we're thrilled to welcome to the program two gentlemen from CERN. Of course, CERN needs no introduction. We're going to talk some science, going to talk some tech. To my right here is Ricardo Rocha, who is a computing engineer, and Lukas Heinrich, who's a physicist. So Lukas, let's start with you. You know, if you were a traditional enterprise, we'd talk about your business, but talk about your projects, your applications. What piece of, you know, fantastic science is your team working on? >> All right, so I work on an experiment that is situated at the Large Hadron Collider. It's a particle accelerator experiment where we accelerate protons, which are hydrogen nuclei, to a very high energy, so that they go at almost the speed of light. We have a large tunnel, 100 meters underground near Geneva, straddling the border of France and Switzerland, and there we're accelerating two beams. One is going clockwise, the other one is going counterclockwise, and we collide them. I work on an experiment that looks at these collisions and then analyzes this data. >> Lukas, if I can, you know, when you talk to most companies, you talk about scale, you talk about latency, you talk about performance. Those have real-world implications for your world. Do you have anything you could share there?
>> Yeah, so one of the main things that we need to do: we collide these protons 40 million times a second, and we need to analyze them in real time, because we cannot write all the collision data out to disk; we don't have enough disk space. So we essentially run a 10,000-core real-time application to analyze this data and see which collisions are actually most interesting, and then only those get written out to disk. This is a system that I work on called the Trigger, and yeah, that's pretty dependent on latency. >> All right, Ricardo, luckily, you know, your job's easy. We say to most people, you need to respond to what the business needs from you, and, you know, don't worry, you can't go against the laws of physics. Well, you're working on physics here, and boy, those are some hefty requirements. Talk a little bit about that dynamic and how your team has to deal with some pretty tough challenges. >> Right, so, as Lukas was saying, we have this large amount of data. The machines can generate something on the order of a petabyte a second, and then, thanks to the hardware- and software-level triggers, they reduce this to something like 10 gigabytes a second, and that's what my side has to handle. So it's still a lot of data. We are collecting something like 70 petabytes a year, and we keep adding, so right now the amount of storage available is on the order of 400 petabytes. We're starting to get to a pretty large scale. And then we have to analyze all of this. We have one big data center at CERN, with around 300,000 cores, but that's not enough, so over the last 15, 20 years, we've created this large distributed computing environment around the world. We link many different institutes and research labs together, and this doubles our capacity.
So that's our challenge: to make sure that, after all the effort the physicists put into building this large machine, in the end it's not the computing that is breaking the whole system. We have to keep up, yup. >> One thing that I always find fascinating is people who are dealing with real problems that push our conception of what scale starts to look like, and when you're talking about things like a petabyte a second, that's beyond the comprehension of what most of us can wind up talking about. One problem that I've seen historically with a number of different infrastructure approaches is that it requires a fair level of complexity to go from this problem to this problem to this problem, and you have to wind up working through a bunch of layers of abstraction, and the end result is that, at the end of all of this, we can run our blog that gets eight visits a day, and that just doesn't seem to make sense. Whereas with what you're talking about, that level of complexity is more than justified. So my question for you is, as you start seeing these things evolve and look at other best practices and guidance from folks who are building far less data-intensive applications, are you seeing a lot of the best practices start to fall down as you push the theoretical boundaries of scale? >> Right, that's actually a good point. The physicists are very good at getting things done, and they don't worry that much about the process, as long as in the end it works. But there's always this kind of split between the physicists and the computing engineers, where we want to establish practices, but at the end of the day, we have a large machine that has to work, so sometimes we skip a couple of steps. Still, there's quite a lot of control over data quality, software validation, and all of this. But yeah, it's a non-traditional environment in terms of IT, I would say. It's much more fast-paced than most traditional companies.
>> You mentioned you had how many cores working on these problems on site? >> So in-house, we have 300,000. >> If you were to do a full migration to the public cloud, you'd almost have to repurpose that many cores just to calculate the bill at that point, because with all the different dimensions, everything you wind up working on at that scale becomes almost completely non-trivial. I don't often say that I'm not sure public cloud can scale to the level that someone would need. In your case, that becomes a very real concern. >> Yeah, so that's one debate we are having now. There are a lot of advantages to having the computing in-house, also because we pretty much use it 24/7; it's a very different type of workload. We need a lot of resources 24/7, so even the pricing is kind of calculated differently. But the issue we have now is that in just five years' time the accelerator will go through a major upgrade, where we will increase the amount of data by 100 times. Right now we are talking about 70 petabytes a year, and very soon we'll be talking about exabytes. So the amount of computing we'll need there is just going to explode, so we need all the options. We're looking into GPUs and machine learning to change how we do computing, and we are looking at any kind of additional resources we might get, and there the public cloud will probably play a role. >> Could you speak to the dynamic of something like an upgrade of that size? You know, how do you work together? I can't imagine that you just say, "Well, we built whatever we needed and everything, and, you know, throw it over the wall and make sure it works." >> Right, I mean, I work a lot on this boundary between computing and physics, and internally, I think we go through the same processes as a lot of companies: we're trying to educate people on the physics side on how to follow best practices, because it's also important.
So one thing I also stressed in the keynote is that this idea of reproducibility and reusability of scientific software is pretty important, so we teach people to containerize their applications and then make them reusable and stuff like that, yup. >> Anything about that relationship you can expound on? >> Yeah, so this keynote we had yesterday is a perfect example of how this is improving a lot at CERN. We were actually using data from CMS, which is one of the experiments. Lukas is a physicist in ATLAS, which is, kind of, a competing experiment. I'm in IT, and all this containerized infrastructure is kind of getting us all together, because computing is getting much easier in terms of how to share pieces of software and even infrastructure, and this helps us a lot internally also. >> So what in particular about Kubernetes helps your environment? You talked about the 15 years you've spent on this distributed systems build-out, so it sounds like you were the hipsters when it came to some of these solutions we're working on today. >> That has been a major change. Lukas mentioned the container part for software reproducibility, but I joined CERN as a student and I've been working on the distributed infrastructure for many years, and we basically had to write our own tools, like storage systems and batch systems, over the years. Suddenly, with this public cloud explosion and open source usage, we can just go and join communities that sometimes have requirements higher than ours, and we can really focus on application development. If we start writing software using Kubernetes, then not only do we get this flexibility of choosing different public clouds or different infrastructures, but we also don't have to care so much about the core infrastructure: all the monitoring, log collection, restarting. Kubernetes is very important for us in this respect.
We've kind of removed a lot of the software we had been depending on for many years. >> So these days, as you look at this build-out, not just what you're doing today but what you're looking to build in the upcoming years, are you viewing containers as the fundamental primitive of what empowers this? Are you looking at virtual machines as that primitive? Are you looking at functions? Where exactly do you draw the abstraction layer as you start building this architecture? >> So, yeah, traditionally we've been using virtual machines for maybe the last 10 years, or, I don't know, eight years at least, and we see containerization happening very quickly. Maybe Lukas can say a bit more about how this is important on the physics side? >> Yeah, so currently I think we are looking at containers as the main abstraction, also as we go to things like functions as a service. What's kind of special about scientific applications is that we don't usually have our entire code base on one software stack, right? It's not like we would deploy a Node.js application or a Python stack and that's it. Sometimes you have a complete mix of C++, Python, Fortran, and all that stuff. So this idea that we can build the entire software stack as we want it is pretty important. Even for functions as a service, where traditionally you had just a limited choice of runtimes, this becomes important. >> From our side, the virtual machines still required a very complex setup to support all this diversity of software, whereas with containerization, all people have to give us is, like, "run this building block," and it's kind of a standard interface, so we only have to build the infrastructure to handle these pieces. >> Well, I don't think anyone can dispute that you folks are experts in taking larger things and breaking them down into their constituent components.
I mean, you are, quite obviously, the world's leading experts on that. But was there any challenge for you as you went through that process of, I don't necessarily even want to say modernizing, but changing your viewpoint of those primitives? As you've evolved, have you seen challenges in gaining buy-in throughout the organization? Was there pushback? Was it culturally painful to wind up moving away from the virtual machine approach into a containerized world? >> Right, so yeah, a bit, of course. But traditionally, physicists really focus on their end goal. We often say that we don't count how many cores or whatever; we care about events per second, how many events we can process per second. So it's maybe a more open-minded community than traditional IT; we don't care so much about which technology we use, as long as the job gets done. So yeah, there's a bit of friction sometimes, but when you can demonstrate a clear benefit, it's easier to push it. >> What's maybe also a little bit special about particle physics is that it's not only CERN doing the research. We are an international collaboration of many, many institutes all around the world that work on the same project, which is just hosted at CERN. So it's a very flat hierarchy, and people do have the freedom to try things out; it's not like we have a top-down mandate on what technology we use. Somebody tries something out, and if it works and people see value in it, then you get adoption. >> The collaboration with the data volumes you're talking about has got to be intense as well. I think you're a little bit beyond the, okay, we ran the experiment, we put the data in Dropbox, go ahead and download it, you'll get that in only 18 short years. It seems like there's absolutely a challenge in that.
>> That was actually one of the key points in the keynote. A lot of the experiments at CERN have an open data policy where we release our data, and that's great because we think it's important for open science, but it was always a bit of an issue: who can practically analyze this data if they don't have a data center? So one part of the keynote was demonstrating that, using Kubernetes and public cloud infrastructure, it actually becomes possible for people who don't work at CERN to analyze these large-scale scientific data sets. >> Yeah, I mean, maybe just for our audience, the punchline is rediscovering the Higgs boson in the public cloud. Maybe just give our audience a little taste of that. >> Right, yeah, so basically, the Higgs boson was discovered in 2012 by both ATLAS and CMS, and we used open data from CMS; part of that data has now been released publicly. This was a 70-terabyte data set which, thanks to our Google Cloud partners, we could put onto public cloud infrastructure, and then we analyzed it on a large-scale Kubernetes cluster, and-- >> The main challenge there was that we publish it and say you probably need a month to process it, but we had like 20 minutes in the keynote, so we needed a somewhat larger infrastructure than usual to bring it down to five minutes or less. In the end, it all worked out, but that was a bit of a challenge. >> How are you approaching, I guess, making this more accessible to more people? By which I mean not just other research institutions scattered around the world, but students, individual students, sometimes in emerging economies, who don't have access to the kinds of resources that many of us take for granted, particularly those who work for prestigious research institutions?
What are you doing to make this more accessible to high school kids, for example, folks who are just dipping their toes into a world they find fascinating? >> We have entire programs, outreach programs that go to high schools. I did this when I was a student in Germany. We would go to high schools and host workshops, and people would analyze a lot of this data themselves on their computers. We would come with USB sticks that had data on them, and they could analyze it. And part of the ATLAS open data strategy is also to use that open data for educational purposes. And then there are also programs in emerging countries. >> Lukas and Ricardo, I really appreciate you sharing your open data, open science mission with our audience. Thank you so much for joining us. >> Thank you. >> Thank you. >> All right, for Corey Quinn, I'm Stu Miniman. We're on day two of two days of live coverage here at KubeCon + CloudNativeCon 2019. Thank you for watching theCUBE. (upbeat music)

Published Date : May 22 2019

SUMMARY :

Brought to you by Red Hat, What piece of, you know, fantastic science and there, we collide them. to most companies, you talk about scale, Yeah, so, one of the main things that we need to do, to what the business needs for you and, you know, and we keep adding, so right now we have, and at the end of all of this we can run our blog but at the end of the day, we have a large machine Just, because all the different dimensions, But the issue we have now is that the accelerator "whatever we needed and everything, and, you know, on the physics side how to go through the best practices, Yeah, so like this keynote we had yesterday so sounds like you were the hipsters and we basically had to write our own tools, is that we don't usually just have our entire code base just all the people have to give us But was there any challenge to you We often say that we don't count how many cores and so it's not like we have a top-down mandate okay, we ran the experiment, we put the data in Dropbox, And so one part of the keynote was that we could demonstrate in the public cloud. and we say you probably need a month to process it, And so part of also the open data strategy Lukas and Ricardo, really appreciate you sharing Thank you for watching theCUBE.

ENTITIES

Entity	Category	Confidence
Ricardo Rocha	PERSON	0.99+
Corey Quinn	PERSON	0.99+
Stu Miniman	PERSON	0.99+
CERN	ORGANIZATION	0.99+
Lukas	PERSON	0.99+
ATLAS	ORGANIZATION	0.99+
2012	DATE	0.99+
Geneva	LOCATION	0.99+
Germany	LOCATION	0.99+
Ricardo	PERSON	0.99+
Lukas Heinrich	PERSON	0.99+
Red Hat	ORGANIZATION	0.99+
20 minutes	QUANTITY	0.99+
Cloud Native Computing Foundation	ORGANIZATION	0.99+
70-terabyte	QUANTITY	0.99+
15 years	QUANTITY	0.99+
300,000 cores	QUANTITY	0.99+
300,000	QUANTITY	0.99+
Node.js	TITLE	0.99+
70 petabytes	QUANTITY	0.99+
Python	TITLE	0.99+
400 petabytes	QUANTITY	0.99+
10,000 core	QUANTITY	0.99+
Barcelona, Spain	LOCATION	0.99+
100 meters	QUANTITY	0.99+
eight years	QUANTITY	0.99+
KubeCon	EVENT	0.99+
a month	QUANTITY	0.99+
100 times	QUANTITY	0.99+
Switzerland	LOCATION	0.99+
five minutes	QUANTITY	0.99+
one	QUANTITY	0.99+
Fortran	TITLE	0.98+
yesterday	DATE	0.98+
France	LOCATION	0.98+
two days	QUANTITY	0.98+
Ecosystem Partners	ORGANIZATION	0.98+
One problem	QUANTITY	0.98+
One	QUANTITY	0.98+
five years'	QUANTITY	0.98+
18 short years	QUANTITY	0.97+
CMS	ORGANIZATION	0.97+
two beams	QUANTITY	0.97+
two gentlemen	QUANTITY	0.96+
Kubernetes	TITLE	0.96+
both	QUANTITY	0.96+
CloudNativeCon Europe 2019	EVENT	0.95+
40 million times a second	QUANTITY	0.95+
One thing	QUANTITY	0.94+
eight visits a day	QUANTITY	0.94+
CloudNativeCon EU 2019	EVENT	0.93+
CloudNativeCon 2019	EVENT	0.93+
C++	TITLE	0.93+
many years	QUANTITY	0.92+
KubeCon CloudNativeCon 2019	EVENT	0.92+
today	DATE	0.91+
one software	QUANTITY	0.91+
Dropbox	ORGANIZATION	0.89+
about 70 petabytes	QUANTITY	0.86+
one debate	QUANTITY	0.86+
10 gigabytes a second	QUANTITY	0.85+
one part	QUANTITY	0.77+
a year	QUANTITY	0.75+
one thing	QUANTITY	0.74+
a second	QUANTITY	0.73+
petabyte	QUANTITY	0.73+
