Lukas Heinrich & Ricardo Rocha, CERN | KubeCon + CloudNativeCon EU 2019


 

>> Live from Barcelona, Spain, it's theCUBE, covering KubeCon + CloudNativeCon Europe 2019. Brought to you by Red Hat, the Cloud Native Computing Foundation, and ecosystem partners.

>> Welcome back to theCUBE, here at KubeCon + CloudNativeCon 2019 in Barcelona, Spain. I'm Stu Miniman. My co-host is Corey Quinn, and we're thrilled to welcome to the program two gentlemen from CERN. Of course, CERN needs no introduction. We're going to talk some science, going to talk some tech. To my right here is Ricardo Rocha, who is a computing engineer, and Lukas Heinrich, who's a physicist. So Lukas, let's start with you. If you were a traditional enterprise, we'd talk about your business, but instead, talk about your projects, your applications. What piece of fantastic science is your team working on?

>> All right, so I work on an experiment that is situated at the Large Hadron Collider, a particle accelerator experiment where we accelerate protons, which are hydrogen nuclei, to a very high energy, so that they go at almost the speed of light. We have a large tunnel 100 meters underground in Geneva, straddling the border of France and Switzerland, and there we're accelerating two beams. One is going clockwise, the other is going counterclockwise, and we collide them. So I work on an experiment that looks at these collisions and then analyzes this data.

>> Lukas, if I can: when you talk to most companies, you talk about scale, you talk about latency, you talk about performance. Those have real-world implications for your world. Do you have anything you could share there?

>> Yeah, so one of the main things we need to do is analyze these collisions in real time. We collide these protons 40 million times a second, and we cannot write all the collision data to disk, because we don't have enough disk space. So we essentially run a 10,000-core real-time application to analyze this data and see which collisions are actually most interesting, and only those get written out to disk. This is a system I work on called the trigger, and yeah, it's pretty dependent on latency.

>> All right, Ricardo, luckily your job's easy. For most people we say, you need to respond to what the business needs, and don't worry, you can't go against the laws of physics. Well, you're working on physics here, and boy, those are some hefty requirements. Talk a little bit about that dynamic and how your team has to deal with some pretty tough challenges.

>> Right, so, as Lukas was saying, we have this large amount of data. The machines can generate something on the order of a petabyte a second, and then, thanks to the hardware- and software-level triggers, that gets reduced to something like 10 gigabytes a second, which is what my side has to handle. So it's still a lot of data. We are collecting something like 70 petabytes a year, and we keep adding, so right now the amount of storage available is on the order of 400 petabytes. We're starting to get to a pretty large scale. And then we have to analyze all of this. We have one big data center at CERN, which is around 300,000 cores, but that's not enough, so over the last 15 to 20 years we've created this large distributed computing environment around the world.
We link many different institutes and research labs together, and this doubles our capacity. So that's our challenge: making sure that, after all the effort the physicists put into building this large machine, it's not the computing that breaks the whole system in the end. We have to keep up, yup.

>> One thing that I always find fascinating is people who are dealing with real problems that push our conception of what scale starts to look like, and when you're talking about things like a petabyte a second, that's beyond the comprehension of what most of us can wind up talking about. One problem that I've seen historically with a number of different infrastructure approaches is that it requires a fair level of complexity to go from this problem to this problem to this problem, and you have to wind up working through a bunch of layers of abstraction, and the end result is, at the end of all of this, we can run our blog that gets eight visits a day, which just doesn't seem to make sense. Whereas with what you're talking about, that level of complexity is more than justified. So my question for you is, as you see these things evolve and look at best practices and guidance from folks doing far less data-intensive applications, are you seeing a lot of those best practices fall down as you push the theoretical boundaries of scale?

>> Right, that's actually a good point. The physicists are very good at getting things done, and they don't worry that much about the process as long as it works in the end. So there's always this kind of split between the physicists and the computing engineers: we want to establish practices, but at the end of the day we have a large machine that has to work, so sometimes we skip a couple of steps. There's still quite a lot of control on things like data quality and software validation, though. But yeah, it's a non-traditional environment in terms of IT, I would say. It's much more fast-paced than most traditional companies.

>> You mentioned you had how many cores working on these problems on site?

>> In-house, we have 300,000.

>> If you were to do a full migration to the public cloud, you'd almost have to repurpose that many cores just to calculate the bill at that point, because with all the different dimensions, everything you wind up doing at that scale becomes almost completely non-trivial. I don't often say that I'm not sure the public cloud can scale to the level that someone would need, but in your case, that becomes a very real concern.

>> Yeah, so that's one debate we are having now. It has a lot of advantages to have the computing in-house, also because we pretty much use it 24/7; it's a very different type of workload, so even the pricing is calculated differently. But the issue we have now is that the accelerator will go through a major upgrade in five years' time, where we will increase the amount of data by 100 times. Now we are talking about 70 petabytes a year, and we're very soon talking about exabytes. The amount of computing we'll need is just going to explode, so we need all the options. We're looking into GPUs and machine learning to change how we do computing, and we're looking at any kind of additional resources we might get, and there the public cloud will probably play a role.
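To put those figures side by side, here is a minimal back-of-envelope sketch in Python; every number is taken from the conversation above, and decimal byte units are assumed:

```python
# Back-of-envelope check of the data rates quoted in the interview.
# All figures come from the conversation above; decimal units assumed.

PB = 10**15  # bytes per petabyte
GB = 10**9   # bytes per gigabyte

detector_rate = 1 * PB       # ~1 PB/s generated by the detector
post_trigger_rate = 10 * GB  # ~10 GB/s written out after the triggers

# The hardware- and software-level triggers discard all but a tiny
# fraction of the raw output before it reaches long-term storage.
reduction = detector_rate / post_trigger_rate
print(f"Trigger reduction factor: {reduction:,.0f}x")  # 100,000x

# Yearly volume today, and after the planned ~100x upgrade.
yearly_now = 70 * PB
yearly_upgraded = 100 * yearly_now
print(f"Projected yearly volume: {yearly_upgraded / 10**18:.0f} EB")  # 7 EB
```

In other words, the trigger chain throws away roughly 99.999% of the raw detector output, and the planned upgrade pushes the retained data alone into exabyte territory.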
>> Could you speak to kind of the dynamic of how you work together on something like that upgrade? I can't imagine that you just say, "Well, we built whatever we needed, and, you know, throw it over the wall and make sure it works."

>> Right, I mean, I work a lot on this boundary between computing and physics, and internally, I think we go through the same processes as a lot of companies, in that we're trying to educate people on the physics side about best practices, because that's also important. One thing I stressed in the keynote is this idea that reproducibility and reusability of scientific software are pretty important, so we teach people to containerize their applications and make them reusable, and stuff like that, yup.

>> Anything about that relationship you can expound on?

>> Yeah, the keynote we had yesterday is a perfect example of how this is improving a lot at CERN. We were actually using data from CMS, which is one of the experiments. Lukas is a physicist on ATLAS, which is, like, a competing experiment, kind of. I'm in IT, and all this containerized infrastructure is bringing us together, because computing is getting much easier in terms of sharing pieces of software and even infrastructure, and this helps us a lot internally as well.

>> So what in particular about Kubernetes helps your environment? You've talked about the 15 years you've been on this distributed systems build-out, so it sounds like you were the hipsters when it came to some of the solutions we're working on today.

>> That has been a major change. Lukas mentioned the container part for software reproducibility, but on the infrastructure side, I joined CERN as a student and have been working on the distributed infrastructure for many years, and over the years we basically had to write our own tools: storage systems, all the batch systems. Suddenly, with this public cloud explosion and open source usage, we can just go and join communities that sometimes have requirements even higher than ours, and we can really focus on application development. If we start writing software using Kubernetes, then not only do we get the flexibility of choosing different public clouds or different infrastructures, but we also don't have to care so much about the core infrastructure: all the monitoring, log collection, restarting. Kubernetes is very important for us in this respect. We've been able to remove a lot of the software we were depending on for many years.

>> So these days, as you look at this build-out, not just what you're doing today but what you're looking to build in the upcoming years, are you viewing containers as the fundamental primitive that empowers this? Are you looking at virtual machines as that primitive? Are you looking at functions? Where exactly do you draw the abstraction layer as you start building this architecture?

>> So, yeah, traditionally we've been using virtual machines for maybe the last 10 years, or at least eight, and we see containerization happening very quickly. Maybe Lukas can say a bit more about how this is important on the physics side?

>> Yeah, so currently I think we are looking at containers as the main abstraction, even as we go through things like functions as a service.
What's kind of special about scientific applications is that we don't usually have our entire code base on one software stack, right? It's not like we deploy a Node.js application or a Python stack and that's it. Sometimes you have a complete mix of C++, Python, Fortran, and all that stuff. So this idea that we can build the entire software stack exactly as we want it is pretty important. Even for functions as a service, where traditionally you had just a limited choice of runtimes, this becomes important.

>> And from our side, the virtual machines still required a very complex setup to support all this diversity of software. With containerization, all that people have to give us is, "Run this building block." It's kind of a standard interface, so we only have to build the infrastructure to handle these pieces.

>> Well, I don't think anyone can dispute that you folks are experts in taking larger things and breaking them down into their constituent components. I mean, you are, quite obviously, the leading world experts on that. But was there any challenge for you as you went through that process of, I don't necessarily even want to say modernizing, but changing your viewpoint on those primitives? As you've evolved, have you seen challenges in gaining buy-in throughout the organization? Was there pushback? Was it culturally painful to wind up moving away from the virtual machine approach into a containerized world?

>> Right, so yeah, a bit, of course. But physicists really focus on their end goal. We often say that we don't count how many cores or whatever; we care about events per second, how many events we can process per second. So it's maybe a more open-minded community than traditional IT: we don't care so much about which technology we use at some point, as long as the job gets done. So yeah, there's a bit of friction sometimes, but there's also a push: when you can demonstrate a clear benefit, it's easier to get it adopted.

>> What's maybe also a little bit special for particle physics is that it's not only CERN doing the research. We are an international collaboration of many, many institutes all around the world working on the same project, which is just hosted at CERN. So it's a very flat hierarchy, and people have the freedom to try things out; it's not like we have a top-down mandate on what technology to use. Somebody tries something out, and if it works and people see value in it, it gets adopted.

>> The collaboration with the data volumes you're talking about has got to be intense as well. I think you're a little bit beyond, "Okay, we ran the experiment, we put the data in Dropbox, go ahead and download it, you'll get it in only 18 short years." It seems like there's absolutely a challenge in that.

>> That was actually one of the key points in the keynote. A lot of the experiments at CERN have an open data policy where we release our data, and that's great, because we think it's important for open science. But it was always a bit of an issue who could actually, practically analyze this data if they don't have a data center. So one part of the keynote was demonstrating that, using Kubernetes and public cloud infrastructure, it actually becomes possible for people who don't work at CERN to analyze these large-scale scientific data sets.
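The "run this building block" interface Ricardo describes maps naturally onto Kubernetes batch Jobs. As an illustrative sketch only (the image name, arguments, and resource requests below are hypothetical placeholders, not CERN's actual setup), submitting one containerized analysis partition with the official Kubernetes Python client looks roughly like this:

```python
# Hypothetical sketch: run one partition of a containerized analysis
# as a Kubernetes Job. Image name, args, and resources are placeholders.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() inside a cluster

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="opendata-analysis-part-0"),
    spec=client.V1JobSpec(
        backoff_limit=2,  # retry a failed partition a couple of times
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="analysis",
                        image="example.cern/opendata-analysis:v1",  # placeholder
                        args=["--partition", "0"],
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "4", "memory": "8Gi"},
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

Because the Job only references a container image, the same submission runs unchanged on an on-premises cluster or on any managed public cloud service, which is the portability point made throughout the conversation.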
>> Yeah, maybe just for our audience, the punchline is rediscovering the Higgs boson in the public cloud. Give our audience a little taste of that.

>> Right, yeah. The Higgs boson was discovered in 2012 by both ATLAS and CMS, and part of that data has now been released publicly, so we used open data from CMS. Basically, this was a 70-terabyte data set which, thanks to our Google Cloud partners, we could put onto public cloud infrastructure, and then we analyzed it on a large-scale Kubernetes cluster, and--

>> The main challenge there was that when we publish the data, we say you probably need a month to process it, but we had, like, 20 minutes on the keynote, so we needed a somewhat larger infrastructure than usual to get it down to five minutes or less. In the end it all worked out, but that was a bit of a challenge.

>> How are you approaching, I guess, making this more accessible to more people? By which I mean not just other research institutions scattered around the world, but individual students, sometimes in emerging economies, who don't have access to the kinds of resources that many of us take for granted, particularly those of us who work for prestigious research institutions. What are you doing to make this more accessible to high school kids, for example, folks who are just dipping their toes into a world they find fascinating?

>> We have entire outreach programs that go to high schools. I did this when I was a student in Germany: we would go to high schools and host workshops, and people would analyze a lot of this data themselves on their computers. We would come with USB sticks that had data on them, and they could analyze it. Part of the open data strategy from ATLAS is to use that open data for educational purposes. And then there are also programs in emerging countries.

>> Lukas and Ricardo, really appreciate you sharing the open data, open science mission that you have with our audience. Thank you so much for joining us.

>> Thank you.

>> Thank you.

>> All right, for Corey Quinn, I'm Stu Miniman. We're in day two of two days of live coverage here at KubeCon + CloudNativeCon 2019. Thank you for watching theCUBE. (upbeat music)
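The scale-up behind that keynote demo follows directly from the numbers quoted above; a small sketch, assuming near-linear scaling and decimal units:

```python
# Arithmetic implied by the keynote numbers: a ~70 TB data set,
# normally ~1 month of processing, replayed in under 5 minutes.
# Assumes near-linear scaling; figures are from the interview.

TB = 10**12  # bytes per terabyte

dataset = 70 * TB
month_minutes = 30 * 24 * 60  # ~43,200 minutes in a month
target_minutes = 5

# Parallel capacity needed relative to the "one month" baseline.
speedup = month_minutes / target_minutes
print(f"Required scale-up: ~{speedup:,.0f}x")  # ~8,640x

# Aggregate read throughput needed to stream 70 TB in 5 minutes.
throughput_gb_s = dataset / (target_minutes * 60) / 10**9
print(f"Aggregate throughput: ~{throughput_gb_s:.0f} GB/s")  # ~233 GB/s
```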

Published Date: May 22, 2019
