Robin Goldstone, Lawrence Livermore National Laboratory | Red Hat Summit 2019
>> Live from Boston, Massachusetts, it's theCUBE, covering Red Hat Summit 2019. Brought to you by Red Hat.
>> Welcome back to theCUBE. We're at Red Hat Summit 2019. Along with Stu Miniman, I'm John Walls. We're now joined by Robin Goldstone, who's an HPC solution architect at Lawrence Livermore National Laboratory. Hello, Robin.
>> Hi there. Good to see you.
>> I saw you on the keynote stage this morning. Fascinating presentation, I thought. First off, for the viewers at home who might not be too familiar with the laboratory, could you please give us the thirty-thousand-foot view of what kind of national security work you're involved with?
>> Sure. So yes, indeed, we are a national security lab. First and foremost, our mission is assuring the safety, security, and reliability of our nuclear weapons stockpile, and there's a lot to that mission. But we also have a broader national security mission. We work on counterterrorism and nonproliferation, a lot of cybersecurity kinds of things, and even just general science. We're doing things with precision medicine and all sorts of interesting technology.
>> Fascinating. So, Robin, in IT the buzzword of the past months and years has been scale, and we talk about what public cloud people are doing, but labs like yours have been challenged with scale in many other ways, especially performance, which is usually at the forefront of where things are. You talked in the keynote this morning about Sierra, the latest-generation supercomputer, the number two supercomputer. I don't know how many people understand a petaflop, 125 petaflops and the like, but tell us a little bit about the why and the what of that.
>> Right. So Sierra is a supercomputer, and what's unique about these systems is the problems we're solving. There are lots of systems that network servers together, maybe with a bigger number of servers than ours, but we're doing scientific simulation, and that kind of computing requires a level of parallelism that is very tightly coupled. All the servers are running a piece of the problem, and they all have to operate together. If any one of them is running slow, it makes the whole thing go slow. So it's really this tightly coupled nature of supercomputers that makes things challenging. We talked about performance: if one server is running slow for some reason, everything else is going to be affected by that. So we really do care about performance, and we really do care about every little piece of the hardware performing as it should.
>> I think with national security and nuclear stockpiles, there is nothing more important, obviously, than the safety and security of the American people, and you're at the center of that, right? And you're open source. How does that work? Because as much trust and faith and confidence as we have in the open source community, this is an extremely important responsibility that's being consigned, more or less, to the open source community.
>> Sure. At first, people do have that feeling that we should be running some secret sauce. Our applications themselves are secret, but when it comes to the system software and all the software around the applications, open source makes perfect sense.
We started out running really closed-source solutions. In some cases the hardware itself was really proprietary, and of course the vendors who made the proprietary hardware wanted their software to be proprietary too. I think most people can relate: you buy a piece of software and the vendor tells you it's great, it's going to do everything you need it to do, just trust us. Okay, but at our scale it often doesn't work the way it's supposed to work. They've never tested it at our scale, and when it breaks, they're the only ones who can fix it. In some cases we found the vendor decided, you know what, no one else has a system quite like yours, it's a lot of work to make it work for you, so we're just not going to fix it. And you can't wait, right? Open source is just the opposite of that. We have full visibility into that software. If it doesn't work for our needs, we can make it work for our needs, and then we can give it back to the community. Because even though few people are doing things at the scale that we are today, a lot of the things we're doing really do trickle down and can be used by a lot of other people.
>> That's something really important, because, as you said, it used to be: the Cray supercomputer is what we know, use proprietary interfaces, I need the highest speed, and therefore it's not the general-purpose stuff. You moved to x86. Linux has been in supercomputers for a while, but as a finely tuned version: duct tape and baling wire, and don't breathe on it once you get it running. You're running RHEL today. Talk a little bit about the journey with RHEL, now on the supercomputers.
>> Right. So again, there has always been this sort of proprietary, really high-end supercomputing, but in the late 1990s and early 2000s we started building these commodity clusters. At the time, I think Beowulf was the terminology for it. Basically, we were looking at how we could take basic off-the-shelf servers and make them work for our applications, and trying to take advantage of as much commodity technology as we could, because we didn't want to reinvent anything; we wanted to reuse as much as possible. And we've really ridden that curve. Initially it was just Red Hat Linux; there was no RHEL at the time. But when we started getting into newer architectures, going from x86 to x86-64 and Itanium, the support just wasn't there in basic Red Hat. And again, even though it's open source and we could do everything ourselves, we don't want to do everything ourselves. Having this enterprise edition of Red Hat, having a company stand behind it, means the software is still open source, we can look at the source code, we can modify it if we want, but at the end of the day we're happy to hand over some of our challenges to Red Hat and let them do what they do best. They have great reach into the kernel community; they can get things done that we can't necessarily get done. So it's a great relationship.
>> Yes. So that last mile, getting it onto Sierra, is that the first time it's been on one of the big showcase supercomputers?
>> Sure.
And part of the reason for that is that those big computers themselves are now mostly commodity. Again, you talked about a Cray, some really exotic architecture; Sierra is a collection of Linux servers. Now, in this case they're running the POWER architecture instead of x86, so Red Hat did a lot of work with IBM to make sure that POWER was fully supported in the RHEL stack. But again, the servers themselves are somewhat commodity. We're running NVIDIA GPUs, which are widely used everywhere and obviously a big deal for machine learning. The biggest proprietary component we're still dealing with is the interconnect. I mentioned these clusters have to be really tightly coupled; the performance has to be really superior and, most importantly, the latency. They have to be super low latency, and Ethernet just doesn't cut it.
>> So you run InfiniBand today, I'm assuming.
>> We're running Mellanox InfiniBand on Sierra and on some of our commodity clusters. We run Mellanox on other ones, and we run Intel Omni-Path, which is just another flavor of InfiniBand. If we could use Ethernet, we would, because again we would get all the benefit and the leverage of what everybody else is doing, but it just hasn't quite been able to meet our needs in that area.
>> Now, to recall the history lesson we got a bit of this morning: the laboratory has been around since the early fifties, born of the Cold War, so obviously open source wasn't always the way things went. What about your evolution to open source? It has taken hold now, but there had to be a tipping point at some point that converted the laboratory and made you believers. Can you go back through that process? Was it a big moment, or was it just kind of a steady migration?
>> Well, it's interesting. If you go way back, we actually wrote the operating systems for those early Cray computers. We wrote those operating systems in house because there really was no operating system that would work for us. So we've been software developers, system software developers, for a long time, but at that time it was all proprietary and closed source. So we knew how to do that stuff. What really happened, I think, was when these commodity clusters came along and we showed that we could build a cluster that could perform well for our applications on commodity hardware. We started with Red Hat, but we had to add some things on top. We had to add the software that made a bunch of individual servers function as a cluster: all the system management stuff, the resource manager, the thing that lets us schedule batch jobs, the parallel file system. Those things did not exist in open source, so we helped write them, and they took on lives of their own. So Lustre is a parallel file system that we helped develop. Slurm, which anyone outside of HPC probably hasn't heard of, is a resource manager that again is very widely used. The lab really saw that we got a lot of visibility by contributing this stuff to the community, and I think everybody has embraced it. We develop open source software at all different layers.
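[Editor's note: for readers who haven't seen the kind of tightly coupled code Robin describes, here is a minimal, generic sketch in C with MPI, the sort of job a resource manager like Slurm launches across many nodes. This is not code from the lab, and the values are placeholders; it simply illustrates why one slow server slows the whole machine: every rank has to reach the collective operation before any rank can move on.]

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which piece of the problem am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many servers are cooperating? */

    /* Each rank computes its own piece of the problem (a stand-in value here). */
    double local = (double)rank;

    /* Every rank must arrive at this collective before any rank continues.
       If one server lags, every other server waits on it. */
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum across %d ranks: %f\n", size, global);

    MPI_Finalize();
    return 0;
}
```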
So, you know, when I look at the public odd, they do a lot with government agencies. They got cloud. You know, I've talked to companies that said I could have built a super computer. Here's how long and do. But I could spend it up in minutes. And you know what I need? Is that a possibility for something of yours? I understand. Maybe not the super high performance, But where does it fit in? >> Sure, Yeah. I mean, certainly for a company that has no experience or no infrastructure. I mean, we have invested a huge amount in our data center, and we have a ton of power and cooling and floor space. We have already made that investment, you know, trying to outsource that to the cloud doesn't make sense. There are definitely things. Cloud is great. We are using Gove Cloud for things like prototyping, or someone wants a server, that some architecture, that we don't have the ability to just spin it up. You know, if we had to go and buy it, it would take six months because you know, we are the government. But be able to just spin that stuff up. It's really great for what we do. We use it for open source for building test. We use it to conferences when we want to run a tutorial and spin up a bunch of instances of, you know, Lennox and and run a tutorial. But the biggest thing is at the end of the day are our most important work. Clothes are on a classified environment, and we don't have the ability to run those workloads in the cloud. And so to do it on the open side and not be ableto leverage it on the close side, it really takes away some of the value of because we really want to make the two environments look a similar is possible leverage our staff and and everything like that. So that's where Cloud just doesn't quite fit >> in for us. You were talking about, you know, the speed of, Of of Sierra. And then also mentioning El Capitan, which is thie the next generation. You're next, You know, super unbelievably fast computer to an extent of ten X that off current speed is within the next four to five years. >> Right? That's the goal. I >> mean, what those Some numbers that is there because you put a pretty impressive array up there, >> right? So Series about one hundred twenty five PETA flops and are the big Holy Grail for high performance computing is excess scale and exit flop of performance. And so, you know, El Capitan is targeted to be, you know, one point two, maybe one point five exit flops or even Mohr again. That's peak performance. It doesn't necessarily translate into what our applications, um, I can get out of the platform. But the reason you keep sometimes I think, isn't it enough isn't one hundred twenty five five's enough, But it's never enough because any time we get another platform, people figure out how to do things with it that they've never done before. Either they're solving problems faster than they could. And so now they're able to explore a solution space much faster. Or they want to look at, you know, these air simulations of three dimensional space, and they want to be able to look at it in a more fine grain level. So again, every computer we get, we can either push a workload through ten times faster. Or we can look at a simulation. You know, that's ten times more resolved than the one that >> we could do before. So do this for made and for folks at home and take the work that you do and translate that toe. Why that exponential increase in speed will make you better. What you do in terms of decision making and processing of information, >> right? 
So the thing is, these nuclear weapon systems are very complicated. There's multiphysics, there are lots of different interactions going on, and we have to really understand them at the lowest level. One of the reasons that's so important now is that we're maintaining a stockpile that is well beyond the lifespan it was designed for. These nuclear weapons, some of them, were built in the fifties, the sixties, and the seventies. They weren't designed to last this long, so now they're sort of out of their design regime, and we really have to understand their behavior and their properties as they age. That opens up a whole other area that we have to be able to explore, and some of that physics has never been explored before. So the problems get more challenging the farther we get away from the design basis of these weapons. But we're also really starting to do new things, like AI and machine learning, things that weren't part of our workflow before. We're starting to incorporate machine learning in with simulation, again to help explore a very large problem space and be able to find interesting areas within a simulation to focus in on. That's a really exciting area, and it's also an area where GPUs and that kind of hardware have just exploded, in terms of the performance levels that people are seeing on these machines.
>> Well, we thank you for your work. It is critically important, as we all realize, and wonderfully fascinating at the same time. So thanks for the insights here and for your time. We appreciate that.
>> All right, thanks for having me.
>> Robin Goldstone joining us. Back with more here on theCUBE. You're watching our coverage live from Boston of Red Hat Summit 2019.