Armando Acosta, Dell Technologies and Matt Leininger, Lawrence Livermore National Laboratory
(upbeat music) >> We are back, approaching the finish line here at Supercomputing 22, our last interview of the day, our last interview of the show. And I have to say, Dave Nicholson, my co-host. My name is Paul Gillin. I've been attending trade shows for 40 years, Dave, and I've never been to one like this. The type of people who are here, the type of problems they're solving, what they talk about. Trade shows are typically so speeds and feeds. They're so financial, they're so ROI, they all sound the same after a while. This is truly a different event. Do you get that sense? >> A hundred percent. Now, I've been attending trade shows for 10 years since I was 19, in other words, so I don't have necessarily your depth. No, but seriously, Paul, totally, completely different than any other conference. First of all, there's the absolute allure of looking at the latest and greatest, coolest stuff. I mean, when you have NASA lecturing on things, when you have Lawrence Livermore Labs, that we're going to be talking to here in a second, it's a completely different story. You have all of the academics, you have students who are in competition and also interviewing with organizations. It's phenomenal. I've had chills a lot this week. >> And I guess our last two guests sort of represent that cross section. Armando Acosta, director of HPC Solutions, High Performance Computing Solutions, at Dell. And Matt Leininger, who is the HPC Strategist at Lawrence Livermore National Laboratory. Now, there is perhaps, I don't know, you can correct me on this, but perhaps no institution in the world that uses more computing cycles than Lawrence Livermore National Laboratory, and it is always on the leading edge of what's going on in supercomputing. And so we want to talk to both of you about that. Thank you for joining us today. >> Sure, glad to be here. >> Thanks for having us. >> Let's start with you, Armando. Well, let's talk about the juxtaposition of the two of you. 
I would not have thought of LLNL as being a Dell reference account in the past. Tell us about the background of your relationship and what you're providing to the laboratory. >> Yeah, so we're really excited to be working with Lawrence Livermore, working with Matt. But actually this process started about two years ago. So we started looking at essentially what was coming down the pipeline. You know, what were the customer requirements? What did we need in order to make Matt successful? And so the beauty of this project is that we've been talking about this for two years, and now it's finally coming to fruition. And now we're actually delivering systems and delivering racks of systems. But what I really appreciate is Matt coming to us, us working together for two years and really trying to understand what are the requirements, what's the schedule, what do we need to hit in order to make them successful. >> At Lawrence Livermore, what drives your computing requirements, I guess? You're working on some very, very big problems, but also a lot of very complex problems. How do you decide what you need to procure to address them? >> Well, that's a difficult challenge. I mean, our mission is a national security mission, dealing with making sure that we do our part to provide the high performance computing capabilities to the US Department of Energy's National Nuclear Security Administration. We do that through the Advanced Simulation and Computing program. Its goal is to provide that computing power to make sure that the US nuclear stockpile is safe, secure, and effective. So how do we go about doing that? There's a lot of work involved. We have multiple platform lines that we accomplish that goal with. One of them is the advanced technology systems. Those are the ones you've heard about a lot; they're pushing towards exascale, with the GPU technologies incorporated into those. We also have a second platform line, called the Commodity Technology Systems. 
That's where right now we're partnering with Dell on the latest generation of those. Those systems are a little more conservative; they're CPU-only right now, but they're also intended to be the everyday workhorses. So those are the first systems our users get on. It's very easy for them to get their applications up and running. They're the first things they use, usually on a day-to-day basis. They run a lot of the small to medium size jobs that you need to do to figure out how to most effectively use the systems, and which workloads you need to move to the even larger systems to accomplish our mission goals. >> The workhorses. >> Yeah. >> What have you seen here these last few days of the show? What excites you? What are the most interesting things you've seen? >> There's all kinds of things that are interesting. Probably the most interesting ones I can't talk about in public, unfortunately, 'cause of NDAs, of course. But it's always exciting to be here at Supercomputing. It's always exciting to see the products that we've been working with industry on, and co-designing with them on for, you know, several years before the public actually sees them. That's always an exciting part of the conference as well, specifically with CTS-2. It's exciting. As was mentioned before, I've been working with Dell for nearly two years on this, but the systems first started being delivered this past August. And so we're just taking the initial deliveries of those. We've deployed, you know, roughly 1,600 nodes now, but that'll ramp up to over 6,000 nodes over the next three or four months. >> So how does this work intersect with Sandia and Los Alamos? Explain to us the relationship there. >> Right, so those three laboratories are the laboratories under the National Nuclear Security Administration. We partner together on CTS. So the architectures, as you were asking, how do we define these things? It's the labs coming together. 
Those three laboratories together define what we need for that architecture. We have a joint procurement that is run out of Livermore, but then the systems are deployed at all three laboratories. And then they serve the programs that I mentioned for each laboratory as well. >> I've worked in this space for a very long time. You know, I've worked with agencies where the closest I got to anything they were actually doing was the sort of guest suite outside the secure area. And sometimes there are challenges when you're communicating. It's like you have a partner like Dell who has all of these things to offer, all of these ideas. You have requirements, but maybe you can't share 100% of what you need to do. How do you navigate that? Who makes the decision about what can be revealed in these conversations? You talk about NDAs in terms of what's been shared with you; you may be limited in terms of what you can share with vendors. Does that cause inefficiency? >> To some degree. I mean, we do a good job within the NNSA of understanding what our applications need and then mapping that to technical requirements that we can talk about with vendors. We also have kind of an in-between; we've done this for many years. A recent example is of course with the Exascale Computing Project and some of the things it's doing: creating proxy apps, or mini apps, that are smaller versions of some of the application areas that are important to us, hydrodynamics, material science, things like that. And so we can collaborate with vendors on those proxy apps to co-design systems and tweak the architectures. In fact, we've done a little bit of that with CTS-2, not as much in CTS as maybe in the ATS platforms, but that kind of general idea of how we collaborate through these proxy applications is something we've used across platforms. >> Now is Dell one of your co-design partners? >> In CTS-2, absolutely, yep. >> And what aspects of CTS-2 are you working on with Dell? 
>> Well, the architecture itself was the first thing we worked with them on. We had a procurement come out; they bid an architecture on that. We had worked with them previously on our requirements, understanding what our requirements are. But that architecture today is based on the fourth generation Intel Xeon that you've heard a lot about at the conference. We are one of the first customers to get those systems in. All the systems are interconnected together with the Cornelis Networks Omni-Path network that we've used before and are very excited about as well. And we build up from there. The systems get integrated in by the operations teams at the laboratory. They get integrated into our production computing environment. Dell is really responsible, you know, for designing these systems and delivering them to the laboratories. The laboratories then work with Dell. We have a software stack that we provide on top of that called TOSS, for Tri-Lab Operating System Stack. It's based on Red Hat Enterprise Linux. But the goal there is that it allows us a common user environment, a common simulation environment, across not only CTS-2, but maybe older systems we have and even the larger systems that we'll be deploying as well. So from a user perspective they see a common user interface, a common environment, across all the different platforms that they use at Livermore and the other laboratories. >> And Armando, what does Dell get out of the co-design arrangement with the lab? >> Well, we get to make sure that they're successful. But the other big thing that we want to do is, typically when you think about Dell and HPC, a lot of people don't make that connection. And so what we're trying to do is make sure that, you know, they know that, hey, whether you're a workgroup customer at the smallest end or a supercomputer customer at the highest end, Dell wants to make sure that we have the right portfolio to match any needs across this. 
But what we were really excited about is, this is kind of our big CTS-2, first thing we've done together. And so, you know, hopefully this has been successful. We've made Matt happy, and we look forward to the future, what we can do with bigger and bigger things. >> So will the labs be okay with Dell coming up with a marketing campaign that said something like, "We can't confirm that alien technology is being reverse engineered." >> Yeah, that would fly. >> I mean, that would be right, right? And I have to ask you the question directly, and the way you can answer it is by smiling like you're thinking, what a stupid question. Are you reverse engineering alien technology at the labs? >> Yeah, you'd have to ask the PR office. >> Okay, okay. (all laughing) >> Good answer. >> No, but it is fascinating, because to a degree it's like you could say, yeah, we're working together, but if you really want to dig into it, it's like, "Well, I kind of can't tell you exactly how some of this stuff works." Do you consider anything that you do from a technology perspective, not what you're doing with it, but the actual stack, do you try to design proprietary things into the stack, or do you say, "No, no, no, we're going to go with standards, and then what we do with it is proprietary and secret."? >> Yeah, it's more the latter. >> It's the latter? Yeah, yeah, yeah. So you're not going to try to reverse engineer the industry? >> No, no. We want the solutions that we develop to enhance the industry, to be able to apply to a broader market so that we can, you know, gain from the volume of that market, the lower cost that it would enable, right? If we go off and develop more and more customized solutions, that can be extraordinarily expensive. And so we're really looking to leverage the wider market, but do what we can to influence that, to develop key technologies that we and others need that can enable us in the high performance computing space. 
>> We were talking with Satish Iyer from Dell earlier about validated designs, Dell's reference designs for pharma and for manufacturing. In HPC, Armando, are you seeing that HPC, traditionally more of an academic research discipline, is beginning to come together with commercial applications? And are these two markets beginning to blend? >> Yeah, I mean, so here's what's happening: you have this convergence of HPC, AI and data analytics. And so when you have that combination of those three workloads, they're applicable across many vertical markets, right? Whether it's financial services, whether it's life sciences, government and research. But what's interesting, and Matt won't brag about it, but a lot of stuff that happens in the DOE labs trickles down to the enterprise space, trickles down to the commercial space, because these guys know how to do it at scale, they know how to do it efficiently, and they know how to hit the mark. And so a lot of customers say, "Hey, we want what CTS-2 does," right? And so it's very interesting. The thing I love is their process, the way they do the RFP process. Matt talked about the benchmarks and helping us understand, hey, here's kind of the mark you have to hit. And then at the same time, you know, if we make them successful, then obviously it's better for all of us, right? You know, I want a secure nuclear stockpile, so I hope everybody else does as well. >> The software stack you mentioned, I think, Tia? >> TOSS. >> TOSS. >> Yeah. >> How did that come about? Why did you feel the need to develop your own software stack? >> It originated back, you know, even 20 years ago, when we first started building Linux clusters, when that was a crazy idea. Livermore and other laboratories were really the first to start doing that and then push them to larger and larger scales. And it was key to have Linux running on that at the time. And so we had the... >> So 20 years ago you knew you wanted to run on Linux? 
>> It was 20 years ago, yeah, yeah. And we started doing that, but we needed a way to have a version of Linux that we could partner with someone on that would do, you know, the support, just like you get from an OS vendor, right? Security support and other things. But then layer on top of that all the HPC stuff you need, either to run the system, to set up the system, to support our user base. And that evolved into TOSS, which is the Tri-Lab Operating System Stack. Now it's based on the latest version of Red Hat Enterprise Linux, as I mentioned before, with all the other HPC magic, so to speak, and all that HPC magic is open source. It may be things that we develop, but it's nothing closed source. So all that's there. We run it across all these different environments, as I mentioned before. And it really originated back in the early days of, you know, Beowulf clusters, Linux clusters, as just needing something that we could use to run on multiple systems and start creating that common environment at Livermore and then eventually the other laboratories. >> How is a company like Dell able to benefit from the open source work that's coming out of the labs? >> Well, when you look at the open source, I mean, open source is good for everybody, right? Because if you make an open source tool available, then people start essentially using that tool. And so if we can make that open source tool more robust and get more people using it, it gets more enterprise ready. And so with that, you know, we're all about open source, we're all about standards, and really about raising all boats, 'cause that's what open source is all about. >> And with that, we are out of time. This is our 28th interview of SC22, and you're taking us out on a high note. Armando Acosta, director of HPC Solutions at Dell. Matt Leininger, HPC Strategist, Lawrence Livermore National Laboratory. Great discussion. Hopefully it was a good show for you. 
Fascinating show for us, and thanks for being with us today. >> Thank you very much. >> Thank you for having us. >> Dave, it's been a pleasure. >> Absolutely. >> Hope we'll be back next year. >> Can't believe it went by so fast. Absolutely, at SC23. >> We hope you'll be back next year. This is Paul Gillin. That's a wrap, with Dave Nicholson, for theCUBE. See you next time. (soft upbeat music)