Rajesh Pohani and Dan Stanzione | CUBE Conversation, February 2022

(contemplative upbeat music)
>> Hello and welcome to this CUBE Conversation. I'm John Furrier, your host of theCUBE, here in Palo Alto, California. Got a great topic on expanding capabilities for urgent computing. Dan Stanzione, he's Executive Director of TACC, the Texas Advanced Computing Center, and Rajesh Pohani, VP of PowerEdge, HPC Core Compute at Dell Technologies. Gentlemen, welcome to this CUBE Conversation.
>> Thanks, John.
>> Thanks, John, good to be here.
>> Rajesh, you got a lot of computing in PowerEdge, HPC, Core Computing. I mean, I get a sense that you love compute, so we'll jump right into it. And of course, I got to love TACC, Texas Advanced Computing Center. I can imagine a lot of stuff going on there. Let's start with TACC. What is the Texas Advanced Computing Center? Tell us a little bit about that.
>> Yeah, we're part of the University of Texas at Austin here, and we build large-scale supercomputers, data systems, AI systems, to support open science research. And we're mainly funded by the National Science Foundation, so we support research projects in all fields of science, all around the country and around the world. Actually, several thousand projects at the moment.
>> But tied to the university, got a lot of gear, got a lot of compute, got a lot of cool stuff going on. What's the coolest thing you got going on right now?
>> Well, for me, it's always the next machine, but I think science-wise, it's the machines we have. We just finished deploying Lonestar6, which is our latest supercomputer, in conjunction with Dell. A little over 600 nodes of those PowerEdge servers that Rajesh builds for us. Which makes more than 20,000 that we've had here over the years, of those boxes. But that one just went into production. We're designing new systems for a few years from now, where we'll be even larger. Our Frontera system was top five in the world two years ago, just fell out of the top 10.
So we've got to fix that and build the new top-10 system sometime soon. We always have a ton going on in large-scale computing.
>> Well, I want to get to the Lonestar6 in a minute, on the next talk track, but... What are some of the areas that you guys are working on that are making an impact? Take us through, and we talked before we came on camera about, obviously, the academic affiliation, but also there's a real societal impact of the work you're doing. What are some of the key areas where TACC is making an impact?
>> So there's really a huge range from new microprocessors, new materials design, photovoltaics, climate modeling, basic science and astrophysics, and quantum mechanics, and things like that. But I think the nearest-term impacts that people see are what we call urgent computing, which is one of the drivers around Lonestar and some other recent expansions that we've done. And that's things like, there's a hurricane coming, exactly where is it going to land? Can we refine the area where there's going to be either high winds or storm surge? Can we assess the damage from digital imagery afterwards? Can we direct first responders in the optimal routes? Similarly for earthquakes, and a lot recently, as you might imagine, around COVID. In 2020, we moved almost a third of our resources to doing COVID work, full-time.
>> Rajesh, I want to get your thoughts on this, because Dave Vellante and I have been talking about this on theCUBE recently, a lot. Obviously, people see what's going on with the cloud technology, but compute and on-premises, private cloud's been growing. If you look at the hyperscale on-premises and the edge, if you include that in, you're seeing a lot more user consumption on-premises, and now, with 5G, you got edge, you mentioned first responders, Dan. This is now pointing to a new architectural shift. As the VP of PowerEdge and HPC and Core Compute, you got to look at this and go, "Hmm."
If compute's going to be everywhere, and in locations, you got to have that compute. How does that all work together? And how do you do advanced computing, when you have these urgent needs, as well as real-time in a new architecture?
>> Yeah, John, I mean, it's a pretty interesting time when you think about some of the changing dynamics and how customers are utilizing compute and the compute needs in the industry. Seeing a couple of big trends. One, the distribution of compute outside of the data center, 5G is really accelerating that, and then you're generating so much data, and what you do with it, the insights that come out of it, that we're seeing more and more push to AI, ML, inside the data center. Dan mentioned what he's doing at TACC with computational analysis and some of the work that they're doing. So what you're seeing is, now, this push, that data in the data center and what you do with it, while data is being created out at the edge. And it's actually this interesting dichotomy that we're beginning to see. Dan mentioned some of the work that they're doing in medical and on COVID research. Even at Dell, we're making cycles available for COVID research using our Zenith cluster, that's located in our HPC and AI Innovation Lab. And we continue to partner with organizations like TACC and others on research activities to continue to learn about the virus, how it mutates, and then how you treat it. So if you think about all the things, and data that's getting created, you're seeing that distribution and it's really leading to some really cool innovations going forward.
>> Yeah, I want to get to that COVID research, but first, you mentioned a few words I want to get out there. You mentioned Lonestar6. Okay, so first, what is Lonestar6, then we'll get into the system aspect of it. Take us through what that definition is, what is Lonestar6?
>> Well, as Dan mentioned, Lonestar6 is a Dell Technologies system that we developed with TACC, it's located at the University of Texas at Austin. It consists of more than 800 Dell PowerEdge 6525 servers that are powered with 3rd Generation AMD EPYC processors. And just to give you an example of the scale of this cluster, it could perform roughly three quadrillion operations per second. That's three petaFLOPS, and to match what Lonestar6 can compute in one second, a person would have to do one calculation every second for a hundred million years. So it's quite a good-size system, and quite a powerful one as well.
>> Dan, what's the role that the system plays, you've got petaFLOPS, what, three petaFLOPS, you mentioned? That's a lot of FLOPS! So obviously urgent computing, what's cranking through the system there? Take us through, what's it like?
>> Sure, well, there's a mix of workloads on it, and on all our systems. So there's the urgent computing work, right? Fast turnaround, near real-time, whether it's COVID research, or doing... a project now where we bring in MRI data and are doing sort of patient-specific dosing for radiation treatments and chemotherapy, tailored to your tumor, instead of just the sort of general for people your size. That all requires sort of real-time turnaround. There's a lot of AI research going on now, we're incorporating AI in traditional science and engineering research. And that uses an awful lot of data, but also consumes a huge amount of cycles in training those models. And then there's all of our traditional, simulation-based workloads and materials and digital twins for aircraft and aircraft design, and more efficient combustion and more efficient photovoltaic materials, or photovoltaic materials without using as much lead, and things like that. And I'm sure I'm missing dozens of other topics, 'cause, like I said, that one really runs every field of science.
We've really focused the Lonestar line of systems, and this is obviously the sixth one we built, around our sort of Texas-centric users. It's the UT Austin users, and then with contributions from Texas A&M, and Texas Tech and the University of Texas system, MD Anderson Healthcare Center, the University of North Texas. So users all around the state, and every research problem that you might imagine, they're into. We're just ramping up a project in disaster information systems, that's looking at the probabilities of flooding in coastal Texas and doing... Can we make building code changes to mitigate impact? Do we have to change the standard foundation heights for new construction, to mitigate the increasing storm surges from these sort of slow storms that sit there and rain, like hurricanes didn't used to, but seem to be doing more and more. All those problems will run on Lonestar, and on all the systems to come, yeah.
>> It's interesting, you mentioned urgent computing, I love that term because it could be an event, it could be some slow kind of brewing event like that rain example you mentioned. It could also be, obviously, with the healthcare, and you mentioned COVID earlier. These are urgent, societal challenges, and having that available, the processing capability, the compute, the data. You mentioned digital twins. I can imagine all this new goodness coming from that. Compare that, where we were 10 years ago. I mean, just from a mind-blowing standpoint, you have come so far, take us through, try to give a context to the level of where we are now, to do this kind of work, and where we were years ago. Can you give us a feel for that?
>> Sure, there's a lot of ways to look at that, and how the technology's changed, how we operate around those things, and then sort of what our capabilities are.
I think one of the big, first, urgent computing things for us, where we sort of realized we had to adapt to this model of computing was about 15 years ago with the big BP Gulf Oil spill. And suddenly, we were dumping thousands of processors of load to figure out where that oil spill was going to go, and how to do mitigation, and what the potential impacts were, and where you need to put your containment, and things like that. And it was, well, at that point we thought of it as sort of a rare event. There was another one, that I think was the first real urgent computing one, where the space shuttle was in orbit, and they knew something had hit it during takeoff. And we were modeling, along with NASA and a bunch of supercomputers around the world, the heat shield and could they make reentry safely? You have until they come back to get that problem done, you don't have months or years to really investigate that. And so, what we've sort of learned through some of those, the Japanese tsunami was another one, there have been so many over the years, is that one, these sort of disasters are all the time, right? One thing or another, right? If we're not doing hurricanes, we're doing wildfires and drought threat, if it's not COVID. We got good and ready for COVID through SARS and through the swine flu and through HIV work, and things like that. So it's that we can do the computing very fast, but you need to know how to do the work, right? So we've spent a lot of time, not only being able to deliver the computing quickly, but having the data in place, and having the code in place, and having people who know the methods who know how to use big computers, right? That's been a lot of what the COVID Consortium, the White House COVID Consortium, has been about over the last few years. And we're actually trying to modify that nationally into a strategic computing reserve, where we're ready to go after these problems, where we've run drills, right? 
And if there's a, there's a train that derails, and there's a chemical spill, and it's near a major city, we have the tools and the data in place to do wind modeling, and we have the terrain ready to go. And all those sorts of things that you need to have to be ready. So we've really sort of changed our sort of preparedness and operational model around urgent computing in the last 10 years. Also, just the way we scheduled the system, the ability to sort of segregate between these long-running workflows for things that are really important, like we displaced a lot of cancer research to do COVID research. And cancer's still important, but it's less likely that we're going to make an impact in the next two months, right? So we have to shuffle how we operate things and then just, having all that additional capacity. And I think one of the things that's really changed in the models is our ability to use AI, to sort of adroitly steer our simulations, or prune the space when we're searching parameters for simulations. So we have the operational changes, the system changes, and then things like adding AI on the scientific side, since we have the capacity to do those kinds of things now, all feed into our sort of preparedness for this kind of stuff.
>> Dan, you got me sold, I want to come work with you. Come on, can I join the team over there? It sounds exciting.
>> Come on down! We always need good folks around here, so. (laughs)
>> Rajesh, when I-
>> Almost 200 now, and we're always growing.
>> Rajesh, when I hear the stories about kind of the evolution, kind of where the state of the art is, you almost see the innovation trajectory, right? The growth and the learning, adding machine learning only extends out more capabilities.
But also, Dan's kind of pointing out this kind of response, rapid compute engine, that they could actually deploy with learnings, and then software, so is this a model where anyone can call up and get some cycles to, say, power an autonomous vehicle, or, hey, I want to point the machinery and the cycles at something? Is the service, do you guys see this going that direction, or... Because this sounds really, really good.
>> Yeah, I mean, one thing that Dan talked about was, it's not just the compute, it's also having the right algorithms, the software, the code, right? The ability to learn. So I think when those are set up, yeah. I mean, the ability to digitally simulate in any number of industries and areas, advances the pace of innovation, reduces the time to market of whatever a customer is trying to do or research, or even vaccines or other healthcare things. If you can reduce that time through the leverage of compute on doing digital simulations, it just makes things better for society or for whatever it is that we're trying to do, in a particular industry.
>> I think the idea of instrumenting stuff is here forever, and also simulations, whether it's digital twins, and doing these kinds of real-time models. Isn't really much of a guess, so I think this is a huge, historic moment. But you guys are pushing the envelope here, at University of Texas and at TACC. It's not just research, you guys got real examples. So where do you guys see this going next? I see space, big compute areas that might need some data to be cranked out. You got cybersecurity, you got healthcare, you mentioned oil spill, you got oil and gas, I mean, you got industry, you got climate change. I mean, there's so much to tackle. What's next?
>> Absolutely, and I think, the appetite for computing cycles isn't going anywhere, right? And it's only going to, it's going to grow without bound, essentially.
And AI, while in some ways it reduces the amount of computing we do, it's also brought this whole new domain of modeling to a bunch of fields that weren't traditionally computational, right? We used to just do engineering, physics, chemistry, were all super computational, but then we got into genome sequencers and imaging and a whole bunch of data, and that made biology computational. And with AI, now we're making things like the behavior of human society and things, computational problems, right? So there's this sort of growing amount of workload that is, in one way or another, computational, and getting bigger and bigger. So that's going to keep on growing. I think the trick is not only going to be growing the computation, but growing the software and the people along with it, because we have amazing capabilities that we can bring to bear. We don't have enough people to hit all of them at once. And so, that's probably going to be the next frontier in growing out both our AI and simulation capability, is the human element of it.
>> It's interesting, when you think about society, right? If the things become too predictable, what does a democracy even look like? If you know the election's going to be over two years from now in the United States, or you look at these major, major waves
>> Human companies don't know.
>> of innovation, you say, "Hmm." So it's democracy, AI, maybe there's an algorithm for checking up on the AI 'cause biases... So, again, there's so many use cases that just come out of this. It's incredible.
>> Yeah, and bias in AI is something that we worry about and we work on, and on task forces where we're working on that particular problem, because the AI is going to take... Is based on... Especially when you look at a deep learning model, it's 100% a product of the data you show it, right? So if you show it a biased data set, it's going to have biased results.
And it's not anything intrinsic about the computer or the personality, the AI, it's just data mining, right? In essence, right, it's learning from data. And if you show it all images of one particular outcome, it's going to assume that's always the outcome, right? It just has no choice, but to see that. So how do we deal with bias, how do we deal with confirmation, right? I mean, in addition, you have to recognize, if you haven't, if it gets data it's never seen before, how do you know it's not wrong, right? So there's work on data quality and quality assurance and quality checking around AI. And that's where, especially in scientific research, we use what's starting to be called things like physics-informed or physics-constrained AI, where the neural net that you're using to design an aircraft still has to follow basic physical laws in its output, right? Or if you're doing some materials or astrophysics, you still have to obey conservation of mass, right? So I can't say, well, if you just apply negative mass on this other side and positive mass on this side, everything works out right for stable flight. 'Cause we can't do negative mass, right? So you have to constrain it in the real world. So this notion of how we bring in the laws of physics and constrain your AI to what's possible is also a big part of the sort of AI research going forward.
>> You know, Dan, you just, to me, just encapsulate the science that's still out there, that's needed. Computer science, social science, material science, kind of all converging right now.
>> Yeah, engineering, yeah,
>> Engineering, science,
>> slipstreams,
>> it's all there,
>> physics, yeah, mmhmm.
>> it's not just code. And, Rajesh, data. You mentioned data, the more data you have, the better the AI. We have a world that's going from silos to open control planes. We have to get to a world. This is a cultural shift we're seeing, what's your thoughts?
>> Well, it is, in that, the ability to drive predictive analysis based on the data is going to drive different behaviors, right? Different social behaviors for cultural impacts. But I think the point that Dan made about bias, right, it's only as good as the code that's written and the way that the data is actually brought into the system. So making sure that that is done in a way that generates the right kind of outcome, that allows you to use that in a predictive manner, becomes critically important. If it is biased, you're going to lose credibility in a lot of that analysis that comes out of it. So I think that becomes critically important, but overall, I mean, if you think about the way compute is, it's becoming pervasive. It's not just in selected industries, as Dan mentioned, and it's now applying to everything that you do, right? Whether it is getting you more tailored recommendations for your purchasing, right? You have better options that way. You don't have to sift through a lot of different ideas as you scroll online. It's tailoring now to some of your habits and what you're looking for. So that becomes an incredible time-saver for people to be able to get what they want in a way that they want it. And then you look at the way it impacts other industries and development innovation, and it just continues to scale and scale and scale.
>> Well, I think the work that you guys are doing together is scratching the surface of the future, which is digital business. It's about data, it's about all these new things. It's about advanced computing meets the right algorithms for the right purpose. And it's a really amazing operation you guys got over there. Dan, great to hear the stories. It's very provocative, very enticing to just want to jump in and hang out. But I got to do theCUBE day job here, but congratulations on success. Rajesh, great to see you and thanks for coming on theCUBE.
>> Thanks for having us, John.
>> Okay.
>> Thanks very much.
>> Great conversation around urgent computing, as computing becomes so much more important, bigger problems and opportunities are around the corner. And this is theCUBE, we're documenting it all here. I'm John Furrier, your host. Thanks for watching. (contemplative music)
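
Editor's note: the three-petaFLOPS comparison quoted in the conversation can be sanity-checked with a few lines of arithmetic. The sketch below is not from the speakers. It takes the quoted figure of roughly three quadrillion operations per second at face value and assumes a 365.25-day year; the constant and function names are illustrative only.

```python
# Figure quoted in the conversation: "roughly three quadrillion
# operations per second" (three petaFLOPS).
OPS_PER_SECOND = 3e15

# Assumption (not from the transcript): a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_at_one_calc_per_second(total_ops: float) -> float:
    """Years a person needs to perform total_ops calculations
    at a rate of one calculation per second, without pausing."""
    return total_ops / SECONDS_PER_YEAR


# Work done by the machine in one second, redone by hand.
years = years_at_one_calc_per_second(OPS_PER_SECOND)
print(f"about {years / 1e6:.0f} million years")
```

Under these assumptions the result comes out a little under a hundred million years, which is consistent with the round figure given in the interview.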

Published Date : Feb 25 2022
