Rajesh Pohani and Dan Stanzione | CUBE Conversation, February 2022
(contemplative upbeat music)
>> Hello and welcome to this CUBE Conversation. I'm John Furrier, your host of theCUBE, here in Palo Alto, California. Got a great topic on expanding capabilities for urgent computing. Dan Stanzione, he's Executive Director of TACC, the Texas Advanced Computing Center, and Rajesh Pohani, VP of PowerEdge, HPC Core Compute at Dell Technologies. Gentlemen, welcome to this CUBE Conversation.
>> Thanks, John.
>> Thanks, John, good to be here.
>> Rajesh, you got a lot of computing in PowerEdge, HPC, Core Compute. I mean, I get a sense that you love compute, so we'll jump right into it. And of course, I got to love TACC, Texas Advanced Computing Center. I can imagine a lot of stuff going on there. Let's start with TACC. What is the Texas Advanced Computing Center? Tell us a little bit about that.
>> Yeah, we're part of the University of Texas at Austin here, and we build large-scale supercomputers, data systems, and AI systems to support open science research. And we're mainly funded by the National Science Foundation, so we support research projects in all fields of science, all around the country and around the world. Actually, several thousand projects at the moment.
>> But tied to the university, got a lot of gear, got a lot of compute, got a lot of cool stuff going on. What's the coolest thing you got going on right now?
>> Well, for me, it's always the next machine, but I think science-wise, it's the machines we have. We just finished deploying Lonestar6, which is our latest supercomputer, in conjunction with Dell. A little over 600 nodes of those PowerEdge servers that Rajesh builds for us, which makes more than 20,000 that we've had here over the years, of those boxes. But that one just went into production. We're designing new systems for a few years from now, where we'll be even larger. Our Frontera system was top five in the world two years ago, and just fell out of the top 10. So we've got to fix that and build the new top-10 system sometime soon. We always have a ton going on in large-scale computing.
>> Well, I want to get to the Lonestar6 in a minute, on the next talk track, but what are some of the areas that you guys are working on that are making an impact? Take us through, and we talked before we came on camera about, obviously, the academic affiliation, but also there's a real societal impact of the work you're doing. What are some of the key areas where TACC is making an impact?
>> So there's really a huge range, from new microprocessors, new materials design, photovoltaics, climate modeling, basic science and astrophysics, and quantum mechanics, and things like that. But I think the nearest-term impacts that people see are what we call urgent computing, which is one of the drivers around Lonestar and some other recent expansions that we've done. And that's things like, there's a hurricane coming, exactly where is it going to land? Can we refine the area where there's going to be either high winds or storm surge? Can we assess the damage from digital imagery afterwards? Can we direct first responders on the optimal routes? Similarly for earthquakes, and a lot recently, as you might imagine, around COVID. In 2020, we moved almost a third of our resources to doing COVID work, full-time.
>> Rajesh, I want to get your thoughts on this, because Dave Vellante and I have been talking about this on theCUBE recently, a lot. Obviously, people see what's going on with cloud technology, but compute on-premises and private cloud have been growing.
If you look at hyperscale on-premises and the edge, if you include that, you're seeing a lot more user consumption on-premises, and now, with 5G, you've got edge. You mentioned first responders, Dan. This is now pointing to a new architectural shift. As the VP of PowerEdge and HPC and Core Compute, you've got to look at this and go, "Hmm." If compute's going to be everywhere, and in locations, you've got to have that compute. How does that all work together? And how do you do advanced computing, when you have these urgent needs as well as real-time, in a new architecture?
>> Yeah, John, I mean, it's a pretty interesting time when you think about some of the changing dynamics, how customers are utilizing compute, and the compute needs in the industry. We're seeing a couple of big trends. One, the distribution of compute outside of the data center, and 5G is really accelerating that. And then you're generating so much data that, with what you do with it and the insights that come out of it, we're seeing more and more push to AI and ML inside the data center. Dan mentioned what he's doing at TACC with computational analysis and some of the work that they're doing. So what you're seeing now is this push around the data in the data center and what you do with it, while data is being created out at the edge. And it's actually this interesting dichotomy that we're beginning to see. Dan mentioned some of the work that they're doing in medical and on COVID research. Even at Dell, we're making cycles available for COVID research using our Zenith cluster, that's located in our HPC and AI Innovation Lab. And we continue to partner with organizations like TACC and others on research activities, to continue to learn about the virus, how it mutates, and then how you treat it. So if you think about all the things, and the data that's getting created, you're seeing that distribution, and it's really leading to some really cool innovations going forward.
>> Yeah, I want to get to that COVID research, but first, you mentioned a few words I want to get out there. You mentioned Lonestar6. Okay, so first, what is Lonestar6? Then we'll get into the system aspect of it. Take us through what that definition is. What is Lonestar6?
>> Well, as Dan mentioned, Lonestar6 is a Dell technology system that we developed with TACC, and it's located at the University of Texas at Austin. It consists of more than 800 Dell PowerEdge C6525 servers that are powered with 3rd Generation AMD EPYC processors. And just to give you an example of the scale of this cluster, it can perform roughly three quadrillion operations per second. That's three petaFLOPS, and to match what Lonestar6 can compute in one second, a person would have to do one calculation every second for a hundred million years. So it's quite a good-size system, and quite a powerful one as well.
>> Dan, what's the role that the system plays? You've got petaFLOPS, what, three petaFLOPS, you mentioned? That's a lot of FLOPS! So obviously urgent computing, what's cranking through the system there? Take us through, what's it like?
>> Sure, well, there's a mix of workloads on it, and on all our systems. So there's the urgent computing work, right? Fast turnaround, near real-time, whether it's COVID research, or a project now where we bring in MRI data and are doing sort of patient-specific dosing for radiation treatments and chemotherapy, tailored to your tumor, instead of just the sort of general dose for people your size. That all requires sort of real-time turnaround.
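Rajesh's hundred-million-years figure holds up, for what it's worth. A quick back-of-the-envelope sketch, using only the numbers quoted above (three petaFLOPS, one human calculation per second):

```python
# Sanity check on the Lonestar6 claim quoted above:
# 3 petaFLOPS = 3 quadrillion floating-point operations per second.
SYSTEM_OPS_PER_SECOND = 3e15

# One person doing one calculation per second, non-stop.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # about 3.156e7

person_years = SYSTEM_OPS_PER_SECOND / SECONDS_PER_YEAR
print(f"One second of Lonestar6 is about {person_years:.2e} person-years")
# Prints roughly 9.5e+07, i.e. on the order of a hundred million years.
```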
There's a lot of AI research going on now; we're incorporating AI in traditional science and engineering research. And that uses an awful lot of data, but also consumes a huge amount of cycles in training those models. And then there's all of our traditional simulation-based workloads: materials, digital twins for aircraft and aircraft design, more efficient combustion, and more efficient photovoltaic materials, or photovoltaic materials without using as much lead, and things like that. And I'm sure I'm missing dozens of other topics, 'cause, like I said, that one really runs every field of science. We've really focused the Lonestar line of systems, and this is obviously the sixth one we've built, around our sort of Texas-centric users. It's the UT Austin users, and then with contributions from Texas A&M, and Texas Tech, and the University of Texas System, MD Anderson Cancer Center, the University of North Texas. So users all around the state, and every research problem that you might imagine, those are into. We're just ramping up a project in disaster information systems that's looking at the probabilities of flooding in coastal Texas. Can we make building code changes to mitigate impact? Do we have to change the standard foundation heights for new construction, to mitigate the increasing storm surges from these sort of slow storms that sit there and rain, like hurricanes didn't used to, but seem to be doing more and more? All those problems will run on Lonestar, and on all the systems to come, yeah.
>> It's interesting, you mentioned urgent computing. I love that term, because it could be an event, or it could be some slow, kind of brewing event, like that rain example you mentioned. It could also be, obviously, with healthcare, and you mentioned COVID earlier. These are urgent, societal challenges, and having that available, the processing capability, the compute, the data. You mentioned digital twins. I can imagine all this new goodness coming from that. Compare that to where we were 10 years ago. I mean, just from a mind-blowing standpoint, we've come so far. Take us through, try to give a context to the level of where we are now, to do this kind of work, and where we were years ago. Can you give us a feel for that?
>> Sure, there's a lot of ways to look at that: how the technology's changed, how we operate around those things, and then sort of what our capabilities are. I think one of the big, first urgent computing things for us, where we sort of realized we had to adapt to this model of computing, was about 15 years ago with the big BP Gulf oil spill. And suddenly, we were dumping thousands of processors of load to figure out where that oil spill was going to go, and how to do mitigation, and what the potential impacts were, and where you need to put your containment, and things like that. And at that point we thought of it as sort of a rare event. There was another one, that I think was the first real urgent computing one, where the space shuttle was in orbit, and they knew something had hit it during takeoff. And we were modeling, along with NASA and a bunch of supercomputers around the world, the heat shield, and could they make reentry safely? You have until they come back to get that problem done; you don't have months or years to really investigate that.
And so, what we've sort of learned through some of those, and the Japanese tsunami was another one, there have been so many over the years, is that, one, these sorts of disasters are happening all the time, right? One thing or another. If we're not doing hurricanes, we're doing wildfires and drought threat, if it's not COVID. We got good and ready for COVID through SARS and through the swine flu and through HIV work, and things like that. So it's that we can do the computing very fast, but you need to know how to do the work, right? So we've spent a lot of time not only being able to deliver the computing quickly, but having the data in place, and having the code in place, and having people who know the methods, who know how to use big computers, right? That's been a lot of what the COVID Consortium, the White House COVID Consortium, has been about over the last few years. And we're actually trying to modify that, nationally, into a strategic computing reserve, where we're ready to go after these problems, where we've run drills, right? If there's a train that derails, and there's a chemical spill, and it's near a major city, we have the tools and the data in place to do wind modeling, and we have the terrain ready to go. And all those sorts of things that you need to have to be ready. So we've really changed our sort of preparedness and operational model around urgent computing in the last 10 years. Also, just the way we schedule the system: the ability to sort of segregate between these long-running workflows and things that are really important, like we displaced a lot of cancer research to do COVID research. And cancer's still important, but it's less likely that we're going to make an impact in the next two months, right? So we have to shuffle how we operate things, and then, just having all that additional capacity. And I think one of the things that's really changed in the models is our ability to use AI to sort of adroitly steer our simulations, or prune the space when we're searching parameters for simulations. So we have the operational changes, the system changes, and then things like adding AI on the scientific side, since we have the capacity to do those kinds of things now, all feed into our sort of preparedness for this kind of stuff.
>> Dan, you got me sold, I want to come work with you. Come on, can I join the team over there? It sounds exciting.
>> Come on down! We always need good folks around here, so. (laughs)
>> Rajesh, when I-
>> Almost 200 now, and we're always growing.
>> Rajesh, when I hear the stories about kind of the evolution, kind of where the state of the art is, you almost see the innovation trajectory, right? The growth and the learning; adding machine learning only extends out more capabilities. But also, Dan's kind of pointing out this kind of responsive, rapid compute engine that they can actually deploy with learnings, and then software. So is this a model where anyone can call up and get some cycles to, say, power an autonomous vehicle, or, hey, I want to point the machinery and the cycles at something? Do you guys see this going in that direction, as a service, or... Because this sounds really, really good.
>> Yeah, I mean, one thing that Dan talked about was, it's not just the compute, it's also having the right algorithms, the software, the code, right? The ability to learn. So I think when those are set up, yeah.
I mean, the ability to digitally simulate in any number of industries and areas advances the pace of innovation, and reduces the time to market of whatever a customer is trying to do or research, even vaccines or other healthcare things. If you can reduce that time through the leverage of compute on doing digital simulations, it just makes things better for society, or for whatever it is that we're trying to do in a particular industry.
>> I think the idea of instrumenting stuff is here forever, and also simulations, whether it's digital twins and doing these kinds of real-time models. It isn't really much of a guess anymore, so I think this is a huge, historic moment. But you guys are pushing the envelope here, at the University of Texas and at TACC. It's not just research, you guys have real examples. So where do you guys see this going next? I see space, big compute areas that might need some data to be cranked out. You've got cybersecurity, you've got healthcare, you mentioned the oil spill, you've got oil and gas, I mean, you've got industry, you've got climate change. I mean, there's so much to tackle. What's next?
>> Absolutely, and I think the appetite for computing cycles isn't going anywhere, right? It's going to grow without bound, essentially. And AI, while in some ways it reduces the amount of computing we do, has also brought this whole new domain of modeling to a bunch of fields that weren't traditionally computational, right? We used to just do engineering, physics, chemistry; those were all super computational. But then we got into genome sequencers and imaging and a whole bunch of data, and that made biology computational. And with AI, now we're making things like the behavior of human society computational problems, right? So there's this sort of growing amount of workload that is, in one way or another, computational, and getting bigger and bigger. So that's going to keep on growing. I think the trick is not only going to be growing the computation, but growing the software and the people along with it, because we have amazing capabilities that we can bring to bear. We don't have enough people to hit all of them at once. And so, that's probably going to be the next frontier in growing out both our AI and simulation capability: the human element of it.
>> It's interesting, when you think about society, right? If things become too predictable, what does a democracy even look like? If you know the election's going to be over two years from now in the United States, or you look at these major, major waves-
>> Human companies don't know.
>> -of innovation, you say, "Hmm." So it's democracy, AI, maybe there's an algorithm for checking up on the AI, 'cause biases... So, again, there's so many use cases that just come out of this. It's incredible.
>> Yeah, and bias in AI is something that we worry about and we work on, on task forces where we're working on that particular problem, because the AI is based on... Especially when you look at a deep learning model, it's 100% a product of the data you show it, right? So if you show it a biased data set, it's going to have biased results. And it's not anything intrinsic about the computer or the personality of the AI; it's just data mining, right? In essence, it's learning from data. And if you show it all images of one particular outcome, it's going to assume that's always the outcome, right? It just has no choice but to see that. So how do we deal with bias, how do we deal with confirmation, right?
I mean, in addition, you have to recognize, if it gets data it's never seen before, how do you know it's not wrong, right? So there's a whole area around data quality and quality assurance and quality checking for AI. And that's where, especially in scientific research, we use what's starting to be called things like physics-informed or physics-constrained AI, where the neural net that you're using to design an aircraft still has to follow basic physical laws in its output, right? Or if you're doing some materials or astrophysics, you still have to obey conservation of mass, right? So I can't say, well, if you just apply negative mass on this side and positive mass on that side, everything works out right for stable flight. 'Cause we can't do negative mass, right? So you have to constrain it to the real world. So this notion of how we bring in the laws of physics and constrain your AI to what's possible is also a big part of the sort of AI research going forward.
>> You know, Dan, to me you just encapsulated the science that's still out there, that's needed. Computer science, social science, materials science, kind of all converging right now.
>> Yeah, engineering, yeah-
>> Engineering, science-
>> -slipstreams-
>> -it's all there-
>> -physics, yeah, mmhmm.
>> It's not just code. And, Rajesh, data. You mentioned data: the more data you have, the better the AI. We're in a world that's going from silos to open control planes; we have to get to that world. This is a cultural shift we're seeing. What are your thoughts?
>> Well, it is, in that the ability to drive predictive analysis based on the data is going to drive different behaviors, right? Different social behaviors, different cultural impacts. But I think the point that Dan made about bias, right, it's only as good as the code that's written and the way that the data is actually brought into the system. So making sure that that is done in a way that generates the right kind of outcome, that allows you to use it in a predictive manner, becomes critically important. If it is biased, you're going to lose credibility in a lot of the analysis that comes out of it. So I think that becomes critically important, but overall, I mean, if you think about the way compute is going, it's becoming pervasive. It's not just in selected industries anymore; it's now applying to everything that you do, right? Whether it is getting you more tailored recommendations for your purchasing, right? You have better options that way. You don't have to sift through a lot of different ideas as you scroll online; it's tailoring now to some of your habits and what you're looking for. So that becomes an incredible time-saver for people, to be able to get what they want in the way that they want it. And then you look at the way it impacts other industries, and development, and innovation, and it just continues to scale and scale and scale.
>> Well, I think the work that you guys are doing together is scratching the surface of the future, which is digital business. It's about data, it's about all these new things. It's about advanced computing meeting the right algorithms for the right purpose. And it's a really amazing operation you guys have got over there. Dan, great to hear the stories. It's very provocative, very enticing to just want to jump in and hang out. But I've got to do theCUBE day job here. Congratulations on success. Rajesh, great to see you, and thanks for coming on theCUBE.
>> Thanks for having us, John.
>> Okay.
>> Thanks very much.
>> Great conversation around urgent computing. As computing becomes so much more important, bigger problems and opportunities are around the corner. And this is theCUBE, we're documenting it all here. I'm John Furrier, your host. Thanks for watching. (contemplative music)
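A side note on the physics-constrained AI that Dan describes: the usual mechanics are to add a penalty term to the training loss whenever the network's output violates a known physical law. Below is a minimal, illustrative sketch in PyTorch. The toy network, the mass-fraction constraint, and the weight `lam` are assumptions made up for this example; they are not TACC's actual models or codes.

```python
import torch
import torch.nn as nn

# Toy model: predicts the mass fractions of 4 species from 8 input features.
model = nn.Sequential(nn.Linear(8, 32), nn.Tanh(), nn.Linear(32, 4))

def physics_informed_loss(pred, target, lam=0.1):
    """Data-fit loss plus a penalty for violating conservation of mass.

    Here the 'physical law' is simply that predicted mass fractions must be
    non-negative and sum to 1 -- a stand-in for the real constraints
    (conservation laws, boundary conditions) used in practice.
    """
    data_loss = nn.functional.mse_loss(pred, target)
    mass_residual = (pred.sum(dim=1) - 1.0).pow(2).mean()   # must sum to one
    negativity = torch.relu(-pred).pow(2).mean()            # no negative mass
    return data_loss + lam * (mass_residual + negativity)

# One illustrative training step on random stand-in data.
x = torch.randn(64, 8)
y = torch.softmax(torch.randn(64, 4), dim=1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

loss = physics_informed_loss(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()
print(f"loss: {loss.item():.4f}")
```

The design choice echoes Dan's negative-mass joke: rather than trusting the network to learn the law from data alone, the constraint is baked into the objective, so outputs that "apply negative mass" are explicitly penalized.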
Bala Chandrasekaran, Dell EMC | Dell EMC: Get Ready For AI
(techno music)
>> Hey, welcome back everybody, Jeff Frick here with theCUBE. We're in Austin, Texas at the Dell EMC HPC and AI Innovation Lab. As you can see behind me, there are racks and racks and racks of gear, where they build all types of system configurations around specific applications, whether it's Oracle or S.A.P., and more recently a lot more around artificial intelligence, whether it's machine learning or deep learning. So it's a really cool place to be. We're excited to be here. And our next guest is Bala Chandrasekaran. He is on the technical staff as a systems engineer. Bala, welcome!
>> Thank you.
>> So how do you like playing with all these toys all day long?
>> Oh, I love it!
>> I mean, you guys have literally everything in there. A lot more than just Dell EMC gear, but you've got switches and networking gear-
>> Right.
>> Everything.
>> And not just the gear, it's also all the software components: the deep learning libraries, deep learning models. So a whole bunch of things that we get to play around with.
>> Now that's interesting, 'cause it's harder to see the software, right?
>> Exactly right.
>> The software's pumping through all these machines, but you guys do all types of optimization and configuration, correct?
>> Yes, we try to make it easy for the end customer. And in the project that I'm working on, machine learning for Hadoop, we try to make things easy for the data scientists.
>> Right, so we go to all the Hadoop shows, Hadoop World, Hadoop Summit, Strata, Big Data NYC, Silicon Valley, and the knock on Hadoop is always that it's too hard: there aren't enough engineers, I can't get enough people to do it myself. It's a cool open source project, but it's not that easy to do. You guys are really helping people solve that problem.
>> Yes, and what you're saying is true for the infrastructure guys. Now imagine a data scientist, right? Accessing a Hadoop cluster, securing it, is going to be really tough for them. And they shouldn't be worried about it, right? They should be focused on data science. So those are some of the things that we try to do for them.
>> So what are some of the tips and tricks as you build these systems? The relatively simple things to fix that throw people off all the time, and then some of the hard stuff where you guys have really applied your expertise to get over those challenges?
>> Let me give you a small example. This is a new project, AI, and we hired data scientists. So I walked a data scientist through the lab. He looked at all the clusters, and he pulled me aside and said, hey, you're not going to ask me to work on these things, right? I have no idea how to do these things. So that kind of gives you a sense of what a data scientist should focus on and what they shouldn't focus on. So some of the things that we do, and some of the things that are probably difficult for them, are all the libraries that are needed to run their project, the conflicts between libraries, the dependencies between them. So one of the things that we do is deliver a pre-configured engine that you can readily download into our product and run. So data scientists don't have to worry about which libraries they should use.
>> Right.
>> They have to worry about the models and accuracy and whatever data science needs to be done, rather than focusing on the infrastructure.
>> So you not only package the hardware and the systems, but you've packaged the software distribution and all the surrounding components of that as well.
>> Exactly right, right.
>> So when you have the data scientists here talking about the Hadoop cluster, if they didn't want to talk about the hardware and the software, what were you helping them with? How did you engage with the customers here at the lab?
>> So the example that I gave was for the data scientists that we newly hired for our team, so we had to set up environments for them. That was the example, but the same thing applies for a customer as well. So again, to help them in solving the problem, we try to package some of the things as part of our product and deliver it to them, so it's easy for them to deploy and get started on things.
>> Now the other piece that's included, and again is not in this room, is the services-
>> Right.
>> And the support. So you guys have a full team of professional services. Once you configure and figure out what the optimum solution is for them, then you've got a team that can actually go deploy it at their actual site.
>> So we have packaged things even for our services. The services team would go to the customer site. They would apply the solution, download and deploy our packages, and be able to demonstrate how easy it is. Think of them as tutorials, if you like. So here are the tutorials, here's how you run various models, here's how easy it is for you to get started. That's what they would train the customer on. So it's not just the deployment piece of it, but also packaging things for them so they can show customers how to get started quickly, how everything works, and kind of give a green check mark, if you will.
>> So what are some of your favorite applications that people are using these things for? Do you get involved in the application stack on the customer side? What are some of the fun use cases that people use your technology to solve?
>> So for the application, my project is about machine learning on Hadoop. We're packaging Cloudera's CDSW, that's Cloudera Data Science Workbench, as part of the product. That allows data scientists access to the Hadoop cluster while abstracting the complexities of the cluster. So they can access the cluster, they can access the data, they can have security, without worrying about all the intricacies of the cluster. In addition to that, they can create different projects and have different libraries in different projects, so they don't conflict with each other, and they can also add users and work collaboratively. So basically it's there to help data scientists and software developers do their job and not worry about the infrastructure.
>> Right.
>> They should not be.
>> Right, great. Well, Bala, it's a pretty exciting place to work. I'm sure you're having a ball.
>> Yes I am, thank you.
>> All right. Well, thanks for taking a few minutes with us; really enjoyed the conversation.
>> I appreciate it, thank you.
>> All right, he's Bala, I'm Jeff. You're watching theCUBE from Austin, Texas at the Dell EMC High Performance Computing and Artificial Intelligence Labs. Thanks for watching. (techno music)
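A side note on the library-conflict problem Bala describes: a pre-validated environment is, at its core, a pinned set of package versions that can be checked mechanically before a data scientist ever hits a dependency clash. A small illustrative sketch follows; the package names and version numbers are hypothetical, not the actual contents of Dell's pre-configured engine.

```python
from importlib.metadata import version, PackageNotFoundError

# Hypothetical pinned spec for a validated environment; a real product
# would ship its own tested combination of libraries.
PINNED = {"numpy": "1.16.4", "pandas": "0.24.2", "pyspark": "2.4.0"}

def check_environment(pinned):
    """Report any installed package that drifts from the pinned spec."""
    for pkg, want in pinned.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            print(f"MISSING {pkg} (want {want})")
            continue
        status = "OK   " if have == want else "DRIFT"
        print(f"{status} {pkg}: installed {have}, pinned {want}")

check_environment(PINNED)
```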
Dell EMC AI Lab Tour | Dell EMC: Get Ready For AI
(upbeat music)
>> Thank you for coming to the HPC and AI Innovation Lab. I'm sure that you've heard a lot of excitement in the industry about what we can do with AI and machine learning and deep learning, and our team in our lab has been building solutions for this space. Very similar to what we do with our other solutions, including high performance computing, we take servers, storage, networking, and software, put it all together to build and design targeted solutions for a particular use case, and then bring in services and support along with that, so that we have a complete product. That's what we're doing for the AI space as well.

So whether you're doing machine learning algorithms, with your data, say for example, in Hadoop, or whether you're doing deep learning, convolutional neural networks, RNNs, no matter what technology you're using, you have different choices for compute. Those compute choices can be CPUs, GPUs, FPGAs, custom ASICs; there are all sorts of different choices for compute. Similarly, you have a lot of different choices for networking, for storage, and for your actual use case. Are you doing image recognition, fraud detection, what are you trying to do? So our goal is multiple-fold. First, we want to bring in all these new technologies, all these different technologies, and see how they work well together. Specifically in the AI space, we want to make sure that we have the right software framework, because a big piece of putting these solutions together is making sure that your MXNet and Caffe and TensorFlow and all these frameworks are working well together, along with all these different neural network models. So we put all these things together, make sure that we can run standard benchmark datasets so we can do comparisons across configurations, and then, as a result of all that work, share best practices and tuning, including the storage piece as well.

Our Top500 cluster is over here, so multiple racks. This is a cluster of more than 500 servers today, around 560 servers, and it's on the latest Top500 list, which is a list, published twice a year, of the 500 fastest supercomputers in the world. We started with a smaller number of CPUs; we had 128 servers. Then we added more servers, we swapped over to the next generation of CPUs, then we added even more servers, and now we have the latest generation Intel CPUs in this cluster.

One of the questions we've been getting more and more is, what do you see with liquid cooling? Dell has had the capability to do liquid cooled systems for a while now, but we recently added this capability in the factory as well, so you can order systems that are direct contact liquid cooled directly from the factory. Let's compare the two, right? Right over here, you have an air cooled rack. Here we have the exact same configuration, so the same compute infrastructure, but liquid cooled. The CPU has a cold plate on it, and that's cooled with facilities water. So these pipes actually have water flowing through them: each sled has two pipes coming out of it for the water loop, the pipes from each server, each sled, go into these rack manifolds, and at the bottom of the rack over there is where we have our heat exchanger.
In our early studies, we have seen that your efficiency, in terms of how much performance you get out of the server, should not depend on whether you're air cooled or liquid cooled, as long as your air cooling solution can provide enough cooling for your components. What that means is, if you have a well air cooled solution, it's not going to perform any worse than a liquid cooled solution. What liquid cooling allows you to do is, in the same rack space, put in a higher-level configuration: higher-TDP processors, more disks, a configuration that you cannot adequately air cool. That configuration, in the same space in your data center, with the same air flow, you will be able to liquid cool. The biggest advantage of liquid cooling today has to do with PUE ratios: how much of your facility power you are using for compute and your IT infrastructure, versus for cooling and power delivery.

This is production, this is part of the cluster. What we are doing right now is running rack-level studies. We've done single-chassis studies in our thermal lab, along with our thermal engineers, on the advantages of liquid cooling, what we can do, and how it works for particular workloads. But now we have a rack-level solution, so we are running different types of workloads: manufacturing workloads, weather simulation, some AI workloads, standard High Performance LINPACK benchmarks, on an entire rack of liquid cooled and an entire rack of air cooled. All these racks have metered PDUs, where we can measure power, so we're going to measure power consumption as well. And then we have sensors which allow us to measure temperature, and then we can tell you the whole story. And of course, we have a phenomenal group of people in our thermal team, our architects, and we also have the ability to come in and evaluate a data center to see whether liquid cooling makes sense for you today. It's not one size fits all; it's not that liquid cooling is what everybody must do, and must do today. No. And that's the value of this lab, right? Actual quantitative results, for liquid cooling, for all our technologies, for all our solutions, so that we can give you the right configuration and the right optimizations, with the data backing up the right decision for you, instead of forcing you into the one solution that we happen to have.

So now we're actually standing right in the middle of our Zenith supercomputer, so all the racks around you are Zenith. You can hear that the noise level is higher; that's because this is one cluster, and it's running workloads right now, both from our team and our engineers, as well as from customers, who can get access into the lab and run their workloads. So that noise level you hear is an actual supercomputer. We have C6420 servers in here today, with the Intel Xeon Scalable family processors, and that's what you see in these racks behind you and in front of you. And this cluster is interconnected using the Omni-Path interconnect.

There are thousands and thousands of applications in the HPC space, and over the years we've added more and more capability. So today in the lab we do a lot of work with manufacturing applications, that's computational fluid dynamics, CFD, CAE, structural mechanics, things like that. We do a lot of work with life sciences: next generation sequencing applications, molecular dynamics, cryogenic electron microscopy. We do weather simulation applications, and a whole bunch more.
Quantum chromodynamics. We do a whole bunch of benchmarking of subsystems, so tests for compute, for network, for memory, for storage; we do a lot of parallel file system and I/O tests. And when I talk about application benchmarking, we're doing that across different compute, network, and storage to see what the full picture looks like. The list that I've given you is not a complete list.

This switch is a Dell Networking H-Series switch, which supports the Omni-Path fabric, the Omni-Path interconnect, that today runs at a hundred gigabits per second. All the clusters, all the Zenith servers in the lab, are connected to this switch. Because we started with a small number of servers and then scaled, we knew we were going to grow, so we chose to start with a director-class switch, which allowed us to add leaf modules as we grew. The servers and racks that are closest to the switch have copper cables; the ones coming from across the lab have our fiber cables. So this switch is what allows us to call this an HPC cluster, where we have a high-speed interconnect for our parallel and distributed computations. And a lot of our current deep learning work is being done on this cluster as well, on the Intel Xeon side. (upbeat music)
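A side note on the PUE figure the tour mentions: Power Usage Effectiveness is total facility power divided by the power that actually reaches the IT equipment, so 1.0 would mean zero cooling and power-delivery overhead. A small worked example, with wattage figures invented purely for illustration:

```python
# PUE = total facility power / IT equipment power.
# All figures below are invented for illustration only.

def pue(it_power_kw, cooling_kw, power_delivery_kw):
    """Power Usage Effectiveness: 1.0 would mean zero overhead."""
    total = it_power_kw + cooling_kw + power_delivery_kw
    return total / it_power_kw

# Air-cooled rack: more energy spent on chillers and fans.
print(f"air-cooled    PUE: {pue(100, 60, 10):.2f}")  # 1.70
# Liquid-cooled rack: heat removed by facility water, less fan/chiller load.
print(f"liquid-cooled PUE: {pue(100, 25, 10):.2f}")  # 1.35
```

The point of the comparison in the tour is that liquid cooling lowers the cooling term in the numerator, which is why it shows up as a PUE advantage rather than a raw performance advantage.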
Michael Bennett, Dell EMC | Dell EMC: Get Ready For AI
(energetic electronic music)
>> Hey, welcome back everybody. Jeff Frick here with theCUBE. We're in a very special place: we're in Austin, Texas at the Dell EMC HPC and AI Innovation Lab. High performance computing, artificial intelligence. This is really where it all happens, where the engineers at Dell EMC are putting together these ready-made solutions for the customers. They've got every type of application stack in here, and we're really excited to have our next guest. He's right in the middle of it: he's Michael Bennett, Senior Principal Engineer for Dell EMC. Mike, great to see you.
>> Great to see you too.
>> So you're working on one particular flavor of the AI solutions, and that's really machine learning with Hadoop. So tell us a little bit about that.
>> Sure, yeah. The product that I work on is called the Ready Solution for AI: Machine Learning with Hadoop, and that product is a Cloudera Hadoop distribution on top of our Dell PowerEdge servers. We've partnered with Intel, who has released a deep learning library called BigDL, to bring both traditional machine learning capabilities as well as deep learning capabilities to the product. The product also adds a data science workbench, released by Cloudera. This tool allows the customer's data scientists to collaborate together, provides them secure access to the Hadoop cluster, and, we think, all-around makes a great product that allows customers to gain the power of machine learning and deep learning in their environment, while also reducing some of those overhead complexities that IT often faces with managing multiple environments, providing secure access, things like that.
>> Right, 'cause the big knock always on Hadoop is that it's just hard. It's hard to put in, there aren't enough people, there aren't enough experts. So you guys are really offering a pre-bundled solution that's ready to go?
>> Correct, yeah. We've got seven or eight different environments going in the lab at any time to validate different hardware permutations that we may offer of the product. And we've been doing this since 2009, so there's a lot of institutional knowledge here at Dell to draw on when building and validating these Hadoop products. Our Dell services team has also been going out installing and setting these up, and our consulting services has been helping customers fit the Hadoop infrastructure into their IT model.
>> Right, so is there one basic configuration that you guys have? Or have you found there are two or three different standard use cases that call for two or three different kinds of standardized solutions?
>> We find that most customers prefer the PowerEdge R740xd. This platform can hold 12 3.5-inch form-factor drives in the front, along with four in the mid-plane, while still providing four SSDs in the back. So customers get a lot of versatility with it. It's also won several Hadoop benchmarking awards.
>> And do you find, when you're talking to customers or you're putting this together, that they've tried it themselves, and they've tried to kind of stitch together and cobble together the open-source and proprietary stuff, all the way down to network cards and all this other stuff, to actually make the solution come together? And it's just really hard, right?
>> Yeah, right, exactly. What we hear over and over from our product management team, from their interactions with customers, is customers saying it's just too hard.
They get something that's stable, and then they come back and they don't know why it's no longer working. They have customized environments that each developer wants for their big data analytics jobs, things like that. So yeah, overall we're hearing that customers are finding it very complex.
>> Right, so we hear that same thing time and time again. And even though we've been going to Hadoop Summit and Hadoop World and Strata since 2010, the momentum seems to be a little slower in terms of the hype, but now we're really moving into heavy-duty, real-time production, and that's what you guys are enabling with this ready-made solution.
>> Yeah, so with this product we focused on enabling Apache Spark in the Hadoop environment. That Apache Spark distributed computing has really changed the game as far as what it allows customers to do with their analytics jobs. No longer are we writing things to disk; multiple transformations are being performed in memory, and that's also a big part of what enables the BigDL library that Intel released for the platform to train these deep learning models.
>> Right, 'cause Spark enables the real-time analytics, right? Now you've got streaming data coming into this thing, versus the batch, which was kind of the classic play of Hadoop.
>> Right, and not only do you have streaming data coming in, but Spark also enables you to load your data in memory and perform multiple operations on it, and draw insights that maybe you couldn't before with traditional MapReduce jobs.
>> Right, right. So what gets you excited to come to work every day? You've been playing with these big machines. You're in the middle of nerd nirvana, I think-
>> Yeah, exactly.
>> With all of the servers and spinning disks. What gets you up in the morning? What are you excited about, as you see AI get more pervasive within the customers and the solutions that you guys are enabling?
>> You know, for me, what's always exciting is trying new things. We've got this huge lab environment with all kinds of lab equipment. So if you want to test a new iteration, let's say tiered HDFS storage with SSDs and traditional hard drives, you can throw it together in a couple of hours and see what the results are. If we want to add new PCIe devices, like FPGAs for the inference portion of the deep learning development, we can put those in our servers and try them out. So I enjoy that: on top of the validated, thoroughly-worked-through solutions that we offer customers, we can also experiment, play around, and work towards that next generation of technology.
>> Right, 'cause you've got basically any combination of hardware at your disposal, to put together and test and see what happens?
>> Right, exactly. And this is my first time actually working at an OEM, and so I was surprised: not only do we have access to anything that you can see out in the market, but we often receive test and development equipment from partners and vendors that we can work with and collaborate on, to ensure that once the product reaches market it has the features that customers need.
>> Right. What's the one thing that trips people up the most? Some simple little switch configuration that you think is like a minor piece of something, that always seems to get in the way?
>> Right, or switches in general.
I think that people focus on the application because the switch is so abstracted from what the developer, or even somebody troubleshooting the system, sees. Oftentimes it's some misconfiguration or some typo that was entered during the switch configuration process that throws customers off, or has somebody scratching their head, wondering why they're not getting the kind of performance that they thought.
>> Right, well, that's why we need more automation, right? That's what you guys are working on.
>> Right, yeah, exactly.
>> Keep the fat-finger typos out of the config settings.
>> Right, consistent, reproducible. None of that "I did it yesterday and it worked, I don't know what changed."
>> Right. All right, Mike, well, thanks for taking a few minutes out of your day, and don't have too much fun playing with all this gear.
>> Awesome, thanks for having me.
>> All right, he's Mike Bennett and I'm Jeff Frick. You're watching theCUBE, from Austin, Texas at the Dell EMC High Performance Computing and AI Labs. Thanks for watching. (energetic electronic music)
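A side note on Mike's Spark point: the in-memory behavior he describes is visible in a few lines of PySpark, where a cached DataFrame is reused by multiple aggregations instead of being re-read from disk between stages, as classic MapReduce would. This is a generic sketch; the HDFS path and column names are placeholders, not the product's actual workload.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("in-memory-demo").getOrCreate()

# Placeholder path and schema: any columnar dataset on HDFS would do.
df = spark.read.parquet("hdfs:///data/events.parquet")

# cache() keeps the filtered data in cluster memory, so the two
# aggregations below reuse it instead of re-reading from disk each time.
df = df.filter(F.col("status") == "ok").cache()

by_user = df.groupBy("user_id").count()
daily = df.groupBy(F.to_date("ts").alias("day")).agg(F.avg("latency_ms"))

by_user.show(5)
daily.show(5)
spark.stop()
```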