
Dell EMC AI Lab Tour | Dell EMC: Get Ready For AI


 

(upbeat music) >> Thank you for coming to the HPC and AI Innovation Lab. I'm sure you've heard a lot of excitement in the industry about what we can do with AI, machine learning, and deep learning, and our team in this lab has been building solutions for that space. It's very similar to what we do with our other solutions, including high performance computing: we take servers, storage, networking, and software, put it all together to design targeted solutions for a particular use case, and then bring in services and support along with that, so that we have a complete product. That's what we're doing for the AI space as well.

So whether you're running machine learning algorithms over your data, say for example in Hadoop, or doing deep learning with convolutional neural networks and RNNs, no matter what technology you're using, you have different choices for compute. Those compute choices can be CPUs, GPUs, FPGAs, or custom ASICs; there are all sorts of different choices for compute. Similarly, you have a lot of different choices for networking, for storage, and for your actual use case. Are you doing image recognition, fraud detection, what are you trying to do?

So our goal is multi-fold. First, we want to bring in all these different technologies and see how they work well together. Specifically in the AI space, we want to make sure we have the right software frameworks, because a big piece of putting these solutions together is making sure that MXNet, Caffe, TensorFlow, and all these frameworks are working well together, along with all these different neural network models. Putting all these things together means making sure we can run standard benchmark datasets so we can do comparisons across configurations, and then, as a result of all that work, share best practices and tuning, including the storage piece as well.

Our Top500 cluster is over here: multiple racks, a cluster of more than 500 servers today, around 560 servers, and it's on the latest Top500 list, which is a list, published twice a year, of the 500 fastest supercomputers in the world. We started with a smaller number of servers, 128 servers; then we added more servers, swapped over to the next generation of CPUs, then added even more servers, and now we have the latest generation of Intel CPUs in this cluster.

One of the questions we've been getting more and more is, what do you see with liquid cooling? Dell has had the capability to do liquid-cooled systems for a while now, but we recently added this capability in the factory as well, so you can order systems that are direct-contact liquid cooled straight from the factory. Let's compare the two. Right over here you have an air-cooled rack, and here we have the exact same configuration, the same compute infrastructure, but liquid cooled. The CPU has a cold plate on it, and that's cooled with facilities water. These pipes actually have water flowing through them; each sled has two pipes coming out of it for the water loop, the pipes from each server, each sled, go into these rack manifolds, and at the bottom of the rack over there is where we have our heat exchanger.
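To make the cross-configuration comparisons mentioned above concrete, here is a minimal sketch of what timing a standard benchmark dataset might look like. TensorFlow is one of the frameworks named in the tour; the tiny model, the MNIST dataset, and the images-per-second metric are illustrative assumptions, not the lab's actual harness.

# Illustrative sketch only: time one pass over a standard benchmark dataset
# so the same number can be recorded on different configurations
# (CPU vs. GPU, framework version, storage backend, and so on).
import time
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0  # add channel dim, normalize

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

start = time.perf_counter()
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
elapsed = time.perf_counter() - start

# Images/second is one simple figure of merit to compare across configurations.
print(f"one epoch: {elapsed:.1f} s, {len(x_train) / elapsed:.0f} images/s")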
In our early studies, we have seen that your efficiency, in terms of how much performance you get out of the server, should not depend on whether you're air cooled or liquid cooled, as long as your air-cooling solution can provide enough cooling for your components. What that means is, if you have a well air-cooled solution, it's not going to perform any worse than a liquid-cooled solution. What liquid cooling allows you to do is put a higher-end configuration in the same rack space: higher-TDP processors, more disks, a configuration that you cannot adequately air cool. That configuration, in the same space in your data center with the same airflow, you will be able to liquid cool. The biggest advantage of liquid cooling today has to do with PUE ratios: how much of your facility power is going to compute and IT infrastructure versus to cooling and power delivery.

This is production, this is part of the cluster, and what we are doing right now is running rack-level studies. We've done single-chassis studies in our thermal lab, along with our thermal engineers, on the advantages of liquid cooling and how it works for our particular workloads. But now we have a rack-level solution, so we are running different types of workloads, manufacturing workloads, weather simulation, some AI workloads, and standard high performance LINPACK benchmarks, on an entire rack of liquid-cooled servers and an entire rack of air-cooled servers. All these racks have metered PDUs, so we can measure power consumption as well, and we have sensors which allow us to measure temperature, and then we can tell you the whole story. And of course, we have a phenomenal group of people on our thermal team, our architects, and we also have the ability to come in and evaluate a data center to see whether liquid cooling makes sense for you today. It's not one-size-fits-all; it's not that liquid cooling is what everybody must do, and must do today, no. That's the value of this lab: actual quantitative results, for liquid cooling, for all our technologies, for all our solutions, so that we can give you the right configuration and the right optimizations, with data backing up the right decision for you, instead of forcing you into the one solution that we do have.

Now we're actually standing right in the middle of our Zenith supercomputer; all the racks around you are Zenith. You can hear that the noise level is higher, because this is one cluster and it's running workloads right now, both from our team and our engineers, as well as from customers who can get access to the lab and run their workloads. That noise level you hear is an actual supercomputer. We have C6420 servers in here today, with Intel Xeon Scalable family processors, and that's what you see in the racks behind you and in front of you. This cluster is interconnected using the Omni-Path interconnect.

There are thousands and thousands of applications in the HPC space, and over the years we've added more and more capability. Today in the lab we do a lot of work with manufacturing applications, that's computational fluid dynamics (CFD), CAE, structural mechanics, things like that. We do a lot of work with life sciences, that's next-generation sequencing applications, molecular dynamics, and cryogenic electron microscopy; we do weather simulation applications, and a whole bunch more.
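As a rough illustration of the rack-level accounting described above, the sketch below computes a PUE ratio and an energy-to-solution figure from metered-PDU style readings. The helper functions and every number are hypothetical; only the PUE definition (total facility power divided by IT equipment power) is standard.

# Illustrative sketch only: PUE and per-rack energy-to-solution accounting.
# All readings below are made-up placeholders, not measurements from the lab.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness; lower is better, 1.0 means every watt goes to compute."""
    return total_facility_kw / it_equipment_kw

def energy_to_solution_kwh(avg_rack_power_kw: float, runtime_hours: float) -> float:
    """Energy one rack consumes to finish one workload run (e.g. a LINPACK run)."""
    return avg_rack_power_kw * runtime_hours

# Hypothetical readings for the same run on an air-cooled and a liquid-cooled rack.
air_cooled = energy_to_solution_kwh(avg_rack_power_kw=24.0, runtime_hours=3.1)
liquid_cooled = energy_to_solution_kwh(avg_rack_power_kw=23.5, runtime_hours=3.1)

print(f"facility PUE example:  {pue(total_facility_kw=1500, it_equipment_kw=1200):.2f}")
print(f"air-cooled rack:       {air_cooled:.1f} kWh")
print(f"liquid-cooled rack:    {liquid_cooled:.1f} kWh")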
We do quantum chromodynamics, and we do a whole bunch of benchmarking of subsystems: tests for compute, for network, for memory, for storage. We do a lot of parallel file system and I/O tests, and when I talk about application benchmarking, we're doing that across different compute, network, and storage to see what the full picture looks like. The list I've given you is not a complete list.

This switch is a Dell Networking H-Series switch, which supports the Omni-Path fabric, the Omni-Path interconnect, which today runs at a hundred gigabits per second. All the clusters, all the Zenith servers in the lab, are connected to this switch. Because we started with a small number of servers and then scaled, and we knew we were going to grow, we chose to start with a director-class switch, which allowed us to add leaf modules as we grew. The servers and racks closest to the switch have copper cables; the ones coming from across the lab have fiber cables. This switch is what allows us to call this an HPC cluster, where we have a high-speed interconnect for our parallel and distributed computations, and a lot of our current deep learning work is being done on this cluster as well, on the Intel Xeon side. (upbeat music)
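For the network-subsystem benchmarking mentioned above, a minimal ping-pong bandwidth sketch is shown below. It assumes mpi4py is available and runs with two ranks; the message size and repetition count are arbitrary, and this is not the lab's actual test suite.

# Illustrative sketch only: point-to-point bandwidth between two ranks.
# Run as, for example:  mpirun -np 2 python bandwidth.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

msg = np.empty(64 * 1024 * 1024, dtype=np.uint8)  # 64 MiB message
reps = 20

comm.Barrier()
start = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(msg, dest=1, tag=0)
        comm.Recv(msg, source=1, tag=1)
    elif rank == 1:
        comm.Recv(msg, source=0, tag=0)
        comm.Send(msg, dest=0, tag=1)
elapsed = MPI.Wtime() - start

if rank == 0:
    # Each repetition is a ping-pong, so the link carries the message twice.
    gbytes = 2 * reps * msg.nbytes / 1e9
    # For scale: a 100 Gb/s link tops out near 12.5 GB/s of payload.
    print(f"sustained bandwidth: ~{gbytes / elapsed:.1f} GB/s")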

Published Date: Aug 7, 2018
