Nvidia chips are now in three of the five fastest supercomputers in the world. But how did Nvidia get there so fast?
The Barcelona Supercomputing Center is developing a new hybrid supercomputer that, for the first time, combines energy-efficient Nvidia Tegra CPUs, based on the ARM chip architecture, with Nvidia GPUs.
(Credit: Barcelona Supercomputing Center)
First, a quick primer on Tesla and graphics chip-based supercomputing. Tesla processors are basically graphics processing units (GPUs) that have been redesigned for supercomputers. The results are impressive enough that some of the most important supercomputing sites have signed on. The US Department of Energy's Oak Ridge National Laboratory, probably the premier US supercomputing site, will use Tesla processors in its next supercomputer, Titan.
"About 28 per cent of all the supercomputer sites use GPGPUs," said Steve Conway, an analyst at IDC, referring to general purpose GPU. "Those sites use at least some GPUs. That's about three times what it was three years ago. But this trend is much wider than it is deep. In the sense that the average supercomputer that does have GPUs in it, 5 per cent of those processors are GPUs. The rest are standard (central processing unit) processors. So, it's still pretty early. GPGPUs right now are mainly experimental," Conway said.
Nvidia's chief technology officer Steve Scott is trying to change that. He spent 20 years at supercomputing giant Cray as the CTO. He's been at Nvidia for only three months.
Why the move to Nvidia?
I worked closely with Nvidia, AMD and Intel on their road maps. I became absolutely convinced that heterogeneous, hybrid computing [GPUs and CPUs] was the only way to move forward, given the power constraints affecting system designs, and that Nvidia was the only company with a viable business strategy.
And the Oak Ridge connection?
I worked very closely with Oak Ridge over the last few years. When it came time to replace Jaguar we realised we couldn't do it with normal CPU technology. Basically, voltage scaling has ended. We're no longer able to drop the voltage [to make the processors more power efficient].
Can you explain the power-efficiency argument?
Moore's Law is alive and well. We keep getting to add exponentially more transistors to a chip, but we can't run them all, because the chip would literally burn up. So we're in an environment where, from a performance-design perspective, we're constrained entirely by power. It's become entirely about performance per watt, about power efficiency. And that's where the [GPU] accelerator technologies really shine.
The amount of energy per FLOP is over seven times lower in a current accelerator than it is in a CPU. The top-line GPU today has over 500 processors on it — compared to six or eight on a CPU die [chip]. The processors are much smaller and have much lower overhead.
Do you see this conversion to GPU-based supercomputing accelerating?
Take a look at the Top 500 list. Three years ago the very first GPU-enabled computer showed up in the Top 500. Now they're more than doubling each year. At the high end of the list, three of the top five machines are GPU-enabled. And Oak Ridge is going with GPU supercomputing. So, it's a pretty fast ramp.
How does a GPU-centric supercomputer work?
You need to do the vast majority of your work on processing cores that are designed for energy efficiency [GPUs]. They're designed to execute hundreds of parallel threads efficiently. But you also need a small number of cores to run a single thread very fast. The leftover work that can't be parallelised, you want to run that on a CPU-style core.
High-performance computing code is highly parallel in a distributed memory fashion. In other words, you take your job and split it up over hundreds or thousands of separate nodes. And within each node, there will be parallel [GPU] work and serial [CPU] work. And typically the parallel work is going to be well over 90 per cent of the code. Sometimes it's less, sometimes it's 99.9 per cent of the code.
In the Titan machine, approximately 90 per cent of the total FLOPs will come from the GPUs, and approximately 10 per cent from the CPUs. The ratio is 1:1: one CPU coupled with one Tesla GPU.
What about programming for the GPU?
CUDA [an Nvidia programming model] allowed people to program GPUs using C, and later Fortran, as if they were standard computers. And more recently came ... something called directives. You tell the compiler: please take this ... and execute it on the accelerator [GPU], and the compiler and the runtime take care of generating the instructions to run on the accelerator. From a programming-model perspective, all the programmer really has to do is identify and expose the parallelism to the compiler, just by using a set of directives.
What about Intel's foray into accelerated supercomputing with its Knights Corner many-core chip?
Nvidia GPUs are general-purpose processors. They have an instruction set and they can do everything any other processor can do. It's not an x86 instruction set; it's a RISC instruction set. But it's absolutely a general-purpose chip on which you can do anything you can do on any other computer. This is really Intel recognising that they can't get where they need to go with standard multi-core Xeons. They have to go to an accelerated model, for the exact same reasons I was talking about: power efficiency. I think it's an endorsement. They're several years behind Nvidia, but it's the right path.