NOTE: The publication of this episode was delayed due to the untimely passing of our partner and pal Rich Brueckner. So what we’re announcing as “breaking news” isn’t so fresh today, but our takes on what NVIDIA’s new A100 processor brings to the table are still valid.
Breaking News! This special edition of RadioFreeHPC takes a deep dive into NVIDIA’s spanking new A100 GPU, which is an impressive achievement in processor-dom. The new chip is built on a 7nm process, weighs in at a hefty 54 billion transistors, and is capped at 400 watts. It sports 6,912 FP32 CUDA cores, 3,456 FP64 CUDA cores, and 432 Tensor Cores.
This 8th-generation GPU, built on what the company calls its Ampere architecture, replaces both the V100 GPU and the Turing-based T4, giving the company a single platform for both AI training and inference.
We talk about the specs of the A100, breaking down its game both in terms of typical HPC FP64 processing and FP32 (and lower-precision) computing for AI workloads. On the HPC side, the new GPU seems to offer an across-the-board 25% speedup, which is substantial. But the A100 really shines when it comes to Tensor Core performance, where the company reports an average speedup of 10x for 32-bit Tensor Core math (the TF32 format) vs. V100 FP32.
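For anyone who wants to sanity-check those speedup figures, here is a rough back-of-the-envelope sketch. The core counts for the A100 come from the specs above; the boost clocks, the V100 core counts, and the 156 TFLOPS dense TF32 figure are assumptions taken from NVIDIA's published spec sheets rather than from the episode, and real applications will land below these theoretical peaks.

```python
# Peak throughput estimate: cores x 2 FLOPs per cycle (one fused multiply-add) x clock.
# Clocks and V100 core counts are assumptions from public spec sheets, not from the episode.

def peak_tflops(cores, boost_clock_ghz):
    """Theoretical peak in TFLOPS, counting one FMA (2 FLOPs) per core per cycle."""
    return cores * 2 * boost_clock_ghz * 1e9 / 1e12

A100_BOOST_GHZ = 1.41   # assumed A100 boost clock
V100_BOOST_GHZ = 1.53   # assumed V100 (SXM2) boost clock

a100_fp64 = peak_tflops(3456, A100_BOOST_GHZ)
v100_fp64 = peak_tflops(2560, V100_BOOST_GHZ)   # assumed V100 FP64 core count
v100_fp32 = peak_tflops(5120, V100_BOOST_GHZ)   # assumed V100 FP32 core count

A100_TF32_TFLOPS = 156.0  # assumed dense TF32 Tensor Core peak, without sparsity

fp64_speedup = a100_fp64 / v100_fp64
tf32_speedup = A100_TF32_TFLOPS / v100_fp32

print(f"A100 FP64 peak: {a100_fp64:.2f} TFLOPS")
print(f"V100 FP64 peak: {v100_fp64:.2f} TFLOPS")
print(f"FP64 speedup:   {fp64_speedup:.2f}x")
print(f"TF32 vs V100 FP32 speedup: {tf32_speedup:.1f}x")
```

Under those assumptions the FP64 ratio comes out a little under 1.25x and the TF32 ratio just shy of 10x, which is in line with the numbers discussed on the show.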
New features of the A100 include Sparsity (a mechanism that doubles sparse matrix performance), a much speedier NVLink (2x the bandwidth), and a hardware feature, which NVIDIA calls Multi-Instance GPU (MIG), that allows the A100 to be partitioned into as many as seven GPU instances to support individual workloads.
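To make the Sparsity idea a bit more concrete, here is a minimal pure-Python sketch. It assumes the "2 out of 4" structured pattern NVIDIA has described for Ampere, where two of every four consecutive weights are zeroed. The function names `compress_2_of_4` and `sparse_dot` are hypothetical, and this only illustrates the bookkeeping, not how the hardware or NVIDIA's libraries actually implement it.

```python
# Illustrative sketch of 2:4 structured sparsity (assumed pattern; names are hypothetical).

def compress_2_of_4(weights):
    """Keep the two largest-magnitude weights in each group of four.

    Returns the kept values and their positions in the original vector,
    so only half of the weights need to be stored and multiplied.
    """
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Rank in-group indices by magnitude, keep the top two, restore index order.
        ranked = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        for i in sorted(ranked[:2]):
            values.append(group[i])
            positions.append(start + i)
    return values, positions

def sparse_dot(values, positions, activations):
    """Dot product that touches only the kept weights (half the multiplies)."""
    return sum(v * activations[p] for v, p in zip(values, positions))

weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.0, -0.3, 0.01]
activations = [1.0] * 8

values, positions = compress_2_of_4(weights)
result = sparse_dot(values, positions, activations)
print(values, positions, result)
```

The doubling comes from the multiply count: the eight-element example above needs only four multiplies after compression. The price is that the dropped weights are gone, so in practice a network is typically fine-tuned after pruning to recover accuracy.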
All in all, this is an amazing new processor, a behemoth, large and hot, but so fast; a chip that is heavily tilted towards new AI and Tensor workloads with a passing but welcome nod to 64-bit HPC apps.