Tue. Jul 5th, 2022

The specifications of NVIDIA’s AD102 “Lovelace” GPU die have finally been confirmed. The fully enabled die will pack an incredible 18,432 FP32 cores, a sizable increase over AMD Navi 31’s 12,288 shaders. As reported the other day, the RTX 4090 will come with a couple of GPCs partially fused off, bringing the effective core count down to 16,128. If NVIDIA ever launches a 4090 Ti, that behemoth will likely leverage the full-fat AD102 die and its 18,432 shaders.
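Both counts map onto whole SMs. The quick sketch below shows how much of the die the rumored RTX 4090 configuration would leave disabled. It assumes 128 FP32 cores per Ada SM, which is what 18,432 cores spread over the 144 SMs listed in the table below works out to:

```python
# Assumption: every Ada Lovelace SM carries 128 FP32 cores (18,432 cores / 144 SMs).
FP32_PER_SM = 128


def enabled_sms(fp32_cores: int, per_sm: int = FP32_PER_SM) -> int:
    """Return the number of SMs implied by an FP32 core count."""
    if fp32_cores % per_sm:
        raise ValueError("core count is not a whole number of SMs")
    return fp32_cores // per_sm


full_ad102 = enabled_sms(18_432)  # fully enabled die
rtx_4090 = enabled_sms(16_128)    # rumored RTX 4090 configuration
disabled = full_ad102 - rtx_4090  # SMs fused off

print(full_ad102, rtx_4090, disabled)                      # 144 126 18
print(f"{rtx_4090 / full_ad102:.1%} of the SMs enabled")  # 87.5% of the SMs enabled
```

By this arithmetic, 18 SMs (12.5% of the full die) would be switched off on the RTX 4090.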

According to Kopite7kimi, the AD102 die will be able to touch the 100 TFLOPs single-precision (FP32) mark with its core clocked at 2.8GHz. What he still isn’t sure about is the SM structure. NVIDIA has a habit of mildly restructuring its SM (Streaming Multiprocessor, NVIDIA’s compute unit) every generation. This time around, it might be thoroughly overhauled, much as it was with Maxwell roughly eight years ago, or it might not.
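The 100 TFLOPs figure comes from the standard peak-throughput arithmetic. Each FP32 core can retire one fused multiply-add per clock, and that counts as two floating-point operations. The minimal sketch below uses that convention. Note that the RTX 4090 line hypothetically assumes the cut-down part reaches the same 2.8 GHz:

```python
def peak_fp32_tflops(fp32_cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput, counting one FMA (2 FLOPs) per core per clock."""
    flops_per_clock = fp32_cores * 2  # an FMA counts as a multiply plus an add
    return flops_per_clock * clock_ghz * 1e9 / 1e12


print(round(peak_fp32_tflops(18_432, 2.8), 1))  # full AD102: 103.2
print(round(peak_fp32_tflops(16_128, 2.8), 1))  # RTX 4090 config (assumed 2.8 GHz): 90.3
```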

I’ll recap what I had shared a while back about the Maxwell SM and the possible SM design of Lovelace:


With Maxwell, the warp schedulers, and with them the threads per SM per clock, were quadrupled. NVIDIA claimed that each Maxwell core delivered roughly 135% of the performance of a Kepler core, a 35% per-core gain. It looks like NVIDIA wants to pull another Maxwell, a generation known for exceptional performance and power efficiency that absolutely crushed rival AMD’s Radeon offerings.


This would mean that the core count per SM would remain unchanged (128), but the resources accessible to each cluster would increase drastically. Most notably, the number of concurrent threads would double from 128 to 256. It’s hard to say how large a performance increase this would translate to, but the gain should be substantial. Unfortunately, this layout takes up a lot of die space, which is already expensive on the TSMC N4 node NVIDIA is paying a lot of dough for. So, it’s hard to say whether Jensen’s team actually managed to pull this off or shelved it for future designs.
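Here is one way to read the jump from 128 to 256 threads. This toy model assumes each SM partition's scheduler issues one 32-thread warp instruction per clock, which is an illustration and not a confirmed Lovelace detail. Under that assumption, going from four to eight partitions (as in the 8-partition Lovelace SM mockup) doubles the threads issued per clock. Meanwhile, the 128 FP32 cores are simply spread more thinly across the partitions:

```python
WARP_SIZE = 32      # threads per warp on NVIDIA GPUs
CORES_PER_SM = 128  # unchanged between the two layouts, per the article


def sm_layout(partitions: int) -> dict:
    """Toy model: each partition's scheduler issues one warp instruction per clock (assumption)."""
    return {
        "partitions": partitions,
        "threads_per_clock": partitions * WARP_SIZE,               # scheduler issue capacity
        "fp32_cores_per_partition": CORES_PER_SM // partitions,    # cores shared out evenly
    }


print(sm_layout(4))  # {'partitions': 4, 'threads_per_clock': 128, 'fp32_cores_per_partition': 32}
print(sm_layout(8))  # {'partitions': 8, 'threads_per_clock': 256, 'fp32_cores_per_partition': 16}
```

With 16 cores per partition, a 32-thread FP32 warp would need two clocks to execute. The extra issue slots would only pay off if the other units in the partition (INT, SFU, tensor) can be kept busy in the meantime.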

[Figure: Lovelace SM with 8 partitions]
[Figure: Fermi vs Kepler vs Maxwell vs Turing SMs]

There’s also a chance that Team Green goes with a coupled SM design, something already introduced with Hopper. In case you missed the Hopper whitepaper, here’s a short primer on Thread Block Clusters and Distributed Shared Memory (DSMEM). To make scheduling on GPUs with over 100 SMs more efficient, Hopper groups thread blocks within a GPC into clusters, and Lovelace may follow suit. The primary aim of Thread Block Clusters is to improve multithreading and SM utilization. The thread blocks in a cluster are guaranteed to run concurrently on SMs within the same GPC.

[Figure: from the NVIDIA H100 Tensor Core GPU Architecture Overview]

Thanks to an SM-to-SM network, the thread blocks in a cluster can efficiently share data through each other’s shared memory without a round trip through global memory. This would be one of the key features promoting scalability on Hopper and, potentially, Lovelace. Scalability is essential when you’re increasing the core/ALU count by over 50%.
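To make the idea concrete, here is a toy Python model of a two-phase reduction inside a cluster. First, each block reduces its slice of the data into its own shared memory. Then one block reads its peers' partial results directly. The `Cluster` and `ThreadBlock` classes are hypothetical stand-ins. `map_shared_rank` only loosely mirrors the method of the same name on CUDA's cooperative-groups cluster handle. Real DSMEM code would be written in CUDA C++ for Hopper-class hardware:

```python
from dataclasses import dataclass, field


@dataclass
class ThreadBlock:
    rank: int  # position of this block within its cluster
    shared: list = field(default_factory=lambda: [0])  # toy stand-in for the block's shared memory


class Cluster:
    """Toy model of a thread block cluster: any block can address any peer's shared memory."""

    def __init__(self, size: int):
        self.blocks = [ThreadBlock(rank=r) for r in range(size)]

    def map_shared_rank(self, rank: int) -> list:
        # Loosely mirrors CUDA's cluster map_shared_rank(): hands back the
        # shared-memory buffer owned by another block in the same cluster.
        return self.blocks[rank].shared


def cluster_sum(data: list[int], cluster_size: int) -> int:
    cluster = Cluster(cluster_size)
    chunk = -(-len(data) // cluster_size)  # ceiling division
    # Phase 1: each block reduces its own slice into its own shared memory.
    for block in cluster.blocks:
        start = block.rank * chunk
        block.shared[0] = sum(data[start:start + chunk])
    # On real hardware a cluster-wide barrier would sit here.
    # Phase 2: block 0 reads every peer's partial sum directly (the DSMEM path)
    # instead of having each block write it out to global memory first.
    return sum(cluster.map_shared_rank(r)[0] for r in range(cluster_size))


print(cluster_sum(list(range(10)), 2))  # 45
```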

GPU | TU102 | GA102 | AD102 | AD103 | AD104
Architecture | Turing | Ampere | Ada Lovelace | Ada Lovelace | Ada Lovelace
Process | TSMC 12nm | Samsung 8nm LPP | TSMC 5nm | TSMC 5nm | TSMC 5nm
GPCs | 6 | 7 | 12 | 7 | 5
TPCs | 36 | 42 | 72 | 42 | 30
SMs | 72 | 84 | 144 | 84 | 60
Shaders (FP32) | 4,608 | 10,752 | 18,432 | 10,752 | 7,680
FP32 throughput | 16.1 TFLOPs | 37.6 TFLOPs | ~90 TFLOPs? | ~50 TFLOPs | ~35 TFLOPs
L2 cache | 6 MB | 6 MB | 96 MB | 64 MB | 48 MB
Bus width | 384-bit | 384-bit | 384-bit | 256-bit | 192-bit
TGP | 250 W | 350 W | 600 W? | 350 W? | 250 W?
Launch | Sep 2018 | Sep 2020 | Aug-Sep 2022 | Q4 2022 | Q4 2022
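The table is internally consistent: every chip uses six TPCs per GPC and two SMs per TPC. The sketch below checks that hierarchy and also backs out the clock each TFLOPs figure implies. It assumes two FLOPs (one FMA) per shader per clock. The Ada values are the table's rumored estimates, with "~90 TFLOPs?" taken as 90:

```python
# Rows copied from the table: (name, GPCs, TPCs, SMs, shaders, FP32 TFLOPs).
# The Ada Lovelace TFLOPs values are rumored estimates, not measurements.
TABLE = [
    ("TU102", 6, 36, 72, 4_608, 16.1),
    ("GA102", 7, 42, 84, 10_752, 37.6),
    ("AD102", 12, 72, 144, 18_432, 90.0),
    ("AD103", 7, 42, 84, 10_752, 50.0),
    ("AD104", 5, 30, 60, 7_680, 35.0),
]


def check_row(name, gpcs, tpcs, sms, shaders, tflops):
    """Validate the GPC/TPC/SM hierarchy and return (shaders per SM, implied clock in GHz)."""
    if tpcs != gpcs * 6:
        raise ValueError(f"{name}: expected 6 TPCs per GPC")
    if sms != tpcs * 2:
        raise ValueError(f"{name}: expected 2 SMs per TPC")
    shaders_per_sm = shaders // sms
    # Implied clock, assuming 2 FLOPs (one FMA) per shader per clock.
    clock_ghz = tflops * 1e12 / (shaders * 2) / 1e9
    return shaders_per_sm, round(clock_ghz, 2)


for row in TABLE:
    per_sm, clock = check_row(*row)
    print(f"{row[0]}: {per_sm} shaders/SM, ~{clock} GHz implied")
```

By this arithmetic, Turing and Ampere both land near 1.75 GHz. The Ada estimates imply roughly 2.28 to 2.44 GHz, below the 2.8 GHz Kopite7kimi cited for the 100 TFLOPs figure. The check also shows Turing packing 64 shaders per SM against 128 for Ampere and Ada.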

These are the two potential ways NVIDIA can (nearly) double the core counts without crippling scaling or leaving some of the shaders underutilized. Of course, there’s always a chance that Jensen’s team comes up with something entirely new and unexpected.
