NVIDIA Blackwell GPU Architecture Official: 208 Billion Transistors, 5x AI Performance, 192 GB HBM3e Memory, 8 TB/s Bandwidth

NVIDIA has officially unveiled its next-gen Blackwell GPU architecture, which features up to a 5x AI performance increase over Hopper H100 GPUs.

NVIDIA has gone official with the full details of its next-generation AI & Tensor Core GPU architecture, codenamed Blackwell. As expected, Blackwell is NVIDIA's first MCM design, incorporating two GPU dies on the same package.

  • World’s Most Powerful Chip — Packed with 208 billion transistors, Blackwell-architecture GPUs are manufactured using a custom-built 4NP TSMC process with two reticle-limit GPU dies connected by a 10 TB/s chip-to-chip link into a single, unified GPU.
  • Second-Generation Transformer Engine — Fueled by new micro-tensor scaling support and NVIDIA’s advanced dynamic range management algorithms integrated into NVIDIA TensorRT™-LLM and NeMo Megatron frameworks, Blackwell will support double the compute and model sizes with new 4-bit floating point AI inference capabilities (see the quantization sketch after this list).
  • Fifth-Generation NVLink — To accelerate performance for multitrillion-parameter and mixture-of-experts AI models, the latest iteration of NVIDIA NVLink® delivers groundbreaking 1.8TB/s bidirectional throughput per GPU, ensuring seamless high-speed communication among up to 576 GPUs for the most complex LLMs.
  • RAS Engine — Blackwell-powered GPUs include a dedicated engine for reliability, availability and serviceability. Additionally, the Blackwell architecture adds chip-level capabilities for AI-based preventative maintenance that runs diagnostics and forecasts reliability issues. This maximizes system uptime and improves resiliency, allowing massive-scale AI deployments to run uninterrupted for weeks or even months at a time while reducing operating costs.
  • Secure AI — Advanced confidential computing capabilities protect AI models and customer data without compromising performance, with support for new native interface encryption protocols, which are critical for privacy-sensitive industries like healthcare and financial services.
  • Decompression Engine — A dedicated decompression engine supports the latest formats, accelerating database queries to deliver the highest performance in data analytics and data science. In the coming years, data processing, on which companies spend tens of billions of dollars annually, will be increasingly GPU-accelerated.
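
The "micro-tensor scaling" mentioned in the Transformer Engine bullet refers to sharing quantization scales across small blocks of values rather than across whole tensors. Below is a minimal Python sketch of the idea under stated assumptions: the E2M1 value grid, the block size of 32, and the `quantize_fp4_microscaled` helper are illustrative choices, not NVIDIA's published scheme.

```python
import numpy as np

# Representable magnitudes of a hypothetical FP4 (E2M1) format:
# 1 sign bit, 2 exponent bits, 1 mantissa bit.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_fp4_microscaled(x, block_size=32):
    """Quantize a 1-D tensor to FP4 with one shared scale per micro-block.

    The input length must be a multiple of block_size. Returns the
    dequantized approximation so the error can be inspected directly.
    """
    blocks = x.reshape(-1, block_size)                   # split into micro-blocks
    scales = np.abs(blocks).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scales = np.where(scales == 0, 1.0, scales)          # guard all-zero blocks
    scaled = blocks / scales
    # Snap each magnitude to the nearest representable FP4 value, keep the sign.
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    quant = np.sign(scaled) * FP4_GRID[idx]
    return quant * scales                                # dequantized approximation

x = np.random.randn(128).astype(np.float32)
x_hat = quantize_fp4_microscaled(x).reshape(-1)
print("max abs error:", np.abs(x - x_hat).max())
```

Per-block scales keep an outlier in one block from crushing the precision of every other block, which is what makes 4-bit inference workable in the first place.
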
Diving into the details, the NVIDIA Blackwell GPU features a total of 104 billion transistors on each compute die, which is fabricated on the TSMC 4NP process node. Interestingly, both Synopsys and TSMC have utilized NVIDIA's cuLitho technology in the production of Blackwell GPUs, which accelerates the manufacturing of these next-gen AI accelerator chips. The B100 GPUs are equipped with a 10 TB/s high-bandwidth interface which allows super-fast chip-to-chip interconnect. The two dies are unified as one chip on the same package, offering up to 208 billion transistors and full GPU cache coherency.

Compared to Hopper, the NVIDIA Blackwell GPU offers 128 billion more transistors, 5x the AI performance (boosted to 20 petaflops per chip), and 4x the on-package memory. The GPU is coupled with 8 HBM3e stacks featuring the world's fastest memory solution, offering 8 TB/s of memory bandwidth across an 8192-bit bus interface and up to 192 GB of HBM3e memory. To quickly sum up the performance figures versus Hopper, you are getting:

  • 20 PFLOPS FP8 (2.5x Hopper)
  • 20 PFLOPS FP6 (2.5x Hopper)
  • 40 PFLOPS FP4 (5.0x Hopper)
  • 740B Parameters (6.0x Hopper)
  • 34T Parameters/sec (5.0x Hopper)
  • 7.2 TB/s NVLINK (4.0x Hopper)
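
Dividing each figure by its quoted multiplier recovers the implied Hopper baselines. A quick sanity-check sketch (the baselines are back-calculated from the numbers above, not independently sourced):

```python
# (blackwell_value, unit, multiplier_vs_hopper) for the compute and
# interconnect rows quoted above.
specs = {
    "FP8 compute": (20.0, "PFLOPS", 2.5),
    "FP6 compute": (20.0, "PFLOPS", 2.5),
    "FP4 compute": (40.0, "PFLOPS", 5.0),
    "NVLink":      (7.2,  "TB/s",   4.0),
}

for name, (value, unit, mult) in specs.items():
    baseline = value / mult
    print(f"{name:12s}: {value:g} {unit} -> implied Hopper baseline {baseline:g} {unit}")
```
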
NVIDIA will be offering Blackwell GPUs as a full-on platform, combining two of these GPUs (four compute dies) with a single Grace CPU (72 Arm Neoverse V2 cores). The GPUs will be interconnected with each other and with the Grace CPU using a 900 GB/s NVLink protocol.
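
Taking the per-GPU figures above at face value, here is a back-of-the-envelope sketch of what such a dual-GPU, Grace-paired platform adds up to (the simple doubling is an assumption for illustration and ignores any scaling losses):

```python
# Per-GPU figures as quoted in the article.
FP4_PFLOPS_PER_GPU = 40    # PFLOPS of FP4 compute
HBM3E_GB_PER_GPU = 192     # GB of HBM3e
HBM_BW_TBS_PER_GPU = 8     # TB/s of memory bandwidth
NVLINK_C2C_GBS = 900       # GB/s GPU <-> Grace CPU link

GPUS_PER_PLATFORM = 2      # two Blackwell GPUs (four compute dies) per Grace CPU

print(f"FP4 compute  : {GPUS_PER_PLATFORM * FP4_PFLOPS_PER_GPU} PFLOPS")   # 80
print(f"HBM3e total  : {GPUS_PER_PLATFORM * HBM3E_GB_PER_GPU} GB")         # 384
print(f"HBM bandwidth: {GPUS_PER_PLATFORM * HBM_BW_TBS_PER_GPU} TB/s")     # 16
```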

First up, we have the NVIDIA Blackwell B200 GPU. This is the first of the two Blackwell chips and will be adopted into various designs ranging from SXM modules and PCIe AICs to Superchip platforms. The B200 GPU will be the first NVIDIA GPU to utilize a chiplet design, featuring two compute dies based on the TSMC 4NP process node.

MCM, or Multi-Chip Module, has been a long time coming on the NVIDIA side of things, and it's finally here as the company tackles challenges associated with next-gen process nodes such as yields and cost. Chiplets provide a viable alternative where NVIDIA can still achieve faster gen-over-gen performance without compromising its supply or costs, and this is just a stepping stone in its chiplet journey.

The NVIDIA Blackwell B200 GPU will be a monster chip, incorporating a total of 160 SMs for 20,480 cores. The GPU will feature the latest NVLink interconnect technology, supporting the same 8-GPU configuration and a 400 GbE networking switch. It's also going to be very power-hungry with a 700W peak TDP, though that's the same as the H100 and H200 chips. Summing this chip up:

  • TSMC 4NP Process Node
  • Multi-Chip-Package GPU
  • 104 Billion Transistors Per Compute Die
  • 208 Billion Transistors Total (2 Dies)
  • 160 SMs (20,480 Cores)
  • 8 HBM Packages
  • 192 GB HBM3e Memory
  • 8 TB/s Memory Bandwidth
  • 8192-bit Memory Bus Interface
  • 8-Hi Stack HBM3e
  • PCIe 6.0 Support
  • 700W TDP (Peak)
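
The core count in this list follows directly from the SM count. A one-line check, assuming the listed totals imply 128 cores per SM and an even split of SMs across the two compute dies (both are inferences from the figures above, not official numbers):

```python
SMS_TOTAL = 160            # as listed above
CORES_TOTAL = 20_480
TRANSISTORS_PER_DIE = 104  # billions, per compute die

print(f"cores per SM       : {CORES_TOTAL // SMS_TOTAL}")         # 128
print(f"SMs per compute die: {SMS_TOTAL // 2}")                   # 80
print(f"total transistors  : {2 * TRANSISTORS_PER_DIE} billion")  # 208
```
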
On the memory side, the Blackwell B200 GPU will pack up to 192 GB of HBM3e memory, featured in eight 8-Hi stacks, each with 24 GB of capacity, across an 8192-bit wide bus interface. This is a 2.4x capacity increase over the H100 80 GB GPU, allowing the chip to run bigger LLMs.
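
These memory figures are internally consistent. A short sketch deriving capacity, bus width, and the implied per-pin data rate from the stack configuration (the 1024-bit-per-stack interface width is standard for HBM; the per-pin rate is derived here, not quoted by NVIDIA):

```python
STACKS = 8
GB_PER_STACK = 24        # 8-Hi HBM3e stacks, 24 GB each
BITS_PER_STACK = 1024    # standard HBM interface width per stack
TOTAL_BW_TBS = 8         # TB/s, as quoted

capacity_gb = STACKS * GB_PER_STACK                # 192 GB
bus_bits = STACKS * BITS_PER_STACK                 # 8192-bit bus
# Per-pin data rate: total bandwidth spread across every bus pin.
gbps_per_pin = TOTAL_BW_TBS * 8 * 1000 / bus_bits  # ~7.8 Gbps per pin

print(f"{capacity_gb} GB over a {bus_bits}-bit bus -> {gbps_per_pin:.1f} Gbps per pin")
```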

The NVIDIA Blackwell B200 and its respective platforms will usher in a new era of AI computing and offer fierce competition to AMD's and Intel's latest chip offerings, which are yet to see widespread adoption. With the unveiling of Blackwell, NVIDIA has once again cemented itself as the dominant force in the AI market.
