AMD Zen 5 Core Architecture Breakdown At Hot Chips: Zen For A New Chapter In High-Performance Computing

At Hot Chips, AMD is offering an in-depth look at its brand-new Zen 5 core architecture, which will power its next generation of high-performance PCs.
AMD's Zen 1 core architecture first launched back in 2017 and since then, the company has introduced five new architectures (Zen+, Zen 2, Zen 3, Zen 4, Zen 5). AMD started the decade by launching the Zen 3 architecture, which brought a 19% IPC improvement, an 8-core complex, and a larger L3 cache per CCX while utilizing the 7nm/6nm process technologies.
The company followed up with the Zen 4 release, which brought another 14% IPC improvement, AVX-512 instructions (on an FP-256 datapath), a doubled L2 cache of 1 MB, support for VNNI/BFLOAT16, and a move to the 5nm and 4nm process technologies.
This year, AMD introduced Zen 5, its latest high-performance core architecture, which brings a 16% IPC uplift, AVX-512 with FP-512 variants, 8-wide dispatch, 6 ALUs, dual-pipe fetch/decode, and 4nm/3nm process technologies. Today, AMD is deep-diving into the full Zen 5 architecture at Hot Chips.
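To put the three generational IPC figures side by side, here is a minimal Python sketch that compounds them. It assumes the quoted uplifts (19%, 14%, 16%) can simply be multiplied, which is only a rough approximation since each figure was measured by AMD on its own workload mix. The function name `compound_uplift` is ours, chosen for illustration.

```python
# Generational IPC uplifts as quoted in the article.
# Assumption: the figures compound multiplicatively, which is only
# approximate because each was measured on a different workload mix.
uplifts = {"Zen 3": 0.19, "Zen 4": 0.14, "Zen 5": 0.16}


def compound_uplift(gains):
    """Multiply successive fractional gains into one cumulative factor."""
    factor = 1.0
    for gain in gains:
        factor *= 1.0 + gain
    return factor


cumulative = compound_uplift(uplifts.values())
print(f"Approximate cumulative IPC factor across Zen 3, 4 and 5: {cumulative:.2f}x")
```

Under that assumption, the three generations together work out to roughly 1.57x the IPC of the architecture that preceded Zen 3.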
AMD starts by stating the design objectives for Zen 5. In terms of performance, Zen 5 aims to deliver another major single-thread (1T) and multi-thread (NT) performance increase, balance cross-core 1T/NT instruction and data throughput, create front-end parallelism, increase execution parallelism, enable high-throughput, efficient data movement and prefetching, and support AVX512/FP512 data paths for throughput and AI uplifts. At the same time, AMD wants to add new capabilities such as additional ISA extensions and new security features, along with expanded platform support through its Zen 5 and Zen 5C core variants.
Following is an overview of AMD's Zen 5 core architecture:
2 Threads/Core
NextGen Branch Predictor
Caches:
Dual I-Fetch/decode pipes, 4 inst/pipe
8 ops/cycle dispatched to integer or FP
Execution capabilities:
TLBs:
To provide balanced throughput, Zen 5 offers:
Front End parallelism:
Execution:
Dataflow:
In terms of Fetch Advances, AMD's Zen 5 core architecture offers:
Branch Prediction: fewer bubbles, more accuracy, and throughput
Memory management:
Icache latency and bandwidth
In terms of Decode Advances, AMD's Zen 5 core architecture offers:
Opcache: higher density with greater coverage and throughput
Dual Decode Pipes
8-wide dispatch to Int and FP execution
In terms of Execution Advances, AMD's Zen 5 core architecture offers:
8-wide dispatch, rename, retire
Integer scheduler advances
6 ALU with 3 multipliers, 3 branch units
4 AGU feed a wider LS with 4 memory addresses per cycle
Execution window growth
AMD has also made major FP changes and added new features such as the aforementioned AVX-512 with a full 512b datapath. Zen 5 offers more bandwidth and lower latency with 4 execution pipelines (1 op/cycle each), 2 LS/integer register pipelines, 2 512b loads/cycle, 1 512b store/cycle, and 2-cycle FADDs. The execution window has also been widened with 8-wide dispatch into 3 larger schedulers (1 per pipe pair), and the physical register file has doubled.
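The load and store rates above translate directly into bytes per cycle. The short Python sketch below does that conversion and then scales it by a clock speed. The 5.0 GHz clock is a hypothetical value picked for illustration, not a figure from AMD's presentation, and the helper names are ours.

```python
VECTOR_BITS = 512        # full-width AVX-512 datapath, per the article
LOADS_PER_CYCLE = 2      # 512b loads per cycle, per the article
STORES_PER_CYCLE = 1     # 512b stores per cycle, per the article
ASSUMED_CLOCK_GHZ = 5.0  # hypothetical clock, for illustration only


def bytes_per_cycle(ops_per_cycle, width_bits=VECTOR_BITS):
    """Bytes moved per cycle by `ops_per_cycle` operations of `width_bits`."""
    return ops_per_cycle * width_bits // 8


def peak_gb_per_s(bytes_each_cycle, clock_ghz):
    """Peak per-core rate in GB/s (bytes/cycle * 1e9 cycles/s per GHz)."""
    return bytes_each_cycle * clock_ghz


load_bytes = bytes_per_cycle(LOADS_PER_CYCLE)
store_bytes = bytes_per_cycle(STORES_PER_CYCLE)

print(f"Loads:  {load_bytes} B/cycle, "
      f"{peak_gb_per_s(load_bytes, ASSUMED_CLOCK_GHZ):.0f} GB/s at the assumed clock")
print(f"Stores: {store_bytes} B/cycle, "
      f"{peak_gb_per_s(store_bytes, ASSUMED_CLOCK_GHZ):.0f} GB/s at the assumed clock")
```

These are theoretical per-core peaks; sustained rates depend on the workload and on how often data actually hits in the L1D.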
Lastly, we have the Load and Store advances which include:
48KB 12-way L1D keeping 4-cycle load-to-use
More Bandwidth
Larger In-Flight Window
Data prefetching
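For readers who want to see what a 48KB, 12-way L1D looks like internally, the following Python sketch derives the number of sets and the size of each way. It assumes a 64-byte cache line, which the article does not state; `cache_geometry` is a helper written for this example.

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (sets, bytes_per_way) for a set-associative cache.

    Assumption: 64-byte cache lines (not stated in the article).
    """
    bytes_per_way, remainder = divmod(size_bytes, ways)
    if remainder or bytes_per_way % line_bytes:
        raise ValueError("size must divide evenly into ways and lines")
    return bytes_per_way // line_bytes, bytes_per_way


sets, way_bytes = cache_geometry(48 * 1024, ways=12)
print(f"L1D: {sets} sets, {way_bytes} bytes per way")
```

With those assumptions, the cache works out to 64 sets of 12 lines each, with every way holding 4096 bytes.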
Moving over to the cache, Zen 5 has seen several upgrades. The L2/core interface bandwidth has doubled, with 64B/clk to the L1I and L1D and 64B/clk from the L1D. L2 associativity has doubled to 16-way, latency drops by 3.5 cycles, and more L3 in-flight misses are supported. Configurations include 32/16 MB of L3 (Zen 5 / Zen 5C), which works out to 4 MB per core (Zen 5) and 2 MB per core (Zen 5C).
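As a quick consistency check on the L3 figures, this Python sketch divides the total L3 by the per-core share to get the number of cores implied to be sharing each L3. It uses only the numbers quoted above; actual product configurations may differ from what this simple division suggests.

```python
# (total L3 in MB, L3 per core in MB) as quoted in the article.
l3_configs = {"Zen 5": (32, 4), "Zen 5C": (16, 2)}

# Cores implied to share one L3, derived purely from the quoted figures.
implied_cores = {
    name: total_mb // per_core_mb
    for name, (total_mb, per_core_mb) in l3_configs.items()
}

for name, cores in implied_cores.items():
    total_mb, per_core_mb = l3_configs[name]
    print(f"{name}: {total_mb} MB L3 / {per_core_mb} MB per core "
          f"= {cores} cores sharing the L3")
```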
Talking about the two configurations, the Zen 5 core is optimized for peak 1T performance while the Zen 5C core is aimed at perf/w and perf/area optimized platforms. Both Zen 5 and Zen 5C use the same ISA which includes the following:
For power efficiency, AMD has built Zen 5 from the ground up while continuing to build upon its power gating improvements and 2T support (a major perf/watt benefit). The Zen 5 architecture also features reduced power state entry/exit times and better branch prediction to eliminate wasted work, and it cuts bus, cache, and inter-core traffic through string operation optimizations and more effective, more efficient prefetchers.
Following are the key advances made within Zen 5 versus Zen 4:
AMD is also sharing the speeds and feeds of the Zen 5 core complex: double the L2 associativity, double the L2 bandwidth, a low-latency L3 with 320 L3 in-flight misses, and a fast, private L2 cache (1 MB). The L3 is shared across all cores in the complex and is filled from L2 victims, with L2 tags duplicated in the L3 for probe filtering and fast cache transfer.
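The number of in-flight misses matters because it caps how much memory bandwidth a core complex can sustain. A rough way to see this is Little's Law: sustained throughput is at most the number of outstanding requests times the request size, divided by the latency of each request. The Python sketch below applies it to the 320 in-flight L3 misses quoted above. The 64-byte line size and the 100 ns miss latency are assumptions chosen for illustration, not AMD figures.

```python
def max_miss_bandwidth_gbs(in_flight, line_bytes, latency_ns):
    """Upper bound on sustained miss bandwidth via Little's Law.

    throughput = outstanding requests * bytes per request / latency.
    Bytes per nanosecond is numerically equal to GB/s.
    """
    return in_flight * line_bytes / latency_ns


L3_IN_FLIGHT = 320          # from the article
ASSUMED_LINE_BYTES = 64     # assumption: 64-byte cache lines
ASSUMED_LATENCY_NS = 100.0  # hypothetical memory latency, illustration only

bound = max_miss_bandwidth_gbs(L3_IN_FLIGHT, ASSUMED_LINE_BYTES, ASSUMED_LATENCY_NS)
print(f"Concurrency-limited bandwidth bound: {bound:.1f} GB/s")
```

The bound scales linearly with the number of in-flight misses and inversely with latency, so doubling the assumed latency would halve it.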
Talking about products, AMD's Zen 5 core complexes, or CCXs, will be featured first across three product families: Ryzen 9000 "Granite Ridge" Desktop CPUs, Ryzen AI 300 "Strix" Laptop CPUs, and 5th Gen EPYC "Turin" Data Center CPUs.
AMD just got started with Zen 5 so we can expect even more products in the future as the company fine-tunes the architecture for PCs & servers.