IBM Intros Telum II Processor & Spyre AI Accelerator: 8 Cores Clocked at 5.5 GHz With 360 MB Cache


IBM has unveiled its next-gen Telum II Processor & Spyre AI Accelerators for the latest IBM Z mainframe systems powering AI.

Today, IBM is revealing the first architectural details of its Telum II processor and Spyre Accelerator, both designed to accelerate AI workloads on the next-gen IBM Z mainframes. These new mainframes will accelerate traditional AI models along with LLMs using a brand new ensemble method of AI.

Telum II processor: Featuring eight high-performance cores running at 5.5GHz, with 36MB L2 cache per core and a 40% increase in on-chip cache capacity for a total of 360MB. The virtual level-4 cache of 2.88GB per processor drawer provides a 40% increase over the previous generation. The integrated AI accelerator allows for low-latency, high-throughput in-transaction AI inferencing, for example enhancing fraud detection during financial transactions, and provides a fourfold increase in compute capacity per chip over the previous generation.

The new I/O Acceleration Unit DPU is integrated into the Telum II chip. It is designed to improve data handling with a 50% increased I/O density. This advancement enhances the overall efficiency and scalability of IBM Z, making it well suited to handle the large-scale AI workloads and data-intensive applications of today's businesses.

Spyre Accelerator: A purpose-built enterprise-grade accelerator offering scalable capabilities for complex AI models and generative AI use cases is being showcased. It features up to 1TB of memory, built to work in tandem across the eight cards of a regular IO drawer, to support AI model workloads across the mainframe while designed to consume no more than 75W per card. Each chip will have 32 compute cores supporting int4, int8, fp8, and fp16 datatypes for both low-latency and high-throughput AI applications.

via IBM

Starting with the details, we first have the IBM Telum II processor, which is composed of the CPU cores, cache, DPU, and AI accelerator. The chip itself is based on the Samsung 5HPP node, rocking 43 billion transistors in a 600 mm² die.

The CPU features an 8-core design with increased frequencies of up to 5.5 GHz, larger caches with 36 MB of L2 dedicated per core, & a 40% increase in on-chip cache, for a total of 360 MB. The chip also features a virtual L4 cache of 2.88 GB (48.5ns latency) per processor drawer, which also marks a 40% increase over the first-generation Telum chips. IBM states that these new cores have brand new branch prediction improvements, that the register size has grown to 160, and that the overall size is reduced by 20% along with a 15% power reduction. The socket performance is improved by 20%.
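As a quick sanity check, the cache figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes the virtual L4 is simply the pooled on-chip cache of every chip in a drawer and that 1 GB is counted as 1000 MB, neither of which is stated in the article. The variable names are illustrative.

```python
# Figures quoted in the article for Telum II.
ON_CHIP_CACHE_MB = 360      # total on-chip cache per chip
VIRTUAL_L4_MB = 2880        # 2.88 GB virtual L4 per drawer (assumes 1 GB = 1000 MB)
GENERATIONAL_UPLIFT = 0.40  # stated 40% increase over first-generation Telum

# Assumption: virtual L4 = on-chip cache pooled across all chips in a drawer.
implied_chips_per_drawer = VIRTUAL_L4_MB // ON_CHIP_CACHE_MB

# On-chip cache the previous generation would need for 360 MB to be a 40% gain.
implied_previous_cache_mb = ON_CHIP_CACHE_MB / (1 + GENERATIONAL_UPLIFT)

print(f"Implied chips per drawer: {implied_chips_per_drawer}")
print(f"Implied first-gen on-chip cache: {implied_previous_cache_mb:.0f} MB")
```

Under those assumptions the numbers are self-consistent: 2.88 GB divides evenly into eight 360 MB chips, and a 40% uplift implies a previous-generation on-chip cache of roughly 257 MB.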

Each Telum II processor comes with an integrated AI accelerator which offers low-latency and high-throughput AI inferencing performance. This accelerator provides 24 TOPS per chip, 192 TOPS per drawer, and 768 TOPS per system. Other new additions include an I/O Acceleration Unit DPU which has been integrated into the Telum II chip and can improve data handling with a 50% uplift in I/O density. The DPU also reduces power for I/O management by 70%.
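The three TOPS figures hint at how the system is laid out. The sketch below derives the implied chip and drawer counts, assuming that AI throughput scales linearly with the number of chips; that is an assumption made here for illustration, not something IBM states in the quoted material.

```python
# AI accelerator throughput figures quoted in the article.
TOPS_PER_CHIP = 24
TOPS_PER_DRAWER = 192
TOPS_PER_SYSTEM = 768

# Assumption: throughput adds up linearly across chips and drawers.
chips_per_drawer = TOPS_PER_DRAWER // TOPS_PER_CHIP
drawers_per_system = TOPS_PER_SYSTEM // TOPS_PER_DRAWER
chips_per_system = chips_per_drawer * drawers_per_system

print(f"Chips per drawer:   {chips_per_drawer}")
print(f"Drawers per system: {drawers_per_system}")
print(f"Chips per system:   {chips_per_system}")
```

With linear scaling, the quoted figures work out to eight chips per drawer and four drawers per system, or 32 Telum II chips in a full configuration.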

The second AI chip that IBM is introducing today for its IBM Z mainframe is the Spyre AI accelerator, an enterprise-grade solution offering 300+ TOPS of AI performance. Each card carries 128 GB of LPDDR5 memory, for a total of 1 TB across the 8 cards that can be plugged into an IBM Z mainframe running the Telum II processor.

Each Spyre AI accelerator features 32 compute cores which support INT4, INT8, FP8 & FP16 data types & comes on a card with a 75W TDP. Each card is designed for low-latency and high-throughput AI applications.
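The per-card Spyre figures can be rolled up to a full eight-card I/O drawer in the same way. This is a back-of-the-envelope sketch: it treats the 75W figure as a per-card ceiling and counts 1 TB as 1024 GB, both of which are assumptions made here.

```python
# Spyre figures quoted in the article.
CARDS_PER_IO_DRAWER = 8
MEMORY_PER_CARD_GB = 128   # LPDDR5 per card
POWER_PER_CARD_W = 75      # stated upper bound per card
CORES_PER_CARD = 32

# Aggregate memory; assumes 1 TB = 1024 GB.
total_memory_gb = CARDS_PER_IO_DRAWER * MEMORY_PER_CARD_GB
total_memory_tb = total_memory_gb / 1024

# Upper bound on power if every card ran at its 75 W ceiling.
max_power_w = CARDS_PER_IO_DRAWER * POWER_PER_CARD_W

total_cores = CARDS_PER_IO_DRAWER * CORES_PER_CARD

print(f"Memory across drawer: {total_memory_gb} GB ({total_memory_tb:.0f} TB)")
print(f"Power ceiling:        {max_power_w} W")
print(f"Compute cores:        {total_cores}")
```

Eight 128 GB cards account for the quoted 1 TB, and the same eight cards would draw at most 600 W and expose 256 compute cores between them.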

IBM expects its Z mainframe AI systems with Telum II processors to be available to clients in 2025, while the Spyre AI accelerator is currently in tech preview and is also expected to be made available in 2025.