Intel Unveils Full Aurora Supercomputer Specifications: 21,248 Xeon CPUs & 63,744 GPUs For Over 2 ExaFlops

Intel has finally unveiled the full specifications of the Aurora supercomputer designed for the Argonne National Laboratory in the US.

The Intel Aurora supercomputer has been delayed for a long time, but it's finally taking shape. Powered by Intel's Xeon CPU Max and Data Center GPU Max series, the system has been upgraded to a 2 Exaflop machine, double its initial 1 Exaflop target. This will bring it on par with the AMD-powered Frontier supercomputer, which is currently the fastest on the planet.

In the latest disclosure, Intel revealed that the Aurora supercomputer will pack a total of 10,624 nodes, which together house a mammoth 21,248 Xeon CPUs based on the Sapphire Rapids-SP family and 63,744 GPUs based on the Ponte Vecchio design. This system will be a beast, with an insane fabric interconnect that offers a peak injection bandwidth of 2.12 PB/s and a peak bisection bandwidth of 0.69 PB/s.
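The per-node layout is not stated above, but it follows from the totals. The short Python sketch below derives it, assuming every node is configured identically and that 1 PB/s equals 1,000,000 GB/s (decimal units); the variable names are illustrative only.

```python
# Totals quoted in Intel's disclosure.
NODES = 10_624
CPUS = 21_248
GPUS = 63_744
INJECTION_PB_S = 2.12   # peak injection bandwidth, PB/s
BISECTION_PB_S = 0.69   # peak bisection bandwidth, PB/s

# Assumption: all nodes are identical, so the totals divide evenly.
cpus_per_node = CPUS // NODES
gpus_per_node = GPUS // NODES

# Assumption: decimal units, 1 PB/s = 1,000,000 GB/s.
injection_per_node_gb_s = INJECTION_PB_S * 1_000_000 / NODES

# Fraction of the injection bandwidth that can cross the machine's midpoint.
bisection_ratio = BISECTION_PB_S / INJECTION_PB_S

print(f"CPUs per node: {cpus_per_node}")
print(f"GPUs per node: {gpus_per_node}")
print(f"Injection bandwidth per node: {injection_per_node_gb_s:.1f} GB/s")
print(f"Bisection / injection ratio: {bisection_ratio:.2f}")
```

Under those assumptions, each node works out to two CPUs and six GPUs, with roughly 200 GB/s of injection bandwidth.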

Argonne is spearheading an international collaboration to advance the project, including Intel; HPE; Department of Energy laboratories; U.S. and international universities; nonprofits; & international partners, such as RIKEN.

Additionally, Intel and Argonne National Laboratory highlighted installation progress, system specs and early performance results for Aurora:

Intel has completed the physical delivery of more than 10,000 blades for the Aurora supercomputer.

Aurora’s full system, built using HPE Cray EX supercomputers, will have 63,744 GPUs, 21,248 CPUs and 1,024 DAOS storage nodes, and it will utilize the HPE Slingshot high-performance Ethernet network.

Early results show leading performance on real-world science and engineering workloads, with up to 2x performance over AMD MI250 GPUs, 20% improvement over H100 on the QMCPACK quantum mechanical application, and near linear scaling up to hundreds of nodes.

Aurora is expected to offer more than 2 exaflops of peak double-precision compute performance when launched this year.

via Intel

For memory, the Aurora supercomputer is outfitted with 10.9 PB of DDR5 system DRAM, 1.36 PB of HBM capacity on the CPUs, and 8.16 PB of HBM capacity on the GPUs. The system DRAM achieves a peak bandwidth of 5.95 PB/s, the CPU HBM achieves a peak bandwidth of 30.5 PB/s, and the GPU HBM achieves a peak bandwidth of 208.9 PB/s. For storage, the system is equipped with 230 PB of DAOS capacity that runs at a peak bandwidth of 31 TB/s and is spread across a total of 1,024 nodes.
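To put those aggregate figures in per-device terms, the sketch below divides them by the node, CPU, and GPU counts Intel quoted. It assumes memory is spread evenly across devices and uses decimal units (1 PB = 1,000,000 GB, 1 PB/s = 1,000 TB/s); the results are derived estimates, not figures from Intel's disclosure.

```python
# Device counts quoted in Intel's disclosure.
NODES = 10_624
CPUS = 21_248
GPUS = 63_744
DAOS_NODES = 1_024

GB_PER_PB = 1_000_000   # assumption: decimal units
TB_PER_PB = 1_000

# Capacity per device, assuming an even split.
ddr5_per_node_gb = 10.9 * GB_PER_PB / NODES
hbm_per_cpu_gb = 1.36 * GB_PER_PB / CPUS
hbm_per_gpu_gb = 8.16 * GB_PER_PB / GPUS
daos_per_node_tb = 230 * TB_PER_PB / DAOS_NODES

# Peak HBM bandwidth per device, in TB/s.
hbm_bw_per_cpu_tb_s = 30.5 * TB_PER_PB / CPUS
hbm_bw_per_gpu_tb_s = 208.9 * TB_PER_PB / GPUS

print(f"DDR5 per node: {ddr5_per_node_gb:.0f} GB")
print(f"HBM per CPU: {hbm_per_cpu_gb:.0f} GB at {hbm_bw_per_cpu_tb_s:.2f} TB/s")
print(f"HBM per GPU: {hbm_per_gpu_gb:.0f} GB at {hbm_bw_per_gpu_tb_s:.2f} TB/s")
print(f"DAOS capacity per storage node: {daos_per_node_tb:.0f} TB")
```

Under those assumptions, this comes to about 1 TB of DDR5 per node, 64 GB of HBM per CPU, and 128 GB of HBM per GPU.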

Aurora running the latest Intel Data Center GPU Max Series 1550 offers the fastest SimpleFOMP performance, outclassing the NVIDIA A100 and AMD Instinct MI250X accelerators. Intel also touts some impressive relative performance versus those accelerators in Fusion Reactor predictions, the Monte Carlo Methods (Maximized), and QMCPACK (Computing Quantum Mechanical Properties).

At the Intel special presentation, Intel's Jeff McVeigh highlighted the latest competitive performance results across the full breadth of hardware and shared strong momentum with customers.

The Intel Data Center GPU Max Series outperforms the Nvidia H100 PCIe card by an average of 30% on diverse workloads, while independent software vendor Ansys shows a 50% speedup for the Max Series GPU over H100 on AI-accelerated HPC applications.

The Xeon Max Series CPU, the only x86 processor with high bandwidth memory, exhibits a 65% improvement over AMD’s Genoa processor on the High Performance Conjugate Gradients (HPCG) benchmark, using less power. High memory bandwidth has been noted as among the most desired features for HPC customers.

4th Gen Intel Xeon Scalable processors – the most widely used in HPC – deliver a 50% average speedup over AMD’s Milan, and energy company BP’s newest 4th Gen Xeon HPC cluster provides an 8x increase in performance over its previous-generation processors with improved energy efficiency.

The Gaudi2 deep learning accelerator performs competitively on deep learning training and inference, with up to 2.4x faster performance than Nvidia A100.

via Intel

Once again, the Aurora supercomputer is slated to launch later this year with peak performance exceeding the 2 Exaflops barrier. The supercomputer will also run the latest Aurora genAI model, which offers 1 trillion parameters for scientific applications.

In addition to the Aurora supercomputer, Intel has also announced its brand-new Data Center GPU Max Subsystem, which comes in an x8 UBB design with a total of 8 Ponte Vecchio GPUs.