AMD-Powered Frontier Supercomputer Uses 3K of Its 37K MI250X GPUs To Achieve a Whopping 1 Trillion Parameter LLM Run, Comparable To ChatGPT-4

The AMD-powered Frontier supercomputer with Instinct MI250X GPUs has completed a 1 trillion parameter LLM training run, a scale comparable to ChatGPT-4.

The Frontier supercomputer is the world's leading supercomputer and the only Exascale machine that is currently operating. It is powered by AMD's EPYC and Instinct hardware, which not only delivers the top HPC performance but also makes it the second most efficient supercomputer on the planet. A paper submitted to arXiv reveals that Frontier has been used to train a model with one trillion parameters, relying on "hyperparameter tuning" to get there, setting a new industry benchmark.

This is 'just' 3k of the 37k MI250X on Frontier. To steal from Feynman, "There is plenty of room at the top!" -- I'm expecting lots of interesting work scaling out further to tens of thousands of nodes.

— Nicholas Malaya (@nicholasmalaya) January 5, 2024

Before we get to the crux, here is a quick recap of what the Frontier supercomputer holds. It has been designed from the ground up around AMD's 3rd Gen EPYC Trento CPUs and Instinct MI250X GPU accelerators. It is installed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, where it is operated by the Department of Energy (DOE). It has so far achieved 1.194 Exaflop/s using 8,699,904 cores. The HPE Cray EX architecture combines 3rd Gen AMD EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X accelerators and a Slingshot-11 interconnect. Frontier has held on to the number one spot on the Top500.org list of supercomputers, showing its dominance.

The new records achieved by Frontier are a result of implementing effective strategies to train LLMs and using the onboard hardware as efficiently as possible. The team achieved notable results through extensive testing of models with 22 billion, 175 billion, and 1 trillion parameters, and the figures obtained are a result of optimizing and fine-tuning the model training process. The results were achieved using up to 3,000 of AMD's MI250X AI accelerators, which have shown their prowess despite being relatively dated hardware.

What's more interesting is that the whole Frontier supercomputer houses 37,000 MI250X GPUs, so one can imagine the kind of performance on offer when the entire GPU pool is used to power LLMs. AMD is also on the verge of deploying its MI300 GPU accelerators in brand-new supercomputers with a robust ROCm 6.0 ecosystem that further accelerates AI performance.

For 22 Billion, 175 Billion, and 1 Trillion parameters, we achieved GPU throughputs of 38.38%, 36.14%, and 31.96%, respectively. For the training of the 175 Billion parameter model and the 1 Trillion parameter model, we achieved 100% weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively. We also achieved strong scaling efficiencies of 89% and 87% for these two models.

- arXiv
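For readers unfamiliar with the scaling terms in the quote above, here is a minimal Python sketch of how weak and strong scaling efficiency are commonly defined. The function names and the timings in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper, which may measure these quantities differently.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Weak scaling: work per GPU is held constant as GPUs are added.

    Ideally the time per step stays the same, so efficiency is the
    baseline time divided by the time on the larger GPU count.
    """
    return t_base / t_scaled


def strong_scaling_efficiency(t_base: float, n_base: int,
                              t_scaled: float, n_scaled: int) -> float:
    """Strong scaling: total work is fixed as GPUs are added.

    Ideally the speedup equals the ratio of GPU counts, so efficiency
    is the measured speedup divided by that ideal speedup.
    """
    speedup = t_base / t_scaled
    ideal_speedup = n_scaled / n_base
    return speedup / ideal_speedup


# Hypothetical timings (seconds per training step), for illustration only.
weak = weak_scaling_efficiency(t_base=10.0, t_scaled=10.0)
strong = strong_scaling_efficiency(t_base=10.0, n_base=1024,
                                   t_scaled=5.75, n_scaled=2048)
print(f"weak scaling efficiency:   {weak:.0%}")
print(f"strong scaling efficiency: {strong:.0%}")
```

Under these definitions, 100% weak scaling means that adding GPUs and growing the workload in proportion does not slow down each step, while a strong scaling efficiency below 100% means that a fixed workload speeds up by somewhat less than the increase in GPU count.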

News Source: arXiv