AMD Instinct MI300X Makes First Appearance In MLPerf v4.1 AI Benchmarks, Tested With Next-Gen EPYC Turin "Zen 5" CPUs

AMD's Instinct MI300X AI accelerators have made their first appearance in MLPerf v4.1, tested alongside the next-gen EPYC "Turin" CPUs.
Today, AMD is sharing the first performance benchmarks of its latest data center and AI-centric hardware in MLPerf Inference v4.1, a standardized suite of workloads designed to showcase the potential of the latest and upcoming hardware from tech giants such as AMD, Intel & NVIDIA.
The red team is sharing its first MLPerf submissions for the Instinct MI300X accelerator since the chip was introduced, while also giving us a taste of the upcoming EPYC "Turin" CPUs, the 5th Gen server lineup based on the Zen 5 core architecture.
For the performance evaluation, AMD submitted results for its Instinct MI300X AI accelerators running on a Supermicro AS-8125GS-TNMR2 system. Four results were submitted to MLPerf v4.1: two under the Offline scenario and two under the Server scenario. Two of these tests paired the accelerators with 4th Gen EPYC "Genoa" CPUs, while the other two used the upcoming 5th Gen EPYC "Turin" CPUs.
In addition to AMD's own submissions, Dell validated platform-level performance of AMD Instinct accelerators by submitting Llama 2 70B results on an 8x AMD Instinct MI300X setup in its PowerEdge XE9680 server (Submission ID 4.1-0022: 8x AMD Instinct MI300X accelerators with 2x Intel Xeon Platinum 8460Y+, Available category).
Looking at the Llama 2 70B results, AMD achieved 21,028 tokens/s in the server scenario and 23,514 tokens/s in the offline scenario with the EPYC "Genoa" CPUs, while the 5th Gen EPYC "Turin" CPUs with the same Instinct configuration delivered 22,021 tokens/s in the server and 24,110 tokens/s in the offline scenario. That works out to a 4.7% (server) and 2.5% (offline) improvement over the Genoa platform.
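Those uplift figures are straightforward arithmetic on the published throughput numbers; here is a minimal Python sketch reproducing them, using only the tokens/s values quoted above:

```python
# Published MLPerf v4.1 Llama 2 70B throughput (tokens/s) for the
# 8x Instinct MI300X system, paired with Genoa vs. Turin CPUs.
results = {
    "server":  {"genoa": 21_028, "turin": 22_021},
    "offline": {"genoa": 23_514, "turin": 24_110},
}

for scenario, r in results.items():
    uplift = (r["turin"] / r["genoa"] - 1) * 100
    print(f"{scenario}: {uplift:.1f}% uplift moving from Genoa to Turin")

# server: 4.7% uplift moving from Genoa to Turin
# offline: 2.5% uplift moving from Genoa to Turin
```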
Compared to the NVIDIA H100, the Instinct MI300X is slightly slower in the server scenario, with the gap widening in the offline scenario. The Turin configuration does end up about 2% faster in the server scenario but still lags in the offline one. These results broadly match the figures NVIDIA published in its own announcement. AMD has also showcased near-perfect scaling in Llama 2 70B by comparing 1-GPU and 8-GPU results.
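"Near-perfect scaling" is typically quantified as the multi-GPU result divided by the single-GPU result times the GPU count. The sketch below shows that calculation with a hypothetical single-GPU figure, since the article does not reproduce AMD's published 1x MI300X number; treat it as illustrative only:

```python
# Hypothetical 1x MI300X throughput, for illustration only; AMD's actual
# single-GPU figure comes from its MLPerf v4.1 submission.
one_gpu = 3_000      # tokens/s, placeholder value
eight_gpu = 23_514   # tokens/s, published 8x MI300X offline result
num_gpus = 8

# Efficiency relative to perfectly linear scaling across all eight GPUs.
efficiency = eight_gpu / (num_gpus * one_gpu) * 100
print(f"Scaling efficiency: {efficiency:.1f}% of linear")  # ~98.0% here
```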
Lastly, AMD highlights the memory advantage of its Instinct MI300X AI accelerators: 192 GB of HBM3 per GPU, far exceeding the 80 GB offered on the NVIDIA H100 platform. That capacity is enough to meet the requirements of the largest language models across a variety of data formats.
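As a back-of-the-envelope check on that claim, model weights alone need roughly parameter count × bytes per parameter. The capacities below are the well-known specs (192 GB per MI300X, 80 GB per H100 SXM); note this weights-only estimate ignores KV cache and activation overhead, which push real requirements higher:

```python
# Rough weights-only memory footprint for a 70B-parameter model
# across data formats (KV cache and activations not included).
PARAMS = 70e9
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

MI300X_GB = 192  # HBM3 per Instinct MI300X
H100_GB = 80     # HBM3 per H100 (SXM)

for fmt, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1e9
    print(f"{fmt}: ~{weights_gb:.0f} GB of weights | "
          f"fits one MI300X: {weights_gb <= MI300X_GB} | "
          f"fits one H100: {weights_gb <= H100_GB}")

# fp16: ~140 GB of weights | fits one MI300X: True | fits one H100: False
# fp8: ~70 GB of weights | fits one MI300X: True | fits one H100: True
```

In FP16, a 70B model's weights fit on a single MI300X but not on a single H100, which is the crux of AMD's memory argument.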
We're excited to continue showcasing the versatility and performance of AMD Instinct accelerators across future benchmarks as we expand our testing and optimization efforts. This is just the beginning of our journey. In the coming months, we plan to launch the next iterations of the AMD Instinct series, featuring, among other advances, additional memory, support for lower-precision data types, and increased compute power. Future ROCm releases are targeted to bring software enhancements, including kernel improvements and advanced quantization support. Stay tuned for our next MLPerf submission; we look forward to sharing our progress and insights with you.
via AMD
AMD isn't done here: it aims to solidify its ROCm stack with more AI-focused optimizations, so we can expect performance updates in the next round of MLPerf submissions. While AMD took a good amount of time to submit MI300X numbers, we can hope that the MI325X, which debuts next quarter, sees its results submitted much sooner, as it's a major product that delivers a 50% memory capacity increase over the MI300X. AMD's EPYC Turin "Zen 5" CPUs are also expected to launch later this year, so stay tuned.