Intel Gaudi 2 Accelerators Showcase Competitive Performance Per Dollar Against NVIDIA H100 In MLPerf 4.0 GenAI Benchmarks


Intel has just released its latest MLPerf v4.0 performance figures for its Gaudi 2 accelerators and 5th Gen Xeon "Emerald Rapids" CPUs, with the former showing strong performance-per-dollar figures against NVIDIA's H100 GPU.

Intel has been fine-tuning the performance of its Gaudi accelerator lineup in AI workloads through its oneAPI framework for some time now. The results of this ongoing software work show up in the latest MLPerf v4.0 figures, which highlight GenAI capabilities in workloads such as Llama-70B and Stable Diffusion XL, where Intel's solutions perform competitively against rival chips. The company has also recently shown Gaudi 2 accelerators outpacing NVIDIA's solutions in GenAI workloads such as Stable Diffusion and Llama 2 LLMs.

For its comparisons, Intel pitted an eight-accelerator Gaudi 2 configuration against eight NVIDIA H100 GPUs, benchmarking FP8 and INT8 performance. In raw relative performance, the NVIDIA H100 sits well ahead of Gaudi 2, delivering up to 3.35x higher performance in the server scenario and up to 2.76x in the offline scenario. Where the picture shifts in Intel's favor is performance per dollar: on that metric, Gaudi 2 becomes a very competitively positioned product, and Intel describes it as the only "benchmarked alternative" to NVIDIA's H100 for GenAI workloads.
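Performance per dollar is simply measured throughput divided by system cost, so a slower system can still come out ahead if it is cheap enough. The short Python sketch below illustrates that arithmetic only; it is not Intel's methodology, which the article does not detail. The `System` class and both helper functions are hypothetical names, and any throughput or price you plug in is your own assumption, since no prices are given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A benchmarked system (hypothetical): measured throughput and total cost."""
    name: str
    throughput: float  # e.g. tokens/s or samples/s in one MLPerf scenario
    cost: float        # total system price, in any consistent currency

    def perf_per_dollar(self) -> float:
        # Throughput delivered per unit of money spent.
        return self.throughput / self.cost


def relative_perf_per_dollar(a: System, b: System) -> float:
    """Ratio of a's perf/$ to b's: above 1.0 means `a` is the better value."""
    return a.perf_per_dollar() / b.perf_per_dollar()


def breakeven_cost_ratio(a: System, b: System) -> float:
    """Cost ratio (b.cost / a.cost) above which `a` wins on perf/$.

    This equals b's raw throughput advantage over a: if b is 3x faster,
    b must cost more than 3x as much for a to lead on performance per dollar.
    """
    return b.throughput / a.throughput
```

As a worked case using only the figure quoted above: in a scenario where the H100 system is 3.35x faster, Gaudi 2 leads on perf/$ only if the H100 system costs more than 3.35 times as much. Comparing the cost ratio against the throughput ratio, rather than raw throughput alone, is what lets a slower accelerator come out ahead on value.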

Update [4/12/2024] - Intel has slightly revised its MLPerf benchmark slide to present the performance-per-dollar claim accurately, since perf/$ is not an official MLCommons metric but rather Intel's own comparison of how Gaudi stacks up against the competition.

In terms of performance per dollar, Intel says Gaudi 2 offers 33% better value than the NVIDIA H100, with the H100 outpacing Gaudi 2 on this metric only in Llama-70B (server). Intel has also recently partnered with Qualcomm and Google to challenge NVIDIA's CUDA dominance in AI through oneAPI, which could lead to further software refinements. By the time they launch, the next-generation Gaudi 3 and Falcon Shores AI accelerators should have a solid software framework backing them.

Intel also shared MLPerf v4.0 results for its 5th Gen Xeon Scalable family, codenamed Emerald Rapids. Compared with the 4th Gen "Sapphire Rapids" family, the new chips deliver gains of up to 1.9x and an average (geometric mean) gain of 42% over the previous generation in workloads such as 3D-UNet, BERT, RNN-T, ResNet50, RetinaNet, DLRMv2, and GPT-J. Intel's OEM partners, including Dell Technologies, Quanta Computer, Supermicro, Wiwynn, and Cisco, have also submitted results for 5th Gen Xeon CPUs across a range of mixed workloads.
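Intel summarizes the generational uplift as a geometric mean of 1.42x, the "average 42% gain" mentioned above. A geometric mean is the usual way to average speedup ratios, because it averages in log space and so treats a 2x gain and a 0.5x regression as cancelling out. The minimal sketch below assumes each workload contributes one unweighted new/old throughput ratio; the article does not list Intel's per-workload ratios or say whether any weighting was used, and the function names are hypothetical.

```python
import math


def geomean(ratios: list[float]) -> float:
    """Geometric mean of per-workload speedup ratios (new / old).

    Equivalent to statistics.geometric_mean in the standard library;
    written out here to show the log-space averaging explicitly.
    """
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("speedup ratios must be positive")
    # Average the logarithms, then exponentiate back to a ratio.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def as_percent_gain(ratio: float) -> float:
    """Convert a speedup ratio to a percentage gain, e.g. 1.42 -> 42.0."""
    return (ratio - 1.0) * 100.0
```

Because the result is a single summary figure, individual workloads can land well above it (Intel cites up to 1.9x) or below it.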

The highlights of these benchmarks are:

Intel Gaudi 2 accelerators

The ONLY benchmarked alternative to H100 for GenAI

Stable Diffusion and Llama-70B benchmarks show Gaudi 2 price-performance advantages vs. H100

Intel Gaudi model coverage continues to advance and employs SOTA development approaches such as TGI (Text Generation Inference), used in the Llama submission

5th Gen Xeon processors

Intel remains the only server CPU vendor to submit MLPerf results

5th Gen Xeon results improved by a geomean of 1.42x compared to 4th Gen Xeon's results in MLPerf Inference v3.1

The ever-increasing number of submissions and the growing list of partners are clear indicators that end customers want to maximize the utilization of their existing CPU infrastructure

These MLPerf v4.0 results show that Intel is serious about its AI ecosystem and that the work it has been putting in has started to bear fruit. NVIDIA dominates AI at the moment, but with Intel's efforts, the space could become far more competitive over the next few years.