AMD & NVIDIA Fight It Out In The Server CPU Segment: Each Claiming Over 2x Performance Uplift With EPYC Genoa & Grace Chips

AMD & NVIDIA each claim a performance gain of more than 2x over the other with their EPYC Genoa & Grace Superchip CPU platforms for data centers.

NVIDIA's entry into the data center CPU segment with its Grace CPU and Grace Superchip has been a major deal, especially for x86, which has dominated the space for quite some time. While Arm chips have seen decent adoption, NVIDIA's Grace CPU, also based on the Arm architecture, poses a major threat. That's why the company is now facing a heated reaction from AMD, which has responded to some recent performance claims made by NVIDIA.

NVIDIA is already massively successful in the AI space with its GPU-based accelerators, and Grace CPUs further challenge x86 chipmakers such as AMD & Intel. With Grace, NVIDIA has the potential to gain a good chunk of market share in the data center segment, given that its Superchip platform is used in data center solutions based on its Hopper & next-gen Blackwell GPUs, and those are selling like hot cakes at the moment.

As a response, AMD has published a blog post, explaining the importance of data centers and how their high-performance and energy-efficient operations shape the tech world. What followed next was a shocker to not just NVIDIA but to the tech world in general. AMD shows that while EPYC Milan was already top-notch for the x86 data center segment, its latest "4th Gen EPYC" Genoa and Bergamo CPUs take the performance to a whole new level.

AMD shows off a huge performance lead with its 4th Gen EPYC processors against NVIDIA's Grace CPU Superchip. However, do note that these are not third-party benchmarks, so the final results may vary. Still, it does look like the new EPYC processors significantly outperform NVIDIA's Grace Superchip, which is based on Arm processor IP. The Grace Superchip is already a beastly 144-core processor (72 cores per chip), but in the benchmarks published by AMD, it seems to be no match for the EPYC offerings.

AMD EPYC processors are the best choice for datacenter performance and efficiency, as they outperform the NVIDIA Grace CPU Superchip across ten key workloads, based on extensive industry-standard benchmark publications and testing. AMD EPYC processors also offer the advantage of x86 processor architecture compatibility, which enables you to deploy a broad set of workloads with no compromises, and without expensive architectural transitions to a different ISA.

AMD EPYC processors are the best option for datacenter operators who want to maximize performance while minimizing power and real estate footprint in a datacenter with an easy button. In the age of AI where you need capacity for your emerging AI workloads, AMD provides the best option with no compromises based on industry standards, transparency in data and benchmarks, and broad availability of platforms and solutions across the Ecosystem without expensive architectural transitions.

via AMD

In AMD's benchmark slides, we can see that both the EPYC 9654 and 9754 outperform the Grace CPU Superchip by over 2x (over 4x in HPC Workloads such as ESPRESSO). In several data center-related tests, the EPYC CPUs can deliver at least 2.5x better relative performance, reaching up to 4x in HPC Workloads and over 3.5x in Server-Side Java. The EPYC 9754 is the flagship data center CPU featuring a whopping 128-core, 256-thread configuration, while the EPYC 9654 brings a 96-core, 192-thread config.

Next is the efficiency test, where AMD compares a dual-socket EPYC 9654 system and single and dual-socket configs of the EPYC 9754 against a dual-socket NVIDIA Grace Superchip system. The EPYC 9654 system comes out to be 2.27x more power-efficient, while a single 9754 offers 2.50x better efficiency. With a dual-socket 9754 solution, the efficiency lead increases to 2.75x, which AMD presents as evidence that EPYC is the best overall choice for demanding workloads. AMD also reminds readers about the advantage of its EPYC CPUs being based on the x86-64 architecture and how that can help you run a broad set of workloads without any compatibility issues.

SP5-279: As of 07/12/2024, a 1P AMD EPYC™ 9754 system delivers a 2.50x SPECpower_ssj® 2008 overall ssj_ops/watt uplift versus a 2P NVIDIA Grace™ CPU Superchip system. Configurations: 1P 128-core EPYC 9754 (33,014 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2023q3/power_ssj2008-20230524-01270.html) versus 2P 72-core NVIDIA Grace Superchip (13,218 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q3/power_ssj2008-20240515-01413.html). SPEC® and SPECpower_ssj® 2008 are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.

SP5-280: As of 07/12/2024, a 2P AMD EPYC™ 9754 system delivers a 2.75x SPECpower_ssj® 2008 overall ssj_ops/watt uplift versus a 2P NVIDIA Grace™ CPU Superchip system. Configurations: 2P 128-core EPYC 9754 (36,398 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q2/power_ssj2008-20240327-01386.html) versus 2P 72-core NVIDIA Grace Superchip (13,218 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q3/power_ssj2008-20240515-01413.html). SPEC® and SPECpower_ssj® 2008 are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.

SP5-278: As of 07/12/2024, a 2P AMD EPYC™ 9654 system delivers a 2.27x SPECpower_ssj® 2008 overall ssj_ops/watt uplift versus a 2P NVIDIA Grace™ CPU Superchip system. Configurations: 2P 96-core AMD EPYC 9654 (30,064 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2022q4/power_ssj2008-20221204-01203.html) versus 2P 72-core NVIDIA Grace Superchip (13,218 overall ssj_ops/watt, 2U, https://www.spec.org/power_ssj2008/results/res2024q3/power_ssj2008-20240515-01413.html). SPEC® and SPECpower_ssj® 2008 are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
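
For readers who want to sanity-check the efficiency multipliers, they appear to be simple ratios of the overall ssj_ops/watt scores quoted in the three footnotes above. The short Python sketch below reproduces that arithmetic. The variable and function names are our own and purely illustrative, and the scores are copied from AMD's footnotes rather than re-measured.

```python
# Overall ssj_ops/watt scores as quoted in AMD's footnotes (SP5-278/279/280).
# Names below are illustrative; only the numbers come from the footnotes.
GRACE_2P = 13_218  # 2P NVIDIA Grace CPU Superchip baseline

epyc_results = {
    "2P EPYC 9654 (SP5-278)": 30_064,
    "1P EPYC 9754 (SP5-279)": 33_014,
    "2P EPYC 9754 (SP5-280)": 36_398,
}


def uplift(score, baseline=GRACE_2P):
    """Return score / baseline, rounded to two decimal places."""
    return round(score / baseline, 2)


# Efficiency ratio of each EPYC system relative to the Grace baseline.
ratios = {name: uplift(score) for name, score in epyc_results.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

Under that assumption, the ratios work out to 2.27x, 2.50x, and 2.75x, matching the figures AMD quotes. Note that all three comparisons share the same single Grace result as the baseline.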

Meanwhile, NVIDIA has also shared a new performance update for the Grace CPU Superchip, sticking to its original claims while adding some additional information.

According to the benchmarks published by NVIDIA, the server-side performance of the Grace CPU Superchip is up to 2.4x faster than a dual-socket EPYC 9654 CPU platform and around 1.2-1.3x faster on average, while also being well ahead of Intel's Sapphire Rapids platforms. The same is the case for Data Center throughput, which is up to 3x faster on Grace and around 1.5-2.0x faster on average across multiple tests.

NVIDIA states that one of the key elements in testing Grace's CPU performance is to use optimized code built with the latest compilers such as GCC 12.x. The company also recommends the use of optimized math libraries such as BLAS, LAPACK, FFT, and ScaLAPACK to fully leverage the capabilities that the Superchip architecture has to offer.

Customers, such as Murex, Gurobi, and Petrobras, are seeing compelling performance results that are demonstrating benefits of NVIDIA Grace CPUs.

Grace has strong momentum in HPC, with CPU only wins at University of Bristol, BSC, LANL, TACC and NCHC.

via NVIDIA

One thing to note is that AMD and NVIDIA are using different sets of benchmarks and workloads to compare their EPYC & Grace CPUs, so the results are expected to differ. Given that the data center market is booming right now, each chipmaker will try to showcase how it sits ahead of the competition. There's no doubt that both AMD & NVIDIA have some compelling options for data centers, but we hope these benchmarks pave the way for fairer and more accurate representations of performance in the future, where the best practices and optimizations are applied not only to a vendor's (AMD/Intel/NVIDIA) own chips but to the competition's too.

As we look into the future, AMD is expected to introduce its 5th Gen EPYC CPU family, codenamed Turin, later this year, and NVIDIA is aiming to launch its next major Arm CPU, codenamed Vera, by 2026.

News Source: AMD