NVIDIA Fires Back At AMD, Claims H100 AI GPU Delivers 47% Faster Performance Than MI300X With Optimized Software

NVIDIA has released a new set of benchmarks for its H100 AI GPU, comparing it against AMD's recently unveiled MI300X. The purpose of these latest benchmarks is to show that the H100 delivers faster performance than the competition when paired with the right software, which NVIDIA says wasn't the case during its competitor's recent presentation.
During its "Advancing AI" presentation, AMD launched the Instinct MI300X GPU, which marks the start of its AI push in the data center segment. The presentation included various numbers and benchmarks in which the company compared the MI300X against NVIDIA's H100 GPU. AMD claims that the MI300X offers up to 20% faster performance than the H100 in a single-GPU comparison and up to 60% faster performance in an 8-GPU server comparison. NVIDIA was quick to respond to these benchmarks, arguing that the results are far from accurate.
The NVIDIA H100 GPU was released in 2022 and has seen various improvements on the software side since then. The most recent TensorRT-LLM updates, along with kernel-level optimizations, have driven even higher performance in AI-specific workloads. NVIDIA states that these optimizations allow the H100 AI GPUs to execute models such as Llama 2 70B using FP8 operations. The following are the Llama 2 70B performance figures presented by AMD during the event:
AMD ran its numbers using the optimized libraries within the ROCm 6.0 suite when comparing the Instinct MI300X to the Hopper H100. However, the same wasn't the case for the NVIDIA H100 GPU, which wasn't tested with optimized software such as TensorRT-LLM. In the benchmarks it has now published, NVIDIA shows the actual measured performance of a single DGX H100 server with eight H100 GPUs running the Llama 2 70B model at a batch size of one.
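To put the FP8 point in perspective, here is a rough back-of-the-envelope sketch of why 8-bit weights are such a big lever for a 70-billion-parameter model on 80 GB GPUs. This is our own illustration, not a figure from NVIDIA or AMD, and it deliberately ignores the KV cache and activations:

```python
# Rough, illustrative memory math for serving Llama 2 70B (not a vendor figure).
# Weights only; the KV cache and activations are ignored to keep the point simple.

PARAMS = 70e9          # Llama 2 70B parameter count
H100_HBM_GB = 80       # HBM3 capacity of a single H100 (GB)

for name, bytes_per_param in [("FP16", 2), ("FP8", 1)]:
    weight_gb = PARAMS * bytes_per_param / 1e9
    min_gpus = int(-(-weight_gb // H100_HBM_GB))   # ceiling division
    print(f"{name}: ~{weight_gb:.0f} GB of weights -> needs at least {min_gpus} x 80 GB H100")

# Expected output:
# FP16: ~140 GB of weights -> needs at least 2 x 80 GB H100
# FP8: ~70 GB of weights -> needs at least 1 x 80 GB H100
```

In practice the KV cache also grows with batch size and sequence length, which is one reason multi-GPU servers and careful batching still matter even with FP8 weights.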
Footnotes:
AMD's implied claims for the H100 were measured based on the configuration taken from AMD launch presentation footnote #MI300-38: vLLM v0.2.2 inference software on an NVIDIA DGX H100 system, running a Llama 2 70B query with an input sequence length of 2,048 and an output sequence length of 128. AMD claimed the relative performance of an 8-GPU MI300X system compared to the DGX H100.
For NVIDIA's measured data: DGX H100 with 8x NVIDIA H100 Tensor Core GPUs (80 GB HBM3 each), using publicly available NVIDIA TensorRT-LLM, v0.5.0 for batch-1 and v0.6.1 for latency-threshold measurements. Workload details are the same as in footnote #MI300-38.
The results show that, compared with the H100 figures AMD showcased during its event, the DGX H100 server is 2x faster when running an optimized software stack. It is also 47% faster than AMD's 8-GPU MI300X solution.
A DGX H100 server can process a single inference in 1.7 seconds at a batch size of one, that is, one inference request at a time. A batch size of one results in the fastest possible response time for serving a model. To optimize both response time and data center throughput, cloud services set a fixed response time for a particular service. This enables them to combine multiple inference requests into larger "batches" and increase the overall inferences per second of the server. Industry-standard benchmarks like MLPerf also measure performance with this fixed response time metric.
Small tradeoffs in response time can yield x-factors in the number of inference requests that a server can process in real time. Using a fixed 2.5-second response time budget, an 8-GPU DGX H100 server can process over five Llama 2 70B inferences per second compared to less than one per second with batch one.
via NVIDIA
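To make the batching argument concrete, the sketch below redoes the arithmetic from the quoted figures. It is our own back-of-the-envelope illustration; the implied batch size is derived from the public numbers, not something NVIDIA has disclosed:

```python
import math

# Back-of-the-envelope throughput math based on the figures quoted above.
# This is our own sketch, not NVIDIA's code or its actual serving configuration.

BATCH1_LATENCY_S = 1.7    # quoted time for one Llama 2 70B inference at batch size 1
RESPONSE_BUDGET_S = 2.5   # quoted fixed response-time budget
TARGET_RATE = 5           # quoted "over five inferences per second" at that budget

# Batch size 1: fastest response, but throughput is capped at one request per 1.7 s.
print(f"Batch 1 throughput: {1 / BATCH1_LATENCY_S:.2f} inferences/s")   # ~0.59, i.e. "less than one"

# Working backwards: to sustain more than 5 inferences/s while every request still
# finishes inside the 2.5 s window, each window must complete roughly
# rate x window concurrent requests (an implied figure, not one NVIDIA disclosed).
implied_batch = math.ceil(TARGET_RATE * RESPONSE_BUDGET_S)
print(f"Implied concurrent requests per {RESPONSE_BUDGET_S} s window: ~{implied_batch}")
```

The takeaway NVIDIA is driving at is that giving up a fraction of a second of latency lets the same hardware serve roughly an order of magnitude more requests, which is why fixed-response-time metrics such as those in MLPerf reward a well-optimized serving stack.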
NVIDIA has a point with these new benchmarks: AMD used optimized software to evaluate the performance of its own GPUs, so why not do the same when testing NVIDIA's hardware? NVIDIA's software stack, built around the CUDA ecosystem and now the emerging AI market, is robust and backed by years of development, whereas AMD's ROCm 6.0 is new and has yet to prove itself in real-world deployments. With that said, AMD has secured a good share of deals with top companies such as Microsoft, Meta, and others who see its MI300X GPUs as an alternative to NVIDIA's AI solutions.
The Instinct MI300X and MI300A are expected to ramp in 1H 2024, around the same time NVIDIA will introduce its even faster Hopper H200 GPU, followed by the Blackwell B100 in 2H 2024. Competition in the AI space looks set to get even more heated.