AMD MI300X Up To 3x Faster Than NVIDIA H100 In LLM Inference AI Benchmarks, Offers Competitive Pricing Too

Tensorwave has published the latest benchmarks of the AMD MI300X in LLM inference AI workloads, showing up to 3x higher performance than the NVIDIA H100.
AI cloud provider Tensorwave has showcased the performance of AMD's MI300X accelerator against the NVIDIA H100 in AI LLM inference benchmarks. The company is one of many offering cloud instances powered by AMD's latest Instinct accelerators, and its results suggest AMD may have the lead not only in performance but also in value.
In a blog post, Tensorwave demonstrates how AMD's MI300X, paired with MK1's accelerated AI inference engine and models, delivers faster, optimized performance across multiple LLMs (Large Language Models).
The company used the Mixtral 8x7B model and conducted both online and offline tests on AMD and NVIDIA hardware. The test setup included 8 MI300X accelerators, each with a 192 GB memory pool, and 8 NVIDIA H100 SXM5 accelerators, each with an 80 GB memory pool. AMD's setup ran the ROCm 6.1.2 driver suite with the MK1 inference engine and ROCm AI optimizations for vLLM v0.4.0, while NVIDIA's setup ran the CUDA 12.2 driver stack (the latest is CUDA 12.5) with the vLLM v0.4.3 inference stack.
(Table: test configuration summary comparing the AMD and NVIDIA setups, with notes.)
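For context, an offline throughput test of this kind can be reproduced in just a few lines with vLLM's Python API. The sketch below is a minimal illustration, not Tensorwave's actual harness: the Mixtral checkpoint name, the synthetic prompts, and the batch sizes swept are all assumptions for demonstration.

```python
# Minimal offline-throughput sketch using vLLM's Python API.
# Assumptions (not from Tensorwave's harness): the Mixtral checkpoint name,
# the synthetic prompts, and the batch sizes swept below.
import time

from vllm import LLM, SamplingParams

# Tensor-parallel across all 8 accelerators in the node, as in the test setup.
llm = LLM(model="mistralai/Mixtral-8x7B-Instruct-v0.1", tensor_parallel_size=8)

sampling = SamplingParams(temperature=0.0, max_tokens=128)

for batch_size in (1, 8, 64, 256, 1024):
    prompts = ["Summarize the history of GPUs."] * batch_size
    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling)
    elapsed = time.perf_counter() - start
    # Count generated tokens across the batch to get tokens per second.
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"batch={batch_size:5d}  {generated / elapsed:8.1f} tokens/s")
```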
In offline performance, the AMD MI300X AI accelerator showed an uplift ranging from 22% all the way up to 194% (almost 3x) over the NVIDIA H100 across batch sizes from 1 to 1024. The MI300X was faster than the H100 at every batch size.
For online performance, Tensorwave designed a series of tests to simulate realistic chat applications. The key metrics of interest are throughput (requests served per second) and average end-to-end latency.
Here, the AMD MI300X accelerator delivered 33% more requests per second than two NVIDIA H100 GPUs while maintaining an average latency of 5 seconds. The MI300X also sustained much higher throughput than the H100, generating text faster at higher volumes of traffic.
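An online test of this shape is essentially a concurrent load generator pointed at an inference server. As a rough sketch, assuming a vLLM server exposing its OpenAI-compatible endpoint on localhost (started with `vllm serve`), the endpoint URL, payload, request count, and concurrency level below are illustrative assumptions:

```python
# Minimal online load-test sketch: fire concurrent chat-style requests at a
# vLLM server and report requests/s and average latency. The URL, payload,
# and concurrency level are illustrative assumptions, not Tensorwave's setup.
import asyncio
import time

import httpx

URL = "http://localhost:8000/v1/completions"  # assumed local vLLM endpoint
PAYLOAD = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "prompt": "Explain KV caching in one paragraph.",
    "max_tokens": 128,
}

async def one_request(client: httpx.AsyncClient) -> float:
    """Send one completion request and return its end-to-end latency."""
    start = time.perf_counter()
    resp = await client.post(URL, json=PAYLOAD, timeout=120.0)
    resp.raise_for_status()
    return time.perf_counter() - start

async def main(total: int = 200, concurrency: int = 32) -> None:
    # Cap in-flight requests to simulate a fixed number of concurrent users.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(client: httpx.AsyncClient) -> float:
        async with sem:
            return await one_request(client)

    async with httpx.AsyncClient() as client:
        start = time.perf_counter()
        latencies = await asyncio.gather(*(bounded(client) for _ in range(total)))
        wall = time.perf_counter() - start

    print(f"{total / wall:.1f} requests/s, "
          f"avg latency {sum(latencies) / len(latencies):.2f}s")

asyncio.run(main())
```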
Note: NVIDIA's Hopper H100 GPUs were running the vLLM suite rather than the TensorRT-LLM optimizations, and on an older CUDA stack from last year. The latest optimizations in the software stack have already boosted the performance of NVIDIA's Hopper GPUs by a sizable margin.
Our benchmarks demonstrate that AMD's MI300X outperforms NVIDIA's H100 in both offline and online inference tasks for MoE architectures like Mixtral 8x7B. The MI300X not only offers higher throughput but also excels in real-world scenarios requiring fast response times.
Given its impressive performance, competitive cost, and hardware availability, the MI300X with MK1 software is an excellent choice for enterprises looking to scale their AI inference capabilities.
via Tensorwave
In the end, Tensorwave highlighted the high performance and very competitive pricing of AMD's MI300X accelerators against the NVIDIA H100. The company's CEO has already called the MI300X the far superior option versus the H100. The MI300X is also said to be readily available, whereas the H100 is mostly booked out. You can learn more about Tensorwave's MI300X cloud instances here.