NVIDIA Blackwell Debuts at MLPerf & Shatters AI Performance Records, Hopper's Leadership Continues With H100 & H200 Outperforming AMD MI300X

NVIDIA's Blackwell AI chips secure record performance at MLPerf while the Hopper H100 & H200 chips continue to get even stronger, outpacing AMD's MI300X.
NVIDIA's Blackwell AI chips have made their debut in MLPerf Inference v4.1, setting record performance numbers across all benchmarks. Coming to data centers later this year, the Blackwell chips are positioned as the strongest AI solution on the market, delivering up to a 4x generational performance increase.
Today, NVIDIA announced that it has achieved the highest performance in MLPerf Inference v4.1 across all AI benchmarks, including Llama 2 70B, Mixtral 8x7B, and Stable Diffusion XL.
In Llama 2 70B, NVIDIA's Blackwell AI solutions offer a massive jump over the Hopper H100 chips. In the Server scenario, a single Blackwell GPU delivers a 4x increase (10,756 tokens/second), while in the Offline scenario it delivers a 3.7x increase (11,264 tokens/second). NVIDIA also submitted the first publicly measured performance results using FP4 running on Blackwell GPUs.
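FP4 here refers to a 4-bit floating-point format (E2M1: one sign bit, two exponent bits, one mantissa bit), which can represent only 16 values and therefore relies on scaling factors to cover a model's weight range. Below is a minimal, purely illustrative Python sketch of simulated FP4 weight quantization with a single per-tensor scale; the function names are our own, and NVIDIA's actual Blackwell path (e.g., through TensorRT-LLM) runs FP4 in hardware with much finer-grained scaling.

```python
# Toy simulation of FP4 (E2M1) weight quantization. Illustration only,
# not NVIDIA's production pipeline.
import numpy as np

# The positive magnitudes representable in E2M1; mirror them for the sign bit.
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])

def quantize_fp4(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Snap weights to the nearest FP4 value after per-tensor scaling."""
    scale = np.abs(weights).max() / 6.0  # map the largest weight to FP4's max (6.0)
    idx = np.abs(weights / scale - FP4_GRID[:, None]).argmin(axis=0)
    return FP4_GRID[idx], scale

def dequantize_fp4(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

w = np.random.randn(8).astype(np.float32)
q, s = quantize_fp4(w)
print(w)
print(dequantize_fp4(q, s))  # coarse 4-bit approximation of the originals
```

The appeal is that 4-bit weights halve memory traffic versus FP8 again, which is where much of Blackwell's inference throughput gain comes from, at the cost of the coarse value grid visible above.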
While Blackwell is the beast it was promised to be, NVIDIA's Hopper continues to get even stronger as more optimizations land across the CUDA stack. The H200 and H100 chips offer leading performance in every test against the competition, including the latest benchmarks such as the "Mixtral 8x7B" LLM (46.7 billion total parameters, with roughly 12.9 billion active per token).
The NVIDIA HGX H200, with eight Hopper H200 GPUs connected via NVSwitch, posts strong gains in Llama 2 70B: 34,864 tokens/second (Offline) and 32,790 tokens/second (Server) in the 1,000W configuration, and 31,303 tokens/second (Offline) and 30,128 tokens/second (Server) in the 700W configuration.
This is up to a 50% uplift over the Hopper H100 solution, and the H100 itself still beats AMD's Instinct MI300X in Llama 2. The added performance comes from software optimizations that benefit both Hopper chips, plus the roughly 80% higher memory capacity and 40% higher memory bandwidth of the H200.
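As a quick sanity check, the H100 baseline implied by that 50% claim, and the gap between the two H200 power configurations, follow from simple arithmetic on the numbers quoted above (the implied baseline is derived, not an MLPerf-published figure):

```python
# Arithmetic on the article's own Llama 2 70B offline figures.
h200_1000w_offline = 34_864  # tokens/s, 1,000W configuration
h200_700w_offline = 31_303   # tokens/s, 700W configuration

# If 34,864 tokens/s is a 50% uplift, the implied H100 baseline is:
implied_h100 = h200_1000w_offline / 1.5
print(f"Implied H100 offline throughput: ~{implied_h100:,.0f} tokens/s")

# Headroom gained by the higher power limit on the same silicon:
power_gain = h200_1000w_offline / h200_700w_offline - 1
print(f"1,000W vs 700W H200 config: +{power_gain:.1%}")  # ~+11.4%
```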
In Mixtral 8x7B using a multi-GPU test server, the NVIDIA H200 and H100 deliver up to 59,022 and 52,416 output tokens/second, respectively. AMD's Instinct MI300X is missing in action in this particular workload, as no submission was made by the red team. The same goes for Stable Diffusion XL, where new full-stack improvements boost performance by up to 27% for Hopper AI chips while AMD has yet to submit an MLPerf result for this workload.
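For context on why Mixtral's throughput can rival much smaller dense models, Mixtral 8x7B is a mixture-of-experts (MoE) design: a router scores eight expert FFNs per token and only the top two actually run, so just a fraction of the total parameters does work on any given token. A minimal top-2 routing sketch with toy dimensions (our own illustration, not Mixtral's actual implementation):

```python
# Toy top-2 mixture-of-experts routing for a single token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x: np.ndarray) -> np.ndarray:
    logits = x @ router_w                 # one score per expert for this token
    top = np.argsort(logits)[-top_k:]     # indices of the 2 best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()              # softmax over only the chosen experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
print(moe_layer(token).shape)             # (16,) computed by 2 of 8 experts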
NVIDIA's efforts to fine-tune its software have paid off tremendously. The company has seen major gains with every MLPerf release, and those advantages flow directly to customers running Hopper GPUs in their servers.
We have said this before and we will say it again: AI and data centers aren't all about hardware. Hardware is one component, but the other, just as crucial (if not more so), is software. There's no point in having the strongest hardware without the proper software to back it up, and companies investing millions of dollars into AI infrastructure will look at the whole ecosystem.
NVIDIA has that ecosystem well in place and ready to roll out to enterprises and AI powerhouses across the world, which is why the company is also announcing the general availability of the HGX H200 through various partners.
And it's not just the heavyweights, Blackwell and Hopper, that keep getting optimized. Even edge solutions such as the Jetson AGX Orin have seen a 6x boost since the MLPerf v4.0 submissions, making a huge impact on GenAI workloads at the edge.
With Blackwell showcasing such strong performance ahead of its launch, we can expect the new architecture, tailor-made for AI, to get even stronger over time, just as Hopper has, and to pass those optimization benefits on to Blackwell Ultra next year.