AMD Radeon 7900 XTX Achieves 890% Speedup In Generative AI With Stable Diffusion Optimization

NVIDIA is absolutely dominating the AI conversation right now, and for good reason: its GPUs perform out of the box and are a top choice for professionals and businesses that want to dabble in consumer AI. But just this week, both Intel and AMD optimized their software stacks to get massive speedups in generative AI, which has seen AMD's Radeon RX 7900 XTX deliver higher performance per dollar than an NVIDIA RTX 4080 in generative AI (specifically Stable Diffusion with A1111/Xformers). Considering Stable Diffusion accounts for the vast majority of non-SaaS, localized generative AI right now, this is a major milestone and finally offers some competition to NVIDIA.

Note: With tuning for GenAI, much like tuning for crypto mining performance, your mileage will vary significantly depending on the model and configuration being used. This article is about the most common A1111 Xformers config (you can get a running tally of average performance by GPU here: https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html), but there *are* hyper-tuned boutique optimizations where the NVIDIA RTX 4080 is faster still.

Using Microsoft Olive and DirectML instead of the PyTorch pathway results in the AMD 7900 XTX going from a measly 1.87 iterations per second to 18.59 iterations per second! You can read the detailed guide by AMD over here. This level of performance in Automatic1111 is pretty close to the SHARK-based approach to Stable Diffusion and definitively puts the company on the map with regard to generative AI. As it turns out, it also gives the 7900 XTX slightly higher GenAI performance per dollar (in Stable Diffusion/A1111) than the comparable RTX 4080, at least at current prices.
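For readers who want to check the headline figure, here is a minimal Python sketch of the speedup arithmetic. It uses only the two throughput numbers quoted above; the function name `speedup` is made up for illustration and is not part of any AMD or Microsoft tooling.

```python
def speedup(before_its: float, after_its: float) -> tuple[float, float]:
    """Return (ratio, percent_increase) between two throughput figures."""
    ratio = after_its / before_its            # how many times faster
    percent_increase = (ratio - 1.0) * 100.0  # gain relative to the baseline
    return ratio, percent_increase


# Figures quoted in the article: PyTorch path vs. Olive/DirectML path.
ratio, pct = speedup(1.87, 18.59)
print(f"{ratio:.2f}x faster, a {pct:.0f}% increase")
```

The ratio works out to just under 10x, which is where the roughly 890% figure in the headline comes from once you subtract the baseline and round down.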

The cheapest NVIDIA RTX 4080 I could find on Newegg (on 8/19/2023) was the MSI Ventus GeForce RTX 4080 16GB (WBM archived link here) and the cheapest AMD Radeon 7900 XTX I could find on Newegg was the MSI Gaming Radeon RX 7900 XTX 24GB (WBM archived link here). Before we crunch the numbers, I do want to mention one caveat: unlike NVIDIA, the AMD pathway requires the user to be a bit more tech savvy. It uses Microsoft Olive instead of PyTorch, and most automatic installers will likely not install the required dependencies on their own. So if convenience is a factor for you, NVIDIA is still the way to go. But professionals and small businesses can usually get past an initial setup hassle if the cost basis is good enough, and that does seem to be the case here.

As we can see, the AMD silicon is finally starting to shine in GenAI, to the point where it offers higher value than the 4080 in Stable Diffusion A1111. The AMD 7900 XTX offers 18.59 iterations per second, making users pay $52.1 per it/s, while the NVIDIA RTX 4080 gets 19.41 iterations per second, making users pay $56.6 per it/s. If users opt for the less common SHARK implementation, they can drive the value proposition all the way down to just $46.6 per it/s for the Radeon 7900 XTX. So it's official: AMD is a contender for consumers interested in generative AI.
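The value math here is simply card price divided by iterations per second. The sketch below reruns it in Python using only the figures quoted in this article. The card prices and the SHARK throughput are not stated in the text, so they are back-calculated from the quoted dollars per it/s and should be treated as approximations, not as the actual Newegg listings or a measured benchmark. The helper names are hypothetical.

```python
def cost_per_its(price_usd: float, its: float) -> float:
    """Dollars paid per iteration-per-second of throughput."""
    return price_usd / its


def implied_price(cost_per_its_usd: float, its: float) -> float:
    """Back out the card price implied by a quoted $/it/s figure."""
    return cost_per_its_usd * its


# Quoted in the article: (it/s, $ per it/s)
xtx_its, xtx_cost = 18.59, 52.1   # Radeon 7900 XTX, Olive/DirectML
rtx_its, rtx_cost = 19.41, 56.6   # RTX 4080

# ASSUMPTION: prices are derived from the quoted ratios, not taken from listings.
xtx_price = implied_price(xtx_cost, xtx_its)   # about $968.5
rtx_price = implied_price(rtx_cost, rtx_its)   # about $1098.6

# ASSUMPTION: SHARK throughput implied by the quoted $46.6 per it/s.
shark_its = xtx_price / 46.6                   # about 20.8 it/s

# How much cheaper each it/s is on the 7900 XTX than on the RTX 4080.
advantage_pct = (rtx_cost - xtx_cost) / rtx_cost * 100.0

print(f"Implied 7900 XTX price: ${xtx_price:.0f}")
print(f"Implied RTX 4080 price: ${rtx_price:.0f}")
print(f"Implied SHARK throughput: {shark_its:.1f} it/s")
print(f"7900 XTX cost advantage: {advantage_pct:.1f}% per it/s")
```

Plugging in your own local prices via `cost_per_its` is the quickest way to see whether the comparison still holds, since the gap between the two cards is under 10% and a modest price swing can flip it.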

This also means that with just slightly more attention to its software, AMD can be a formidable competitor to NVIDIA's AI ambitions. Most people aren't going to be running LLMs out of their basement, but GenAI and SLMs/ULMs are going to be absolutely everywhere within the next 12 months and part of a lot of productivity workflows. How Intel and AMD position themselves in a market where NVIDIA has a massive head start will determine how they fare in a world that is going to be dominated by AI.