NVIDIA GeForce RTX 4090 GPU Offers Up To 15X AI Throughput Versus Laptop CPUs, TensorRT-LLM Boosts Perf By Up To 70%


NVIDIA has showcased impressive numbers for its GeForce RTX 40 GPUs including the flagship RTX 4090 in AI models such as Llama & Mistral.

NVIDIA's TensorRT-LLM acceleration for Windows has brought some spectacular performance uplifts on the Windows PC platform. We have seen some impressive gains & new features that have been added to NVIDIA's RTX "AI PC" feature set and things are getting even better with the company showcasing some huge performance figures with its flagship GeForce RTX 4090 GPU.

In a new AI Decoded blog post, NVIDIA has shared how its existing GPU lineup outclasses the entire NPU ecosystem, which has only managed to reach 50 TOPS in 2024. Meanwhile, NVIDIA's RTX AI GPUs deliver several hundred TOPS, going all the way up to 1,321 TOPS on the GeForce RTX 4090, making it the fastest desktop AI solution for running LLMs and more. It's also the fastest gaming graphics card on the planet.

NVIDIA's GeForce RTX GPUs offer up to 24 GB of VRAM while NVIDIA's professional RTX GPUs offer up to 48 GB of VRAM, making them quite the beasts when it comes to handling LLMs (Large Language Models), as these workloads love large amounts of video memory. NVIDIA's RTX hardware comes not only with dedicated video memory but also AI-specific acceleration through Tensor Cores (hardware) and the aforementioned TensorRT-LLM (software).

Token generation across all batch sizes on NVIDIA's GeForce RTX 4090 GPU is already very fast, but it improves significantly (over 4x) when TensorRT-LLM acceleration is enabled.
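Tokens per second is the metric behind all of these comparisons. As a minimal sketch of how such a figure is typically measured, here is a timing harness using a hypothetical `generate` stand-in (not any real Jan.ai or TensorRT-LLM API):

```python
import time

def generate(prompt, max_tokens):
    """Hypothetical stand-in for a local LLM call; it just yields dummy
    tokens so the timing harness below is runnable."""
    for i in range(max_tokens):
        yield f"tok{i}"

def tokens_per_second(prompt, max_tokens=256):
    # Time how long the model takes to stream its output, then divide
    # the token count by the elapsed wall-clock time.
    start = time.perf_counter()
    count = sum(1 for _ in generate(prompt, max_tokens))
    elapsed = time.perf_counter() - start
    return count / elapsed

print(f"{tokens_per_second('Hello'):.2f} tokens/sec")
```

With a real backend, the same loop would consume tokens from the model's streaming API instead of the dummy generator.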

NVIDIA is now sharing some new benchmarks using the open-source Jan.ai platform, which has recently integrated TensorRT-LLM into its local chatbot app. This chatbot wraps AI models such as Llama and Mistral in an easy-to-use solution. The software provider has now offered a look at benchmarks comparing NVIDIA's GeForce RTX 40 GPUs against laptop CPUs with dedicated AI NPUs.

The NVIDIA GeForce RTX 4090 GPU offers an 8.7x improvement over the AMD Ryzen 9 8945HS CPU without TensorRT-LLM, and that lead extends to 15x with the acceleration enabled (roughly a 70% boost over the non-TensorRT-LLM configuration).

The GPU processes up to 170.63 tokens per second versus 11.57 tokens/sec on the AMD CPU. Even with the NVIDIA GeForce RTX 4070 Laptop GPU, you get an acceleration of up to 4.45x. Even more interestingly, the company has also shared numbers for an RTX 4090 in an eGPU configuration, showing how laptop AI workloads can be further accelerated with an external GPU. This configuration delivers a performance uplift of 9.07x over the same AMD laptop CPU.
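The headline multipliers follow directly from the quoted throughput numbers; a quick check of the arithmetic:

```python
# Speedup math behind the quoted Jan.ai benchmark figures.
RTX_4090_TPS = 170.63   # tokens/sec, GeForce RTX 4090 with TensorRT-LLM
CPU_TPS = 11.57         # tokens/sec, AMD Ryzen 9 8945HS

speedup = RTX_4090_TPS / CPU_TPS
print(f"RTX 4090 vs CPU: {speedup:.2f}x")   # ~14.75x, i.e. the ~15x headline figure

# 8.7x is the non-TensorRT-LLM GPU result; going from 8.7x to ~15x is
# roughly a 70% uplift from the software stack alone.
tensorrt_uplift = speedup / 8.7 - 1
print(f"TensorRT-LLM uplift: {tensorrt_uplift:.0%}")
```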

NVIDIA recently laid out the current landscape of AI computational power, showing how its GeForce RTX 40 desktop GPUs scale from 242 TOPS at the entry level up to 1,321 TOPS at the high end. That's a 4.84x increase at the lowest end and a 26.42x increase at the very top compared to the latest 45-50 TOPS AI NPUs that we will be seeing on SoCs this year.

Even laptop NVIDIA GeForce RTX 40 options such as the RTX 4050 start at 194 TOPS, a 3.88x increase over the fastest upcoming NPU, while the RTX 4090 Laptop chip offers a 13.72x speedup with its 686 TOPS.
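All of the multipliers NVIDIA quotes use the 50-TOPS NPU figure as the baseline, as a quick computation confirms:

```python
# TOPS figures from the article, compared against a 50-TOPS NPU baseline.
NPU_TOPS = 50
gpus = {
    "RTX 4050 Laptop":        194,   # -> 3.88x
    "RTX 40 desktop (entry)": 242,   # -> 4.84x
    "RTX 4090 Laptop":        686,   # -> 13.72x
    "RTX 4090 desktop":       1321,  # -> 26.42x
}
for name, tops in gpus.items():
    print(f"{name}: {tops} TOPS = {tops / NPU_TOPS:.2f}x the NPU")
```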

Time and time again, NVIDIA has showcased just how far ahead it is in the AI segment versus the competition, and these benchmarks once again solidify that if you have a use for AI, then NVIDIA has the right hardware for you.
