NVIDIA Brings Up To 5x AI Acceleration To Windows 11 PCs Running RTX 40 & RTX 30 GPUs

NVIDIA is bringing a major acceleration of AI workloads to millions of Windows 11 PCs powered by its latest RTX GPUs.

Following up on its previous announcement, NVIDIA has now revealed that TensorRT-LLM is coming to Windows 11 and will be enabled for more than 100 million RTX users when it launches in the latest driver suite on the 21st of November. The announcement was made during Microsoft Ignite, a key event discussing the future of AI and how it will transform the Windows ecosystem going forward.

Today, NVIDIA confirmed that TensorRT-LLM AI acceleration will be available for all RTX desktops and laptops with more than 8 GB of VRAM. In addition to TensorRT-LLM, NVIDIA and Microsoft are also bringing DirectML enhancements to boost popular AI models such as Stable Diffusion and Llama 2.

Having an NVIDIA RTX GPU that supports TensorRT-LLM means all your data and projects can stay local rather than being saved in the cloud, which saves time and delivers more precise results. RAG, or Retrieval-Augmented Generation, is one of the techniques used to make AI results faster and more relevant: a localized library is filled with the dataset you want the LLM to draw on, and the language-understanding capabilities of that LLM are then leveraged to provide you with accurate results.
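The RAG flow described above can be sketched in a few lines. This is a minimal illustration only: real pipelines score documents with embedding models rather than the toy word-overlap scoring used here, and all names below are hypothetical.

```python
# Minimal RAG sketch: retrieve local documents relevant to a query,
# then prepend them as context so the LLM answers from local data.
# Word-overlap scoring is a stand-in for real embedding similarity.
def retrieve(query, documents, top_k=1):
    """Return the top_k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_prompt(query, documents):
    """Combine retrieved context with the question into one prompt."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "TensorRT-LLM accelerates large language models on RTX GPUs.",
    "DirectML improvements target Stable Diffusion and Llama 2.",
]
print(build_prompt("What does TensorRT-LLM accelerate?", docs))
```

Because retrieval happens entirely on the local machine, the dataset never leaves the PC; only the assembled prompt is passed to the locally running model.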

NVIDIA claims a 5x performance boost with TensorRT-LLM v0.6.0, which will be available later this month. The update will also add support for additional LLMs such as Mistral 7B and Nemotron-3 8B.

For those who want to try out the latest release of TensorRT-LLM, it will be available for installation from the official GitHub repository, and you can also grab the latest optimized models from NVIDIA's NGC resource.

Another key update concerns OpenAI's Chat API, a very popular AI interface with a wide range of applications such as helping with documents and email, summarizing web content, data analysis, and a whole lot more. Today, the data needs to be uploaded or entered manually by the user, so access to local data is rather limited, especially if it's a large dataset.

To solve this, NVIDIA and Microsoft will offer a new wrapper for OpenAI's Chat API that not only adds TensorRT-LLM acceleration on Windows PCs but also gives users the same workflow whether they run locally on an RTX PC or in the cloud. Because the entire dataset is exposed to the Chat API as if it were local, you don't have to upload anything.
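The appeal of a Chat API-compatible wrapper is that existing client code only needs a new base URL to target a local, RTX-accelerated backend. The sketch below builds an OpenAI-style request body; the endpoint URL and model name are placeholders of my own, not NVIDIA's published values, and nothing is actually sent over the network.

```python
# Hypothetical sketch: pointing OpenAI Chat API-style client code at a
# local TensorRT-LLM wrapper instead of the cloud service.
import json

# Assumed address of the local wrapper; not an official NVIDIA endpoint.
LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"

def make_chat_request(prompt, model="llama-2-13b-trt"):
    """Build a Chat API-compatible request body. Because the wrapper
    speaks the same schema as the cloud API, only the target URL
    changes when switching to local RTX acceleration."""
    return {
        "url": LOCAL_ENDPOINT,
        "body": json.dumps({
            "model": model,  # placeholder name for a TensorRT-LLM-optimized model
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

request = make_chat_request("Summarize my local notes on TensorRT-LLM.")
print(request["url"])
```

Keeping the request schema identical is what lets the same application code run against either backend; the local data never has to be uploaded because the model serving the request already sits next to it.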

The custom wrapper will work with almost any LLM that has been optimized for TensorRT-LLM. Examples of optimized LLMs include Llama 2, Mistral, and NV LLM, with more to be added soon. It will also be available on the NVIDIA GitHub page.

These announcements show that NVIDIA wants to accelerate AI not just for enterprises but for mainstream audiences too. With AI, software matters as much as the hardware running it, and developments such as bringing TensorRT-LLM to millions of RTX users are a big deal. The race for AI supremacy will heat up in the coming years as more competitors try to woo audiences with their own approaches, but at the moment, NVIDIA has both the hardware and software expertise to pave a smooth way ahead.
