AMD Announces Full Support For Llama 3.1 AI Models Across EPYC CPUs, Instinct Accelerators, Ryzen AI NPUs & Radeon GPUs

AMD has announced full Llama 3.1 AI model support across its entire portfolio including EPYC, Instinct, Ryzen & Radeon.

Press Release: Our AI strategy at AMD is focused on enabling the AI ecosystem with a broad portfolio of optimized training and inference compute engines, open and proven software capabilities, and deep-rooted co-innovation with our partners and customers. High performance, innovations, and broad compatibility are foundational vectors driving this strategy as the AI universe evolves. A significant focus of ours is to enable the next generation of AI models for everyone, making the benefits of AI pervasive.

Llama 3.1 expands the context length to 128K tokens, adds support for eight languages, and introduces Llama 3.1 405B, which, according to Meta, is the largest openly available foundation model. Llama 3.1 405B will enable the community to unlock new capabilities, such as synthetic data generation and model distillation.

We are encouraged by the recent release of the Llama 3.1 models from Meta and have them up and running in the labs at AMD on our broad portfolio of compute engines showing positive results. In the meantime, we want to showcase some of the impressive work our teams have done with Llama 3 and what Llama 3.1 means for AMD AI customers.

Every generation of models brings new capabilities and performance to its community of users, and Llama 3.1 is no different: it revolutionizes complex conversations with unparalleled contextual understanding, reasoning, and text generation, and it runs seamlessly on the AMD Instinct MI300X GPU accelerator and platform from day 0.

AMD Instinct MI300X GPUs continue to provide the leading memory capacity and bandwidth that enable users to run a single instance of Llama 3 70B on a single MI300X, and up to eight parallel instances simultaneously on a single server.

With the new 405B-parameter model, the largest openly available foundation model, memory capacity is more important than ever. We have confirmed that a server powered by eight AMD Instinct MI300X accelerators can fit the entire Llama 3.1 405B parameter model using the FP16 datatype. This means organizations can benefit from significant cost savings, simplified infrastructure management, and enhanced performance efficiency. This is made possible by the industry-leading memory capabilities of the AMD Instinct MI300X platform.
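
To see why capacity is the deciding factor here, a quick back-of-the-envelope sketch of the weight memory involved (the 192 GB HBM3 figure is MI300X's published capacity; KV-cache and activation memory, which real deployments also need, are ignored for simplicity):

```python
# Back-of-the-envelope memory math for the claims above. These are lower
# bounds: real deployments also need room for the KV cache, activations,
# and runtime overhead.

BYTES_PER_PARAM_FP16 = 2   # FP16 = 16 bits = 2 bytes per weight
MI300X_HBM_GB = 192        # published HBM3 capacity of a single MI300X
GPUS_PER_SERVER = 8

def weights_gb(params_billions: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billions * 1e9 * BYTES_PER_PARAM_FP16 / 1e9

# Llama 3 70B: ~140 GB of FP16 weights fits on one 192 GB MI300X.
print(f"70B weights:  {weights_gb(70):.0f} GB vs {MI300X_HBM_GB} GB per GPU")

# Llama 3.1 405B: ~810 GB of FP16 weights fits in the 1,536 GB of
# combined HBM on an eight-GPU MI300X server.
print(f"405B weights: {weights_gb(405):.0f} GB vs "
      f"{MI300X_HBM_GB * GPUS_PER_SERVER} GB per server")
```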

Finally, Meta used the latest versions of the ROCm Open Ecosystem and AMD Instinct MI300X GPUs in parts of the development process of Llama 3.1. This is a continuation of our ongoing collaboration with Meta, and we look forward to furthering this productive partnership.

Beyond data center GPUs, AMD enables a leading server platform for data center computing, offering high performance, energy efficiency, and x86 compatibility for a variety of data center workloads with our AMD EPYC CPUs. AI is an increasingly vital part of many data center applications, boosting creativity, productivity, and efficiency across myriad workloads.

As most modern data centers support a variety of workloads, using AMD EPYC CPUs gives customers leadership enterprise workload performance, energy efficiency, and the ability to run AI and LLMs for inferencing, small model development, testing, and batch training.

Llama has emerged as a consistent, easy-to-access, and useful benchmark, helping data center customers identify the key characteristics (performance, latency, scale) that guide assessments of technology and infrastructure and gauge how well a model suits a business's data center server needs.

Llama 3.1 extends that value as a source of critical reference data, offering more scale, greater flexibility in data generation and synthesis, expanded context length, and broader language support to better map to global business needs.

For those running a CPU-only environment with a smaller model like Llama 3 8B, our leadership 4th Gen AMD EPYC processors provide compelling performance and efficiency without requiring GPU acceleration. Modestly sized LLMs such as this are proving to be foundational elements of enterprise-class AI implementations.
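
As a rough illustration of what such a CPU-only deployment looks like, the sketch below loads an 8B-class model with Hugging Face Transformers and generates text entirely on the CPU. The model ID and prompt are illustrative (the model is gated behind Meta's license on Hugging Face), and this is a minimal sketch rather than a tuned deployment:

```python
# Minimal sketch: CPU-only inference with a Llama 3 8B-class model via
# Hugging Face Transformers. Assumes `pip install torch transformers`
# and that Meta's license for the model has been accepted on Hugging Face.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative, gated model
    torch_dtype=torch.bfloat16,  # bf16 keeps the ~8B weights near 16 GB of RAM
    device=-1,                   # -1 = run on CPU, no GPU required
)

out = generator(
    "Draft a two-sentence status update for a weekly team report.",
    max_new_tokens=128,
)
print(out[0]["generated_text"])
```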

The ability to test CPU-only performance using the Llama 3 tools has given numerous customers the insight that there are many classes of workloads that they can develop and deploy on readily available compute infrastructure. As the workloads grow more demanding and the models get larger, that same AMD EPYC server infrastructure is a powerful and efficient host to accommodate advanced GPU acceleration solutions such as AMD Instinct or other 3rd party accelerators.

Not a coder? No problem! Harness the power of Meta’s Llama 3.1 at your fingertips with the AMD Ryzen AI series of processors.

While developers can use code blocks and repos to get started with Llama 3.1, AMD is committed to the democratization of AI and lowering the barrier to entry for AI – which is why we partnered with LM Studio to bring Meta’s Llama 3.1 model to customers with AMD AI PCs.

To try it out, please head over to LM Studio and experience a state-of-the-art, completely local chatbot powered by Llama 3.1 in just a few clicks. You can now use it to type emails, proofread documents, generate code, and a lot more!
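
For readers who do want to script against it, LM Studio can also expose the loaded model through a local OpenAI-compatible server (by default at http://localhost:1234/v1). A minimal sketch, assuming that local server is running with a Llama 3.1 model loaded:

```python
# Minimal sketch: query a Llama 3.1 model served by LM Studio's local
# OpenAI-compatible endpoint. Assumes LM Studio's local server is running
# on its default port (1234); the API key is a placeholder it ignores.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

resp = client.chat.completions.create(
    model="local-model",  # LM Studio routes this to whichever model is loaded
    messages=[{"role": "user",
               "content": "Proofread: 'Their going to the store.'"}],
)
print(resp.choices[0].message.content)
```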

For users looking to run generative AI locally, AMD Radeon GPUs can harness the power of on-device AI processing to unlock new experiences and deliver personalized, real-time AI performance.

LLMs are no longer the preserve of big businesses with dedicated IT departments running services in the cloud. With the combined power of select AMD Radeon desktop GPUs and AMD ROCm software, new open-source LLMs like Meta's Llama 2 and 3 – including the just-released Llama 3.1 – mean that even small businesses can run their customized AI tools locally, on standard desktop PCs or workstations, without the need to store sensitive data online.

AMD AI desktop systems equipped with a Radeon PRO W7900 GPU running AMD ROCm 6.1 software and powered by Ryzen Threadripper PRO processors represent a new client solution to fine-tune and run inference on LLMs with high precision.
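
As a quick sanity check for such a setup: ROCm builds of PyTorch expose Radeon GPUs through the familiar CUDA-style API, so a short sketch like the one below (assuming a ROCm-enabled PyTorch install) can confirm the GPU is visible before any fine-tuning or inference run:

```python
# Minimal sketch: verify a Radeon GPU is visible to a ROCm build of PyTorch
# before fine-tuning or running inference. ROCm reuses PyTorch's CUDA API,
# so torch.cuda.* calls work unchanged on AMD GPUs.
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. "Radeon PRO W7900"
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f"VRAM: {vram_gb:.0f} GB")
    x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    print("Matmul OK:", (x @ x).shape)  # quick on-device smoke test
else:
    print("No ROCm-visible GPU found; check the ROCm and PyTorch install.")
```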

As we push the boundaries of AI, the collaboration between AMD and Meta plays a crucial role in advancing open-source AI. The compatibility of Llama 3.1 with AMD Instinct MI300X GPUs, AMD EPYC CPUs, AMD Ryzen AI, AMD Radeon GPUs, and AMD ROCm offers users a diverse choice of hardware and software, ensuring unparalleled performance and efficiency. AMD remains committed to providing cutting-edge technology that empowers innovation and growth across all sectors.
