Intel Gaudi 3 AI Accelerator Official: 5nm, 128 GB HBM2e, Up To 900W, 50% Faster Than NVIDIA H100 & 40% More Efficient

Intel has finally revealed its next-gen AI accelerator, the Gaudi 3, built on a 5nm process node and competing directly against NVIDIA's H100 GPUs.
Intel's Gaudi AI accelerators have been a major competitor and one of the few alternatives to NVIDIA's GPUs in the AI segment. We recently saw some heated benchmark comparisons between the Gaudi 2 and NVIDIA's A100/H100 GPUs, with Intel showcasing a strong perf/$ lead while NVIDIA remained the overall AI performance leader. Now begins the third chapter in Intel's AI journey: the Gaudi 3 accelerator, which has been fully detailed.
The company announced the Gaudi 3 accelerator, which features the latest (5th Gen) Tensor Processor Core (TPC) architecture with a total of 64 TPCs packed across two compute dies. The chip has a 96 MB cache pool shared across both dies, plus eight HBM sites, each an 8-hi stack of 16 Gb HBM2e DRAM, for up to 128 GB of capacity and up to 3.7 TB/s of bandwidth. The entire chip is fabricated on TSMC's 5nm process node and integrates a total of 24 200GbE interconnect links.
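The headline memory and networking figures fall straight out of the configuration above. As a quick sanity check (a sketch using only the numbers quoted in this article):

```python
# Sanity-check of the Gaudi 3 memory and interconnect figures quoted above.
# All inputs are the article's own numbers; the arithmetic just shows how
# the headline totals are reached.

HBM_SITES = 8        # eight HBM2e stacks around the two compute dies
DIES_PER_STACK = 8   # 8-hi stacks
GBIT_PER_DIE = 16    # 16 Gb HBM2e DRAM dies

capacity_gb = HBM_SITES * DIES_PER_STACK * GBIT_PER_DIE / 8  # Gb -> GB
print(capacity_gb)   # 128.0 GB, matching the quoted capacity

ETH_LINKS = 24       # 200 GbE interconnect links per chip
agg_eth_tbps = ETH_LINKS * 200 / 1000
print(agg_eth_tbps)  # 4.8 Tb/s of aggregate Ethernet bandwidth per accelerator
```

That 4.8 Tb/s of per-chip Ethernet is what enables the standards-based scale-out Intel is pitching against proprietary interconnects.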
In terms of product offerings, the Intel Gaudi 3 AI accelerator will come in a Mezzanine OAM (HL-325L) form factor, with a 900W standard variant and over-900W liquid-cooled variants, and as a PCIe AIC in a full-height, double-wide, 10.5"-long design. The Gaudi 3 HL-338 PCIe cards will use passive cooling and support up to a 600W TDP with the same specifications as the OAM variant.
The company also announced its own HLB-325 baseboard and HLFB-325L integrated subsystem, which can carry up to eight Gaudi 3 accelerators. This system has a combined TDP of 7.6 kilowatts and measures 19".
The follow-up to Gaudi 3 will be Falcon Shores, expected in 2025, which will combine the Gaudi and Xe IPs under a single GPU programming interface built around the Intel oneAPI specification.
Press Release: At Intel Vision, Intel introduced the Intel Gaudi 3 AI accelerator, which delivers 4x the AI compute for BF16, 1.5x the memory bandwidth, and 2x the networking bandwidth of its predecessor for massive system scale-out – a significant leap in performance and productivity for AI training and inference on popular large language models (LLMs) and multimodal models.
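The generational multipliers line up with Gaudi 2's published figures. A rough cross-check (the Gaudi 2 baseline numbers here, 2.45 TB/s of HBM2e bandwidth and 24x 100 GbE links, are assumptions taken from Intel's earlier spec sheets, not from this press release):

```python
# Cross-check of the "1.5x memory bandwidth" and "2x networking" claims
# against Gaudi 2's published figures (assumed baselines, not from this article):
#   Gaudi 2: 2.45 TB/s HBM2e bandwidth, 24 x 100 GbE links.

gaudi2_mem_tbps = 2.45
gaudi3_mem_tbps = gaudi2_mem_tbps * 1.5   # claimed 1.5x uplift
print(gaudi3_mem_tbps)                    # ~3.7 TB/s, matching the quoted spec

gaudi2_net_gbps = 24 * 100
gaudi3_net_gbps = gaudi2_net_gbps * 2     # claimed 2x networking uplift
print(gaudi3_net_gbps)                    # 4800 Gb/s, i.e. 24 x 200 GbE links
```

So the doubling in networking comes from keeping the same 24-link count while moving each port from 100 GbE to 200 GbE.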
The Intel Gaudi 3 accelerator will meet these requirements and offer versatility through open community-based software and open industry-standard Ethernet, helping businesses flexibly scale their AI systems and applications.
How Custom Architecture Delivers GenAI Performance and Efficiency: The Intel Gaudi 3 accelerator, architected for efficient large-scale AI compute, is manufactured on a 5 nanometer (nm) process and offers significant advancements over its predecessor. It is designed to allow activation of all engines in parallel – the Matrix Multiplication Engines (MMEs), Tensor Processor Cores (TPCs), and Networking Interface Cards (NICs) – enabling the acceleration needed for fast, efficient deep learning computation and scale.
The Intel Gaudi 3 accelerator will deliver significant performance improvements for training and inference tasks on leading GenAI models; versus the NVIDIA H100, Intel projects it to be on average up to 50% faster while offering 40% better power efficiency.
About Market Adoption and Availability: The Intel Gaudi 3 accelerator will be available to original equipment manufacturers (OEMs) in the second quarter of 2024 in industry-standard configurations of Universal Baseboard and open accelerator module (OAM). Among the notable OEM adopters that will bring Gaudi 3 to market are Dell Technologies, HPE, Lenovo, and Supermicro. General availability of Intel Gaudi 3 accelerators is anticipated for the third quarter of 2024, and the Intel Gaudi 3 PCIe add-in card is anticipated to be available in the fourth quarter of 2024.
The Intel Gaudi 3 accelerator will also power several cost-effective cloud LLM infrastructures for training and inference, offering price-performance advantages and choice to organizations, which now include NAVER.