Intel Presents Gaudi 2 Accelerator As A Lower Cost Alternative For AI Compute & GenAI, Publishes Fine-Tuned MLPerf Training v4.0 Benchmarks

Intel Gaudi 2 Accelerators Showcase Competitive Performance Per Dollar Against NVIDIA H100 In MLPerf 4.0 GenAI Benchmarks

Intel has published its latest Gaudi 2 accelerator benchmarks in MLPerf Training v4.0, claiming it to be a low-cost alternative for AI Compute and GenAI workloads.

Today's MLPerf Training v4.0 benchmarks are special for Intel: it is the first time the company has submitted results for a large-scale Gaudi 2 system of 1,024 accelerators, all trained on the Intel Tiber Developer Cloud to demonstrate the performance and scalability of Intel's AI portfolio. This software ecosystem was used to fine-tune the performance of these accelerators on MLPerf's GPT-3 175B parameter model.

Intel states that these new benchmarks demonstrate how its Gaudi 2 accelerators are a scalable, affordable, and accessible solution for GenAI and AI compute, capable of training LLMs from 70 billion up to 175 billion parameters, and that the next-gen Gaudi 3 accelerator will deliver the next leap in performance while supporting an open software suite.

Once again, through the latest MLPerf v4.0 benchmarks, Intel proves that it is the only benchmarked alternative to NVIDIA's H100 for AI compute, completing the GPT-3 175B run with a TTT (Time-To-Train) of 66.9 minutes. Moreover, on the new fine-tuning benchmark for the Llama 2 70B model using LoRA (low-rank adapters), Intel achieved a TTT of 78.1 minutes on just eight Gaudi 2 accelerators.

Intel Gaudi 2 MLPerf Results Demonstrate Transparency: The MLPerf results show Gaudi 2 continues to be the only MLPerf-benchmarked alternative for AI compute to the Nvidia H100. Trained on the Tiber Developer Cloud, Intel’s GPT-3 time-to-train (TTT) result of 66.9 minutes on an AI system of 1,024 Gaudi accelerators proves strong Gaudi 2 scaling performance on ultra-large LLMs within a developer cloud environment.

The benchmark suite featured a new measurement: fine-tuning the Llama 2 70B parameter model using LoRA (low-rank adapters). Fine-tuning LLMs is a common task for many customers and AI practitioners, making it a relevant benchmark for everyday applications.

Intel’s submission achieved a time-to-train of 78.1 minutes on eight Gaudi 2 accelerators. Intel utilized the open-source Optimum Habana software for the submission, leveraging ZeRO-3 from DeepSpeed to optimize memory efficiency and scaling during large-model training, as well as Flash-Attention-2 to accelerate attention mechanisms. The benchmark task force – led by the engineering teams from Intel’s Habana Labs and Hugging Face – is responsible for the reference code and benchmark rules.
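To see why LoRA fine-tuning fits on just eight accelerators, it helps to look at the parameter math behind low-rank adapters. The following is a minimal NumPy sketch of the general LoRA idea (not Intel's or Hugging Face's actual submission code); the matrix dimensions and rank are illustrative assumptions, not figures from the benchmark:

```python
import numpy as np

# Illustrative LoRA sketch: a frozen pretrained weight matrix W is augmented
# with a trainable low-rank update B @ A, scaled by alpha / r. Dimensions
# below are hypothetical, chosen only to show the parameter savings.
d, k, r, alpha = 4096, 4096, 16, 32

rng = np.random.default_rng(0)
W = rng.standard_normal((d, k))          # frozen pretrained weights (not trained)
A = rng.standard_normal((r, k)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, initialized to zero

# Effective weights at inference time: base weights plus the scaled low-rank delta.
# With B initialized to zero, training starts exactly from the pretrained model.
W_eff = W + (alpha / r) * (B @ A)

full_params = d * k                      # parameters a full fine-tune would update
lora_params = r * (d + k)                # parameters LoRA actually trains
print(f"trainable fraction: {lora_params / full_params:.4%}")
```

Because only the small `A` and `B` matrices receive gradients, the optimizer state and gradient memory shrink by roughly the same fraction, which is what makes fine-tuning a 70B-parameter model tractable on a single eight-accelerator node.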

The major selling point for Gaudi 2 accelerators, as highlighted by Intel, is the price. At Computex, Intel announced that a Gaudi 2 AI system with eight accelerators and a Universal Baseboard (UBB) will be available for $65,000 US, estimated to be one-third the cost of competing solutions. Furthermore, Intel's Gaudi 3 kit in a similar configuration will be available for $125,000 US.
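Taking Intel's quoted figures at face value, a quick back-of-envelope calculation shows what the "one-third of competing solutions" claim implies. This is purely arithmetic on the numbers quoted above; the implied competitor price is an inference from Intel's claim, not a published list price:

```python
# Back-of-envelope from the figures Intel quoted at Computex (illustrative only)
gaudi2_system_price = 65_000      # 8x Gaudi 2 accelerators + UBB
gaudi3_system_price = 125_000     # comparable Gaudi 3 configuration
cost_ratio = 3                    # Intel's "one-third of competing solutions" claim

# Price a competing 8-accelerator system would carry under Intel's claim
implied_competitor_price = gaudi2_system_price * cost_ratio

# Effective cost per accelerator in the Gaudi 2 system
per_accelerator = gaudi2_system_price / 8

print(implied_competitor_price)   # 195000
print(per_accelerator)            # 8125.0
```

On those assumptions, an eight-GPU competing system would land near $195,000 US, which is the gap Intel leans on when it markets performance per dollar rather than raw performance.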

The Gaudi 3 accelerators are expected to be faster than NVIDIA's H100 while being very competitive against H200 solutions when they enter the AI market. The higher value of Gaudi systems has led several customers to select them over NVIDIA's offerings, including:

  • Naver, a South Korean cloud service provider and leading search engine catering to more than 600 million users, is building a new AI ecosystem and lowering barriers to enable wide-scale LLM adoption by reducing development costs and project timelines for its customers.
  • AI Sweden, an alliance between the Swedish government and private businesses, leverages Gaudi for fine-tuning with domain-specific municipal content to improve operational efficiencies and enhance public services for Sweden’s constituents.

For its next public MLPerf benchmarks, Intel has confirmed that it will publish results for its Gaudi 3 accelerators in the upcoming inference round. These AI accelerators will be generally available through OEMs by fall 2024.
