Elon Musk Begins Training xAI With 100,000 Liquid-Cooled NVIDIA H100 GPUs, The Most Powerful AI Training Cluster On The Planet

X chairman Elon Musk announces the commencement of Grok 3 training at Memphis on current-gen NVIDIA H100 GPUs.

Elon Musk's AI venture xAI has officially begun training on NVIDIA's most powerful data center GPUs, the H100. Musk proudly announced the milestone on X, calling the setup 'the most powerful AI training cluster in the world!'. In the post, he said the supercluster runs 100,000 liquid-cooled H100 GPUs on a single RDMA fabric and congratulated the xAI, X, and NVIDIA teams on starting training at Memphis.

Nice work by @xAI team, @X team, @Nvidia & supporting companies getting Memphis Supercluster training started at ~4:20am local time.

With 100k liquid-cooled H100s on a single RDMA fabric, it’s the most powerful AI training cluster in the world!

— Elon Musk (@elonmusk) July 22, 2024

Training kicked off at 4:20 am Memphis local time, and in a follow-up post, Musk claimed that the world's most powerful AI will be ready by December this year. According to reports, Grok 2 will be ready for release next month and Grok 3 by December. This came around two weeks after xAI and Oracle called off their $10 billion server deal.

xAI had been renting NVIDIA's AI chips from Oracle but decided to build its own cluster, ending a deal that was supposed to run for several more years. The goal now is a supercomputer superior to anything Oracle could offer, built from one hundred thousand high-performance H100 GPUs. Each H100 costs roughly $30,000; Grok 2 was trained on 20,000 of them, and Grok 3 reportedly requires five times as many.
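The figures above make for simple back-of-the-envelope math. A minimal sketch, using only the rough per-GPU price and GPU counts quoted in this article (these are public estimates, not official xAI or NVIDIA numbers):

```python
# Back-of-the-envelope estimate from the figures cited in the article.
# All numbers are rough public estimates, not official xAI/NVIDIA data.

H100_UNIT_COST = 30_000        # approximate per-GPU price (USD), per the article
GROK2_GPUS = 20_000            # GPUs reportedly used to train Grok 2
GROK3_GPUS = 5 * GROK2_GPUS    # Grok 3 reportedly needs five times as many

grok3_gpu_cost = GROK3_GPUS * H100_UNIT_COST

print(f"Grok 3 cluster size: {GROK3_GPUS:,} GPUs")
print(f"Estimated GPU hardware cost: ${grok3_gpu_cost / 1e9:.0f} billion")
# → Grok 3 cluster size: 100,000 GPUs
# → Estimated GPU hardware cost: $3 billion
```

At five times Grok 2's GPU count, the cluster lands on the 100,000 figure Musk cited, with GPU hardware alone in the ballpark of $3 billion before cooling, networking, and facilities.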

The decision comes as a surprise since NVIDIA is about to ship its newer H200 GPUs in Q3. The H200, which entered mass production in Q2, is also based on the Hopper architecture but offers an improved memory configuration, resulting in up to 45% better response times for generative AI outputs. Beyond the H200, NVIDIA is expected to launch its Blackwell-based B100 and B200 GPUs right at the end of 2024.

This is a significant advantage in training the world’s most powerful AI by every metric by December this year

— Elon Musk (@elonmusk) July 22, 2024

The xAI 'Gigafactory of Compute' was expected to be ready before the fall of 2025, but operations have apparently commenced ahead of the original plan. According to Musk, this advanced large language model will be fully trained by the end of 2024, positioning it as the most powerful AI the world has seen so far.

News Source: elonmusk
