NVIDIA Preps Even More Hopper AI GPUs For Chinese Market In Accordance With US Policies

NVIDIA doesn't seem to be giving up on the huge AI potential of the Chinese market, as the company is reportedly preparing even more Hopper GPUs that comply with US policies.

Update: It seems like NVIDIA's new GPUs aren't trying to circumvent any US policies (as previously speculated) but instead are designed in accordance with the import rules and fully comply with the parameters set.

Rumors On NVIDIA's New China AI Accelerators

On my flight back from NY this evening, I read a few of the rumor articles regarding a new set of accelerator cards in excruciating detail that supposedly 'circumvent the rules' and thought I would weigh in...

So, the USG DoC came… pic.twitter.com/1x6IwWrUiC

— Patrick Moorhead (@PatrickMoorhead) November 10, 2023

According to the latest report by Dylan Patel of SemiAnalysis, NVIDIA plans to launch at least three new AI GPUs for the Chinese market: the H20 SXM, the PCIe L20, and the PCIe L2. All of these chips are reportedly based on the Hopper GPU architecture and will feature a maximum theoretical performance of 296 TFLOPs.

The exact configurations of these chips aren't known yet, but the Hopper H20 SXM features 96 GB of memory operating at up to 4.0 TB/s and 296 TFLOPs of compute power, and it uses the GH100 die with a performance density of 2.9 TFLOPs/die versus the H100's 19.4. Per the listed table, the H100 SXM is roughly 6.7x faster than the H20 SXM, but those figures are FP16 Tensor Core FLOPs (with sparsity), not INT8 or FP8 FLOPs. The GPU has a 400W TDP and supports 8-way configurations in an HGX solution. It retains the 900 GB/s NVLink connection and also offers 7-way MIG (Multi-Instance GPU) functionality.

NVIDIA H100 SXM FP16 (Sparsity) TFLOPs = 1979

NVIDIA H20 SXM FP16 (Sparsity) TFLOPs = 296
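
The gap between the two SXM parts can be checked directly from the figures listed above. The short Python sketch below simply divides one number by the other; the function name `throughput_ratio` is a hypothetical helper chosen for illustration, and the only inputs are the two throughput values quoted in this article.

```python
# FP16 Tensor Core throughput with sparsity, in TFLOPs, as quoted in the article.
H100_SXM_TFLOPS = 1979
H20_SXM_TFLOPS = 296


def throughput_ratio(faster: float, slower: float) -> float:
    """Return how many times higher `faster` is than `slower` (hypothetical helper)."""
    return faster / slower


ratio = throughput_ratio(H100_SXM_TFLOPS, H20_SXM_TFLOPS)
print(f"H100 SXM / H20 SXM = {ratio:.2f}x")
```

Rounded to two decimals the ratio comes out at 6.69x, which is the same gap the table implies. It only describes peak FP16 throughput with sparsity and says nothing about memory-bound workloads.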

The NVIDIA L20 comes with 48 GB of memory and a peak of 239 TFLOPs of compute performance, while the L2 is configured with 24 GB of memory and a peak of 193 TFLOPs. Both GPUs come in PCIe form factors, making them a viable option for office-based workstations and servers. These are far more cut-down configurations than the H800 and A800 that Chinese customers were getting before, but NVIDIA's software stack for AI and HPC appears to be too valuable for some of these customers to give up, so they will likely accept the reduced specs to get access to the modern Hopper architecture.

L40 FP16 (Sparsity) TFLOPs = 362

L20 FP16 (Sparsity) TFLOPs = 239

L4 FP16 (Sparsity) TFLOPs = 242

L2 FP16 (Sparsity) TFLOPs = 193
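
To put the PCIe figures in perspective, the sketch below works out how much of the listed throughput each new card keeps. The pairings (L20 against L40, L2 against L4) are an assumption based on how the parts are named and listed here, not something the report states, and `retained_fraction` is a hypothetical helper name.

```python
# FP16 Tensor Core throughput with sparsity, in TFLOPs, as listed in the article.
FP16_SPARSE_TFLOPS = {"L40": 362, "L20": 239, "L4": 242, "L2": 193}

# Assumed pairings (not stated in the article): each China-market card
# is compared with the existing card whose name it most closely mirrors.
PAIRS = [("L20", "L40"), ("L2", "L4")]


def retained_fraction(cut_down: str, reference: str) -> float:
    """Fraction of the reference card's listed throughput that the cut-down card keeps."""
    return FP16_SPARSE_TFLOPS[cut_down] / FP16_SPARSE_TFLOPS[reference]


for cut_down, reference in PAIRS:
    pct = 100 * retained_fraction(cut_down, reference)
    print(f"{cut_down} keeps {pct:.0f}% of {reference}'s listed throughput")
```

Under that assumed pairing, the L20 keeps about 66% of the L40's listed figure and the L2 about 80% of the L4's, a much smaller reduction than the one between the H100 SXM and the H20 SXM.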

Also, while these chips are cut down from a traditional compute point of view, the report states that in LLM inferencing, the H20 SXM will actually be faster than the H100 since it shares similarities with next year's H200. This would suggest that at least one part of the GPU isn't as cut down as the rest of the chip. The NVIDIA HGX H20 SXM chip and the L20 PCIe GPU are expected to launch by December 2023, while the L2 PCIe accelerator will be available in January 2024. Product sampling is expected to begin one month ahead of each release.

Furthermore, one of the China specific GPUs is over 20% faster than the H100 in LLM inference, and is more similar to the new GPU that Nvidia is launching early next year than to the H100!

via SemiAnalysis

NVIDIA has yet to officially announce these chips, and it will likely keep them under wraps, quietly updating its partners on its plans rather than making a full-blown AI product announcement. The recent restrictions imposed by the US government on China have prompted Chinese companies to search for AI alternatives.

NVIDIA knows the potential that some of these competing companies hold and will try to support its Chinese customer base as much as possible while conforming to US regulations. The company also faces massive demand for AI hardware across the globe, and despite a recent cancellation of $5 billion worth of orders, the green team will simply reallocate that supply elsewhere so that customers who previously had to wait more than a year for GPUs can now get their hands on NVIDIA's AI gold sooner.