Deci Posts MLPerf Benchmarks of New NLP Model, Achieves a Massive 6.46x Gain With AMD EPYC Milan-X CPUs

Deep learning company Deci has revealed the results for its Natural Language Processing (NLP) inference model submitted to the MLPerf Inference v2.1 benchmark suite, which achieves up to a 6.46x performance gain on AMD's EPYC CPUs.
Generated by Deci’s Automated Neural Architecture Construction (AutoNAC) technology, the NLP model, dubbed DeciBERT-Large, ran on a Dell PowerEdge R7525 server powered by the AMD EPYC 7773X processor. The resulting model exceeded the throughput of BERT-Large by almost six and a half times while also gaining roughly one percent in accuracy. The improvement translates into lower cloud costs, enabling more processes to run on a single machine in a fraction of the time. It also allows teams to use a more cost-efficient machine while maintaining the same throughput.
The new model was submitted under the offline scenario in MLPerf’s open division in the BERT 99.9 category. The objective was to maximize throughput while keeping accuracy within a 0.1% margin of the baseline, which is an F1 score of 90.874 on SQuAD. The DeciBERT-Large model exceeded these goals, achieving a throughput of 116 Queries Per Second (QPS) and an F1 score of 91.08. As the table below shows, the AMD EPYC 7773X Milan-X chip delivers up to a 6.46x performance uplift over the baseline BERT-Large model.
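For readers curious how an offline-scenario throughput figure like this is produced, here is a minimal sketch using ONNX Runtime, the runtime listed in the submission. The model filename, input names, and shapes are illustrative assumptions; the official MLPerf harness uses LoadGen to issue and time queries rather than a simple loop like this.

```python
# Minimal sketch of an offline-style throughput (QPS) measurement.
# "decibert_large_int8.onnx" and the input layout are assumptions,
# not the actual MLPerf LoadGen harness.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("decibert_large_int8.onnx",
                               providers=["CPUExecutionProvider"])

batch, seq_len, n_queries = 8, 384, 1024  # SQuAD-style inputs (assumed shapes)
feed = {
    "input_ids": np.zeros((batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq_len), dtype=np.int64),
}

start = time.perf_counter()
for _ in range(n_queries // batch):
    session.run(None, feed)          # run one batch of queries
elapsed = time.perf_counter() - start
print(f"Throughput: {n_queries / elapsed:.1f} QPS")
```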
[Table: BERT-Large vs. DeciBERT-Large throughput and SQuAD F1 accuracy, compiled with ONNX Runtime in FP32 and INT8]
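The INT8 column refers to a quantized version of the model. As an illustration of one common route to INT8 inference on CPUs, ONNX Runtime ships post-training dynamic quantization tooling; the sketch below is a generic example of that tooling, not a description of Deci's actual compilation pipeline, and the file names are placeholders.

```python
# Generic post-training INT8 quantization with ONNX Runtime's built-in
# tooling. File names are placeholder assumptions.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="decibert_large_fp32.onnx",   # FP32 export of the model
    model_output="decibert_large_int8.onnx",  # INT8 model for serving
    weight_type=QuantType.QInt8,              # quantize weights to signed INT8
)
```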
Deci leveraged its proprietary AutoNAC engine to develop a new model architecture tailored to the AMD EPYC processor. AutoNAC, an algorithmic optimization engine that produces best-in-class deep learning architectures for any task, dataset, and inference hardware, typically delivers up to a five-fold increase in inference performance with accuracy comparable to or better than state-of-the-art neural models.
While the key optimization objective when generating the DeciBERT model was to optimize throughput, AutoNAC also managed to significantly reduce the model size – an important accomplishment with several benefits, including the ability to run multiple models on the same server and better utilize cache memory. These results confirm once again the exceptional performance of our AutoNAC technology, which applies to nearly any deep learning domain and inference hardware.
— Prof. Ran El-Yaniv, Deci’s chief scientist and co-founder
MLPerf brings together deep learning experts to create fair and useful benchmarks for measuring the training and inference performance of ML hardware, software, and services.
Deci’s NLP inference acceleration translates directly into cloud cost savings, allowing more processes to run on the same machine in less time, and it enables teams to choose more cost-efficient machines while retaining the same throughput. Higher throughput also improves the user experience for NLP applications such as question answering, since queries are processed faster and insights can be delivered in real time.
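To make the cost claim concrete, a quick back-of-the-envelope calculation shows how a 6.46x throughput gain shrinks the cost of serving a fixed query volume. The hourly instance rate below is a hypothetical figure, not from the article; the QPS numbers follow from the reported 116 QPS and the 6.46x gain.

```python
# Illustrative cost arithmetic. The hourly rate is a hypothetical assumption.
hourly_rate = 2.00          # $/hour for a CPU instance (assumed)
baseline_qps = 116 / 6.46   # BERT-Large throughput implied by the 6.46x gain
optimized_qps = 116         # DeciBERT-Large throughput reported in the article

def cost_per_million_queries(qps: float, rate: float) -> float:
    """Dollars to serve one million queries at a given throughput."""
    seconds = 1_000_000 / qps
    return rate * seconds / 3600

print(f"BERT-Large:     ${cost_per_million_queries(baseline_qps, hourly_rate):.2f}")
print(f"DeciBERT-Large: ${cost_per_million_queries(optimized_qps, hourly_rate):.2f}")
```

Under these assumptions, the per-query cost drops by the same 6.46x factor, from roughly $31 to under $5 per million queries.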
News Source: Deci