NIST Evaluation: China’s DeepSeek V4 Pro Lags US AI by 8 Months, But Wins on Cost

May 3, 2026 · 1:54 PM

DeepSeek V4 Pro, the latest open-weight AI model from China, has been evaluated by the Center for AI Standards and Innovation (CAISI), a NIST-affiliated body. Its report finds that the model trails leading US frontier models by about eight months in capability but offers significant cost advantages.

In independent testing that included non-public benchmarks in cybersecurity and software engineering, DeepSeek V4 Pro performed on par with earlier US models such as GPT-5, not with the more recent GPT-5.4 as DeepSeek had claimed. Despite this gap, the model proved highly cost-efficient: on five of seven benchmarks it was 53% cheaper than GPT-5.4 mini, and only one task cost 41% more.

The study used Item Response Theory (IRT) to aggregate results across five domains. DeepSeek V4 Pro excels in mathematics but lags in abstract reasoning and agent-based evaluations. CAISI noted that it is the most capable model from the People's Republic of China to date, but that it still trails the pace set by US frontier labs.
