DeepSeek V4 Pro, the latest open-weight AI model from China, has been evaluated by the Center for AI Standards and Innovation (CAISI), a NIST-affiliated body. The report finds that the model trails leading US frontier models by about eight months in capability, though it offers significant cost advantages.
In independent testing that included non-public benchmarks in cybersecurity and software engineering, DeepSeek V4 Pro’s performance matched that of earlier US models such as GPT-5, rather than the more recent GPT-5.4, as the company had claimed. Despite the gap, the model proved highly cost-efficient: on five of seven benchmarks it was 53% cheaper than GPT-5.4 mini, with only one benchmark costing 41% more.
The study used Item Response Theory to aggregate results across five domains. DeepSeek V4 Pro excels in mathematics but lags in abstract reasoning and agent-based evaluations. CAISI noted that it is the most capable model from the People’s Republic of China to date, but that it remains behind the pace set by US frontier labs.
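For readers unfamiliar with Item Response Theory, the idea can be illustrated with a minimal sketch. The article does not say which IRT variant CAISI used, so the example below assumes the simplest one, the one-parameter (Rasch) model, in which the probability of solving a task depends only on the gap between a model's ability and the task's difficulty. The function names, difficulty values, and pass/fail results are all hypothetical and are not taken from the report.

```python
import math


def p_correct(theta, b):
    """Rasch (1PL) probability that ability `theta` solves an item of difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses, difficulties, iters=50):
    """Maximum-likelihood ability estimate with item difficulties held fixed.

    `responses` is a list of 0/1 outcomes, one per item. Uses Newton-Raphson
    on the log-likelihood. The estimate is undefined (it diverges) when every
    response is 0 or every response is 1, so mixed results are assumed.
    """
    theta = 0.0
    for _ in range(iters):
        probs = [p_correct(theta, b) for b in difficulties]
        # First derivative of the log-likelihood: observed minus expected score.
        grad = sum(r - p for r, p in zip(responses, probs))
        # Second derivative: always negative, so the likelihood is concave.
        hess = -sum(p * (1.0 - p) for p in probs)
        if abs(hess) < 1e-12:
            break
        step = grad / hess
        theta -= step
        if abs(step) < 1e-9:
            break
    return theta


if __name__ == "__main__":
    # Hypothetical item difficulties and pass/fail results (illustrative only).
    difficulties = [-1.0, 0.0, 1.0, 2.0]
    model_a = [1, 1, 1, 0]  # solves the three easiest items
    model_b = [1, 1, 0, 0]  # solves the two easiest items
    print("ability A:", estimate_ability(model_a, difficulties))
    print("ability B:", estimate_ability(model_b, difficulties))
```

The appeal of this kind of aggregation is that every model is placed on one ability scale that accounts for how hard each task is, rather than being ranked by a raw average of benchmark scores. A full analysis would also estimate the item difficulties from the data instead of fixing them in advance, as this sketch does.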