DailyGlimpse

GPT-5.5 Matches Claude Mythos in Cyber Attack Evaluations, UK AI Security Institute Reports

AI
May 1, 2026 · 1:32 PM

The UK's AI Security Institute (AISI) has released findings from its evaluation of OpenAI's GPT-5.5, revealing that the model performs on par with Anthropic's Claude Mythos Preview in cyberattack simulations. The assessment suggests that the advanced cyber capabilities initially observed in Claude Mythos are part of a broader trend in AI development, driven by improvements in autonomy, reasoning, and coding.

In isolated expert-level security tasks, GPT-5.5 achieved an average success rate of 71.4 percent, compared with 68.6 percent for Claude Mythos Preview. Although the gap falls within the statistical margin of error, it makes GPT-5.5 potentially the strongest model tested to date. For context, GPT-5.4 scored 52.4 percent and Claude Opus 4.7 achieved 48.6 percent. All current frontier models have fully solved the basic tasks since at least February 2026.
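Why does a 2.8-point gap fall within the margin of error? With a limited number of tasks, the uncertainty in each measured success rate is large relative to the gap. The sketch below illustrates this with a standard two-proportion margin-of-error calculation; AISI has not published the task count, so the sample size of 35 tasks per model is purely hypothetical.

```python
import math

def prop_diff_margin(p1, n1, p2, n2, z=1.96):
    """95% margin of error for the difference between two success rates,
    using the normal approximation to the binomial."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return z * se

# Reported average success rates; n=35 tasks per model is a
# hypothetical figure chosen only for illustration.
gpt, claude = 0.714, 0.686
margin = prop_diff_margin(gpt, 35, claude, 35)
diff = gpt - claude  # 2.8 percentage points
print(f"observed gap: {diff:.3f}, 95% margin: +/-{margin:.3f}")
```

Under these assumptions the margin of error (roughly 21 percentage points) dwarfs the observed 2.8-point gap, which is why the two models are best described as performing on par.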

Beyond isolated tasks, AISI tested the models on multi-step network attack simulations. In "The Last Ones" (TLO), a 32-step scenario involving more than 20 hosts across four subnets, GPT-5.5 successfully completed the entire attack in 2 out of 10 attempts; Claude Mythos Preview achieved this in 3 out of 10. The simulation required the AI to find vulnerabilities, steal credentials, move laterally, and reach a protected database—a task estimated to take a human expert about 20 hours.

However, the tests were conducted without active defenders or security monitoring, meaning real-world performance against well-defended systems remains uncertain. A second simulation, "Cooling Tower," which models an attack on industrial control systems, remains unsolved by any model.

AISI also identified a universal jailbreak that bypassed all safety measures in GPT-5.5, including multi-step agent scenarios. The exploit took just six hours to develop. OpenAI subsequently deployed safety updates, but AISI could not verify their effectiveness due to a configuration issue in the deployed version. This highlights ongoing vulnerabilities in even the most capable AI systems.

Unlike Claude Mythos, which Anthropic has kept limited to a small group, GPT-5.5 is already available in ChatGPT and via API. The AISI results suggest that Anthropic's cautious approach may not be solely driven by safety concerns but could also reflect compute constraints.