DailyGlimpse

Anthropic Unveils BioMysteryBench: Claude Matches Human Experts on Real Bioinformatics Puzzles

AI
April 30, 2026 · 1:33 PM
Anthropic has introduced a new benchmark called BioMysteryBench, designed to evaluate how well AI models like Claude perform on real-world bioinformatics tasks. The benchmark comprises 99 questions based on genuine, noisy datasets, requiring models to apply practical research skills rather than just recall facts.

Unlike existing tests that focus on factual knowledge or simulated environments, BioMysteryBench tasks involve identifying biological signals from messy data—for example, determining which organ a single-cell RNA sample came from or pinpointing a knocked-out gene. Claude is given access to bioinformatics tools and external databases, and only the final answer is scored.

The results show that Claude's latest model, Mythos Preview, matches human experts on problems the experts could solve, reaching 82.6% accuracy. On problems that stumped every human expert, Claude still achieved a 30% success rate. However, its performance on the hardest tasks is inconsistent: successes often come in only one or two of five attempts, suggesting the model stumbled onto a winning strategy rather than applying a reproducible method.

An independent benchmark from Genentech and Roche, CompBioBench, has produced similar findings. BioMysteryBench is publicly available on Hugging Face.