Nous Research, the open-source AI startup backed by Paradigm, released NousCoder-14B, a competitive programming model that matches or exceeds larger proprietary systems, trained in just four days on 48 Nvidia B200 GPUs. The model achieves 67.87% accuracy on LiveCodeBench v6, a 7.08-point improvement over its base Qwen3-14B model.
This release comes as Anthropic's Claude Code dominates developer discussions with demonstrations of end-to-end software development. Google engineer Jaana Dogan noted that Claude Code approximated a year-long project from a three-paragraph prompt. Nous Research is betting that open-source models trained on verifiable problems can close the gap.
Radical Openness and Reproducibility
Nous Research published not only the model weights but the complete reinforcement learning environment, benchmark suite, and training harness on its Atropos framework, enabling anyone with sufficient compute to reproduce the work.
Researcher Joe Li, a former competitive programmer, trained the model. He compared its improvement trajectory to his own Codeforces journey: NousCoder-14B advanced from an estimated 1600-1750 rating to 2100-2200 in four days, a leap that took him two years of practice. However, Li solved about 1,000 problems over those two years, while the model required 24,000.
Training Process and Techniques
The training used verifiable rewards: the model generates code, which is executed against test cases to produce a binary pass/fail signal. Nous used Modal for parallel sandboxed execution. Each problem carries hundreds of test cases, with limits of 15 seconds of runtime and 4 GB of memory per submission.
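The verifiable-reward loop described above can be sketched as follows. This is a minimal illustration, not Nous's actual harness: the function name, the stdin/stdout convention, and the plain-subprocess sandbox are assumptions; only the binary pass/fail signal and the 15-second limit come from the article.

```python
import os
import subprocess
import tempfile

TIME_LIMIT_S = 15  # per-submission time limit reported in the article

def verify(code: str, test_cases: list[tuple[str, str]]) -> float:
    """Binary verifiable reward: 1.0 only if the generated program
    passes every (stdin, expected_stdout) pair, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        for stdin_data, expected in test_cases:
            try:
                result = subprocess.run(
                    ["python3", path],
                    input=stdin_data,
                    capture_output=True,
                    text=True,
                    timeout=TIME_LIMIT_S,
                )
            except subprocess.TimeoutExpired:
                return 0.0  # time-limit exceeded counts as a failure
            if result.returncode != 0 or result.stdout.strip() != expected.strip():
                return 0.0  # crash or wrong answer on any test fails the problem
        return 1.0  # all hidden tests passed -> positive reward
    finally:
        os.unlink(path)
```

In a production setup each run would also be isolated (as with Modal's sandboxes) and memory-capped, which a bare `subprocess.run` does not enforce.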
Training employed DAPO (Dynamic Sampling Policy Optimization), which discards prompts where the model's sampled attempts all succeed or all fail, since such groups provide no learning signal. The pipeline overlaps inference and verification to maximize GPU utilization.
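The dynamic-sampling filter at the heart of DAPO can be illustrated in a few lines. This is a hedged sketch under the assumption of binary rewards and a fixed number of rollouts per problem; the function and variable names are illustrative, not from Nous's codebase.

```python
def keep_for_update(rewards: list[float]) -> bool:
    """Keep a problem's rollout group only if rewards are mixed:
    all-pass or all-fail groups have zero group-relative advantage
    and contribute no gradient, so DAPO drops them."""
    return 0.0 < sum(rewards) < len(rewards)

# Example: 8 sampled solutions per problem, binary pass/fail rewards.
groups = {
    "problem_a": [1.0] * 8,  # solved every time -> discarded
    "problem_b": [0.0] * 8,  # never solved      -> discarded
    "problem_c": [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # mixed -> kept
}
batch = {name: r for name, r in groups.items() if keep_for_update(r)}
# Only "problem_c" survives into the policy-gradient update.
```

Filtering this way keeps every gradient step informative, which matters when, as here, training compute is limited to a four-day run.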
Looming Data Scarcity
The training dataset encompasses "a significant portion of all readily available, verifiable competitive programming problems," approaching the limits of high-quality training data for this domain. Li noted that the total number of such problems on the Internet is of roughly the same order of magnitude as the 24,000 used.