DailyGlimpse

BigCode Releases StarCoder2 and Expanded Stack v2 Dataset for Code AI

April 26, 2026 · 4:35 PM

The BigCode project, a collaboration led by Hugging Face and ServiceNow, has released StarCoder2 and The Stack v2. StarCoder2 is a family of openly available large language models (LLMs) specialized for code generation. The models are trained on The Stack v2, a significantly expanded dataset of permissively licensed source code.

StarCoder2 comes in three sizes: 3B, 7B, and 15B parameters, trained on between 3.3 trillion and 4.3 trillion tokens depending on the model. According to BigCode, the models outperform their predecessors and many comparably sized baselines on code completion, bug fixing, and code explanation tasks. The Stack v2 covers over 600 programming languages and about 67 terabytes of code, more than triple the size of the original Stack dataset.
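For readers who want to try one of the three sizes, here is a minimal sketch of how a completion could be requested through the Hugging Face `transformers` library. It assumes the checkpoints are published under the `bigcode` organization on the Hub as `starcoder2-3b`, `starcoder2-7b`, and `starcoder2-15b`. The helper names `repo_for` and `complete` are hypothetical, introduced only for illustration.

```python
# Hub repository IDs for the three StarCoder2 sizes.
# Assumption: the checkpoints live under the "bigcode" organization.
STARCODER2_REPOS = {
    "3b": "bigcode/starcoder2-3b",
    "7b": "bigcode/starcoder2-7b",
    "15b": "bigcode/starcoder2-15b",
}


def repo_for(size: str) -> str:
    """Return the Hub repository ID for a size such as '3B' or '15b'."""
    key = size.strip().lower()
    if key not in STARCODER2_REPOS:
        raise ValueError(f"unknown StarCoder2 size: {size!r}")
    return STARCODER2_REPOS[key]


def complete(prompt: str, size: str = "3b", max_new_tokens: int = 32) -> str:
    """Generate a code completion; downloads the checkpoint on first use."""
    # Imported lazily so repo_for() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = repo_for(size)
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)

    # Tokenize the prompt and let the model continue it.
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

Calling `complete("def fibonacci(n):")` would download the 3B checkpoint, which needs several gigabytes of disk space and memory, and return the prompt followed by the model's continuation. The larger sizes generally call for a GPU.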

Both the models and the dataset are openly available. The model weights are released under the BigCode OpenRAIL-M license, which allows research and commercial use subject to its use restrictions, and the dataset consists of code under various permissive licenses. BigCode emphasizes transparency, providing detailed documentation and training pipelines.