Meta has released Code Llama, a family of state-of-the-art, open-access large language models specialized for coding tasks. Built on Llama 2, these models are now integrated into the Hugging Face ecosystem, making them accessible for both research and commercial use under a permissive license.
Code Llama comes in three sizes: 7 billion, 13 billion, and 34 billion parameters. The base models were initialized from Llama 2 and further trained on 500 billion tokens of code data. Two specialized variants are available at each size: a Python specialist trained on an additional 100 billion tokens of Python-heavy data, and an instruction-tuned version that can understand and follow natural language commands.
These models achieve state-of-the-art performance among open models across multiple programming languages, including Python, C++, Java, PHP, C#, TypeScript, and Bash. The 7B and 13B versions additionally support code infilling, allowing them to generate code conditioned on both the preceding and the following context, which makes them ideal for use as an AI coding assistant.
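For infilling, the model expects the known prefix and suffix to be wrapped in special sentinel tokens, as described in the Code Llama paper. A minimal sketch of the prompt layout (`build_infill_prompt` is a hypothetical helper for illustration, not part of any library):

```python
def build_infill_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle infilling prompt.

    The sentinel tokens <PRE>, <SUF>, and <MID> mark where the surrounding
    context ends; the model then generates the missing middle span and
    signals completion with an end-of-text sentinel.
    """
    return f"<PRE> {prefix} <SUF>{suffix} <MID>"

# Ask the model to fill in the body of a function.
prompt = build_infill_prompt(
    "def remove_non_ascii(s: str) -> str:\n    ",
    "\n    return result",
)
```

Sending such a prompt to a 7B or 13B checkpoint and decoding until the end-of-text sentinel yields the code that belongs between the prefix and suffix.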
Code Llama was trained with a context window of 16,000 tokens. By increasing the base period of its rotary position embeddings (RoPE) during fine-tuning, it can extrapolate to sequences of up to 100,000 tokens. This long-context capability enables it to work with larger codebases and more complex tasks.
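The intuition behind the RoPE change: each pair of embedding dimensions is rotated at a frequency derived from a base period, and the Code Llama paper raises that base from Llama 2's 10,000 to 1,000,000, slowing the rotations so that much more distant positions remain distinguishable. A stdlib-only illustration of the frequency schedule (a sketch of the standard RoPE formula, not the library implementation):

```python
def rope_inv_freq(head_dim: int, base: float) -> list[float]:
    # One rotation frequency per pair of dimensions: base ** (-2i / d).
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

llama2 = rope_inv_freq(128, 10_000)        # Llama 2 default base period
codellama = rope_inv_freq(128, 1_000_000)  # Code Llama long-context base

# With the larger base, the slowest rotation is roughly 100x slower,
# stretching the range of positions the model can tell apart.
```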
Hugging Face has rolled out comprehensive support for Code Llama, including integration with Transformers (version 4.33+), Text Generation Inference for production-ready deployment, Inference Endpoints, and a VS Code extension. Users can try a live demo in a dedicated Hugging Face Space or access the 34B instruction-tuned model via HuggingChat.
"Code LLMs are an exciting development for software engineers because they can boost productivity through code completion in IDEs, take care of repetitive tasks, or create unit tests," noted the Hugging Face team.
Developers can leverage the full Hugging Face toolkit for training, fine-tuning, and inference, including support for 4-bit quantization via bitsandbytes and parameter-efficient fine-tuning with the PEFT library.
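As a sketch of the 4-bit loading path (assuming `transformers`, `accelerate`, and `bitsandbytes` are installed and a CUDA GPU is available; the weights download on first use):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Quantization config: store weights in 4-bit NF4, compute in fp16.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_id = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers across available devices
)
```

Loading in 4-bit brings the 7B model within reach of a single consumer GPU, at a modest cost in generation quality.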