Cohere has unveiled North Mini Code, a 30-billion-parameter Mixture-of-Experts (MoE) model that activates only 3 billion parameters per token, so only about a tenth of its weights take part in computing each token. In a live demonstration running on a RunPod H100 GPU with vLLM, the model solved four challenging LeetCode problems: LRU Cache, Trapping Rain Water, Course Schedule II, and Serialize and Deserialize Binary Tree.
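For a sense of what these tasks involve, here is a reference two-pointer solution to Trapping Rain Water. It was written for this article as an illustration and is not the model's output from the demo; the function name `trap` is our own choice.

```python
def trap(height: list[int]) -> int:
    """Return the units of water trapped between bars of the given heights."""
    left, right = 0, len(height) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if height[left] < height[right]:
            # The bar at `right` is at least as tall as every bar seen so far,
            # so the tallest bar on the left decides the water level here.
            left_max = max(left_max, height[left])
            water += left_max - height[left]
            left += 1
        else:
            # Symmetric case: the tallest bar on the right decides the level.
            right_max = max(right_max, height[right])
            water += right_max - height[right]
            right -= 1
    return water
```

Calling `trap([0, 1, 0, 2, 1, 0, 1, 3, 2, 1, 2, 1])` returns 6. The solution runs in linear time with constant extra memory, which is the kind of non-obvious optimization these problems are meant to test.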
The model combines FP8 quantization with its MoE architecture. FP8 stores a weight in one byte, about half the memory of 16-bit formats, while sparse expert routing limits the compute spent on each token, a combination intended to keep resource usage low while maintaining high accuracy. Cohere North Mini Code is designed to be fast and resource-efficient, which makes it suited to real-time coding assistance and competitive programming. In the live demo, it handled intricate algorithmic tasks on a single H100 GPU.
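A rough calculation shows what FP8 storage and sparse activation mean for a model of this size. The sketch below uses only the parameter counts quoted in this article; the assumptions that every weight is stored at one precision and that KV cache, activations, and runtime overhead can be ignored are ours, not Cohere's.

```python
# Back-of-the-envelope figures from the parameter counts quoted in the article.
# Assumptions (not from Cohere): every weight is stored in FP8 (1 byte) or
# BF16 (2 bytes); KV cache, activations, and runtime overhead are ignored.
TOTAL_PARAMS = 30e9   # total parameters
ACTIVE_PARAMS = 3e9   # parameters activated per token

BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


fp8_gb = weight_memory_gb(TOTAL_PARAMS, "fp8")
bf16_gb = weight_memory_gb(TOTAL_PARAMS, "bf16")
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"Active per token: {active_fraction:.0%} of parameters")
```

Under these assumptions the weights come to about 30 GB in FP8 versus about 60 GB at 16-bit precision. All 30 billion parameters must still be resident in memory; sparse activation lowers the compute per token, not the weight storage.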
This release highlights a growing trend in AI: building smaller, more specialized models that compete with larger counterparts by using their parameters more efficiently. Cohere North Mini Code's performance on these LeetCode problems suggests that MoE architectures can deliver strong coding capabilities without the per-token compute cost of a dense model of the same size.