Hugging Face has released Idefics2, an open-source 8-billion-parameter vision-language model designed to process both images and text. The model, which builds on the original Idefics, achieves strong performance on multimodal tasks while remaining accessible to the community.
Idefics2 pairs the Mistral-7B language model with a SigLIP vision encoder, allowing it to understand image inputs and generate text responses grounded in visual content. The model is released under the Apache 2.0 license, making it free for both research and commercial use.
Key features include:
- Multimodal understanding: handles image and text inputs
- 8B parameters: a balance between performance and efficiency
- Open source: fully accessible model weights and code
The release includes a base version and an instruct-tuned variant, the latter optimized for following user instructions in a conversational format. Hugging Face published benchmarks showing Idefics2 competing with larger models on a range of vision-language tasks.
"Idefics2 is a significant step for open multimodal AI," said the Hugging Face team. "We aim to democratize access to powerful vision-language models."
The model can be used for tasks like image captioning, visual question answering, and document understanding. The community can access Idefics2 through the Hugging Face Hub and integrate it into their projects.
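As a rough illustration of how such a task might look in practice, the sketch below builds a chat-style multimodal prompt and runs visual question answering through the `transformers` library. The checkpoint name (`HuggingFaceM4/idefics2-8b`), the use of `AutoModelForVision2Seq`, and the generation settings are assumptions based on the standard Hugging Face vision-to-sequence API, not an official recipe from the release.

```python
def build_messages(question: str) -> list:
    """Build a chat-style multimodal message: an image placeholder
    followed by a text question, the format a multimodal chat
    template typically expects."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image"},
                {"type": "text", "text": question},
            ],
        }
    ]


def answer_about_image(image, question: str = "Describe this image.") -> str:
    """Run visual question answering with Idefics2 (sketch).

    Requires `torch` and `transformers`, and downloads ~8B parameters
    of weights on first use; imports are kept local so build_messages
    can be used without them installed.
    """
    import torch
    from transformers import AutoProcessor, AutoModelForVision2Seq

    model_id = "HuggingFaceM4/idefics2-8b"  # assumed Hub checkpoint name
    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    # Turn the message structure into a prompt string, pair it with
    # the image, and decode the generated answer.
    prompt = processor.apply_chat_template(
        build_messages(question), add_generation_prompt=True
    )
    inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)
    generated = model.generate(**inputs, max_new_tokens=100)
    return processor.batch_decode(generated, skip_special_tokens=True)[0]
```

The same pattern extends to image captioning or document understanding by changing the question text, since the instruct-tuned variant is driven entirely by the user's prompt.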