DailyGlimpse

Smol-Audio: Open-Source Notebook Collection Simplifies Fine-Tuning of Leading Audio AI Models

April 30, 2026 · 1:48 AM

Audio AI is advancing rapidly, with models like OpenAI's Whisper, NVIDIA's Parakeet, Mistral's Voxtral, and NVIDIA's Audio Flamingo 3 pushing the boundaries of speech recognition and understanding. However, practical knowledge for fine-tuning these models remains scattered. Smol-Audio, a new open-source repository from the Deep-unlearning team, aims to bridge this gap by providing a collection of self-contained Jupyter notebooks designed for Google Colab.

Each notebook targets a specific audio AI task, such as fine-tuning Whisper for automatic speech recognition (ASR) or adapting Parakeet, which uses a CTC (connectionist temporal classification) architecture. The notebooks are built on the Hugging Face ecosystem (transformers, datasets, peft, accelerate) and are optimized to run on a free 16 GB Colab runtime. Because every step of the process is exposed, from data pipelines to training loops, ML engineers can learn from, modify, and deploy audio models without relying on hidden abstractions.
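
To give a rough sense of why parameter-efficient methods such as LoRA (the technique the peft library implements) help on a memory-constrained runtime, here is a back-of-the-envelope sketch. The layer size and rank are illustrative assumptions, not values taken from the Smol-Audio notebooks, and the helper `lora_param_counts` is a hypothetical name used only for this example.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable parameter counts for one linear layer.

    Biases are ignored to keep the arithmetic simple.
    """
    # Full fine-tuning updates every entry of the d_out x d_in weight matrix.
    full = d_in * d_out
    # LoRA freezes that matrix and trains two low-rank factors instead:
    # A with shape (rank, d_in) and B with shape (d_out, rank).
    lora = rank * (d_in + d_out)
    return full, lora


# Assumed sizes for illustration only: a 1024 x 1024 projection, rank 8.
full, lora = lora_param_counts(d_in=1024, d_out=1024, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Under these assumed sizes, the low-rank factors hold about 1.6% of the parameters of the full matrix, which shrinks the gradients and optimizer state that must fit in GPU memory alongside the frozen weights.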

The repository covers ASR fine-tuning for four model families, each requiring distinct handling: Whisper (sequence-to-sequence), Parakeet (CTC with LoRA support), Voxtral (LLM-based), and Granite Speech. Additionally, notebooks for Audio Flamingo 3 demonstrate zero-shot audio classification and video understanding via the Perception Encoder Audiovisual (PE-AV). With its flat, transparent design, Smol-Audio is a practical resource for both newcomers and experienced practitioners looking to harness the latest in audio AI.
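
One reason the model families need distinct handling is that they produce text differently. A sequence-to-sequence model such as Whisper generates tokens one at a time with a decoder, while a CTC model emits one label per audio frame and relies on a collapse step to turn those labels into text. The sketch below shows that collapse step for greedy CTC decoding in plain Python. It is a toy illustration, not code from the Smol-Audio notebooks; the `_` blank symbol and the function name `ctc_greedy_collapse` are assumptions made for this example.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol chosen for this illustration


def ctc_greedy_collapse(frame_labels: list[str], blank: str = BLANK) -> str:
    """Collapse per-frame CTC labels into an output string.

    Assumes frame_labels already holds the most likely label for each frame.
    """
    # Step 1: merge runs of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks. A blank between two identical labels is what
    # lets a genuine double letter (the "ll" in "hello") survive step 1.
    return "".join(label for label in merged if label != blank)


frames = list("hh_e_ll_lo__")
print(ctc_greedy_collapse(frames))  # prints "hello"
```

Because the output length is tied to the number of audio frames rather than to a decoder loop, the data collation and loss setup for a CTC model differ from those of a sequence-to-sequence or LLM-based model.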