The OpenMOSS team, in collaboration with MOSI.AI and the Shanghai Innovation Institute, has introduced MOSS-Audio, an open-source foundation model designed to unify speech, sound, music, and time-aware audio reasoning within a single system.
MOSS-Audio goes beyond simple transcription, offering capabilities such as speaker identification, emotion analysis, background sound detection, music understanding, and the ability to answer time-grounded questions like "What did the speaker say at the 2-minute mark?" This eliminates the need to stitch together multiple specialized models.
Key Capabilities
- Speech & Content Understanding: Accurate transcription with word- and sentence-level timestamp alignment.
- Speaker, Emotion & Event Analysis: Identifies speakers, analyzes emotions from tone and context, and detects acoustic events.
- Scene & Sound Cue Extraction: Interprets background sounds to infer scene context.
- Music Understanding: Analyzes style, emotion progression, and instrumentation.
- Audio QA & Summarization: Handles questions and summaries across various audio types.
- Complex Reasoning: Multi-hop reasoning powered by chain-of-thought training and reinforcement learning.
The model comes in four variants: MOSS-Audio-4B-Instruct, MOSS-Audio-4B-Thinking, MOSS-Audio-8B-Instruct, and MOSS-Audio-8B-Thinking. Instruct variants are optimized for direct instruction following, while Thinking variants excel at chain-of-thought reasoning. The 4B and 8B models use Qwen3-4B and Qwen3-8B LLM backbones, respectively, with total sizes around 4.6B and 8.6B parameters.
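To give a concrete sense of how one of these variants might be used, the sketch below loads the 8B Thinking model and asks the time-grounded question from the introduction. The Hugging Face repo id, the chat-style message format, and the processor interface are assumptions for illustration only; the official repository documents the actual API.

```python
# Hypothetical usage sketch -- the repo id, message format, and processor
# interface below are assumptions, not the confirmed MOSS-Audio API.
import torch
from transformers import AutoProcessor, AutoModelForCausalLM

model_id = "OpenMOSS/MOSS-Audio-8B-Thinking"  # assumed repo id

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# A time-grounded question over a local recording (path is illustrative).
messages = [
    {
        "role": "user",
        "content": [
            {"type": "audio", "path": "meeting_recording.wav"},
            {"type": "text", "text": "What did the speaker say at the 2-minute mark?"},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
answer = processor.decode(
    output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```

With a Thinking variant, the generated text would typically include an explicit chain-of-thought trace before the final answer, whereas an Instruct variant would respond more directly.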
Architecture
MOSS-Audio follows a modular design with three components: an audio encoder that extracts acoustic features, a modality adapter that projects those features into the LLM's embedding space, and a large language model that reasons over the combined audio and text context. This modular design lets a single text-generation interface cover the full range of audio understanding tasks.
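The data flow through the three components can be sketched schematically. The dimensions, module choices, and simple MLP adapter below are illustrative assumptions, not the actual MOSS-Audio implementation; the sketch only shows how acoustic features are projected into the LLM's embedding space and consumed alongside text tokens.

```python
# Schematic sketch of the encoder -> adapter -> LLM flow.
# All shapes and module choices are illustrative, not MOSS-Audio's actual ones.
import torch
import torch.nn as nn

class AudioEncoder(nn.Module):
    """Stand-in for the pretrained audio encoder: mel frames -> acoustic features."""
    def __init__(self, n_mels=128, d_audio=1024):
        super().__init__()
        self.conv = nn.Conv1d(n_mels, d_audio, kernel_size=3, stride=2, padding=1)
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d_audio, nhead=8, batch_first=True),
            num_layers=4,
        )

    def forward(self, mels):                 # (B, n_mels, T)
        x = self.conv(mels).transpose(1, 2)  # (B, T', d_audio)
        return self.blocks(x)

class ModalityAdapter(nn.Module):
    """Projects acoustic features into the LLM's token-embedding space."""
    def __init__(self, d_audio=1024, d_model=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_audio, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, audio_feats):          # (B, T', d_audio)
        return self.proj(audio_feats)        # (B, T', d_model)

encoder, adapter = AudioEncoder(), ModalityAdapter()
llm_embed = nn.Embedding(32000, 4096)        # placeholder for the LLM backbone's embedding table

mels = torch.randn(1, 128, 3000)             # a clip's mel frames (illustrative length)
text_ids = torch.randint(0, 32000, (1, 16))  # tokenized question, e.g. a time-grounded query

audio_tokens = adapter(encoder(mels))        # (1, T', 4096)
text_tokens = llm_embed(text_ids)            # (1, 16, 4096)

# The LLM backbone then attends over one interleaved sequence of audio and text embeddings.
llm_inputs = torch.cat([audio_tokens, text_tokens], dim=1)
print(llm_inputs.shape)
```

Keeping the encoder, adapter, and LLM separate in this way means the audio front end and the language backbone can be pretrained independently and then aligned through the adapter.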
Open-source code and weights are available on GitHub for community use and further development.