Hugging Face has launched a new inference endpoint that combines automatic speech recognition (ASR), speaker diarization, and speculative decoding for real-time transcription. The pipeline uses Whisper for ASR, pyannote for diarization, and a custom speculative decoding technique that speeds up inference by up to 3x without losing accuracy. Together, these components let developers deploy multilingual, speaker-aware transcription with low latency, which suits use cases such as meeting transcription, call analysis, and media captioning. The endpoint is available now on Hugging Face's inference infrastructure.
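
To illustrate how speaker-aware transcription can come out of separate ASR and diarization stages, here is a minimal sketch of the merging step: each timestamped text chunk is assigned to the speaker turn it overlaps most in time. This is not the endpoint's actual implementation or response format. The `SpeakerTurn` and `TextChunk` types, the function names, and the `SPEAKER_00`-style labels are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpeakerTurn:
    """One diarization segment (hypothetical shape): who spoke, and when."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


@dataclass
class TextChunk:
    """One timestamped ASR chunk (hypothetical shape)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    chunks: List[TextChunk], turns: List[SpeakerTurn]
) -> List[Tuple[Optional[str], str]]:
    """Label each chunk with the speaker whose turn overlaps it the most.

    A chunk that overlaps no turn at all is labeled None.
    """
    labeled = []
    for chunk in chunks:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(chunk.start, chunk.end, turn.start, turn.end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = turn.speaker
        labeled.append((best_speaker, chunk.text))
    return labeled


def merge_consecutive(
    labeled: List[Tuple[Optional[str], str]]
) -> List[Tuple[Optional[str], str]]:
    """Join adjacent chunks that share a speaker into one transcript line."""
    merged: List[Tuple[Optional[str], str]] = []
    for speaker, text in labeled:
        if merged and merged[-1][0] == speaker:
            merged[-1] = (speaker, merged[-1][1] + " " + text)
        else:
            merged.append((speaker, text))
    return merged


if __name__ == "__main__":
    # Made-up sample data, not real model output.
    turns = [
        SpeakerTurn(0.0, 4.0, "SPEAKER_00"),
        SpeakerTurn(4.0, 9.0, "SPEAKER_01"),
    ]
    chunks = [
        TextChunk(0.2, 1.8, "Hello everyone,"),
        TextChunk(1.8, 3.9, "let's get started."),
        TextChunk(3.5, 5.5, "Sounds good."),
    ]
    for speaker, text in merge_consecutive(assign_speakers(chunks, turns)):
        print(f"{speaker}: {text}")
```

Maximum overlap is only one possible assignment rule. It is simple and tolerates small timestamp disagreements between the two models, but a chunk that spans a speaker change is still attributed to a single speaker, so word-level timestamps would be needed for finer attribution.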
Hugging Face Unveils Fast, Accurate Speech Recognition Pipeline with Speaker Diarization
AI
April 26, 2026 · 4:32 PM