DailyGlimpse

FlashQLA: Qwen Team's New Library Speeds Up Linear Attention 3x on NVIDIA Hopper GPUs

AI
April 30, 2026 · 2:31 PM

The Qwen Team has released FlashQLA, a high-performance kernel library for accelerating linear attention. Optimized specifically for the Gated Delta Network (GDN) attention mechanism, FlashQLA delivers up to a 3x speedup on NVIDIA Hopper GPUs. The release aims to make models built on linear attention more efficient, cutting compute time for both training and inference.
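FlashQLA's own API is not shown in the announcement, so the sketch below is not the library's code. It is a minimal NumPy reference for what a gated delta-rule (GDN-style) linear-attention recurrence computes, following the formulation common in the linear-attention literature; the function name, shapes, and gate conventions are illustrative assumptions. A fused GPU kernel like FlashQLA would compute the same recurrence in parallel chunks rather than this O(T·d²) Python loop.

```python
import numpy as np

def gated_delta_attention(q, k, v, alpha, beta):
    """Illustrative gated delta-rule recurrence (not FlashQLA's API).

    q, k, v: (T, d) query/key/value sequences
    alpha:   (T,) per-step decay gates in (0, 1]
    beta:    (T,) per-step write-strength gates in [0, 1]
    """
    T, d = q.shape
    S = np.zeros((d, d))          # recurrent state: a d x d associative memory
    out = np.empty_like(v)
    for t in range(T):
        kt, vt = k[t], v[t]
        S = alpha[t] * S          # decay old memory
        # delta rule: write the residual between v_t and the state's prediction
        S = S + beta[t] * np.outer(vt - S @ kt, kt)
        out[t] = S @ q[t]         # read out with the query
    return out
```

With `beta = 0` nothing is ever written, so the output is all zeros; with `alpha = 1` the state never decays and the update reduces to the plain delta rule. The point of a kernel library is to replace this sequential loop with chunked matrix multiplies that saturate the GPU's tensor cores.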