DailyGlimpse

FlashQLA: Qwen Team's New Library Speeds Up Linear Attention 3x on NVIDIA Hopper GPUs

AI
April 30, 2026 · 2:31 PM

The Qwen Team has released FlashQLA, a high-performance kernel library for accelerating linear attention. Optimized specifically for the Gated Delta Network (GDN) attention mechanism, FlashQLA delivers up to a 3x speedup on NVIDIA Hopper GPUs. The release aims to make models built on linear attention more efficient, cutting compute time for both training and inference.
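FlashQLA's own API is not shown in the announcement, so the sketch below is not the library's code. It is a minimal NumPy reference for what a gated delta-rule (GDN-style) linear-attention recurrence computes, following the formulation common in the linear-attention literature; the function name, shapes, and gate conventions are illustrative assumptions. A fused GPU kernel like FlashQLA would compute the same recurrence in parallel chunks rather than this O(T·d²) Python loop.

```python
import numpy as np

def gated_delta_attention(q, k, v, alpha, beta):
    """Illustrative gated delta-rule recurrence (not FlashQLA's API).

    q, k, v: (T, d) query/key/value sequences
    alpha:   (T,) per-step decay gates in (0, 1]
    beta:    (T,) per-step write-strength gates in [0, 1]
    """
    T, d = q.shape
    S = np.zeros((d, d))          # recurrent state: a d x d associative memory
    out = np.empty_like(v)
    for t in range(T):
        kt, vt = k[t], v[t]
        S = alpha[t] * S          # decay old memory
        # delta rule: write the residual between v_t and the state's prediction
        S = S + beta[t] * np.outer(vt - S @ kt, kt)
        out[t] = S @ q[t]         # read out with the query
    return out
```

With `beta = 0` nothing is ever written, so the output is all zeros; with `alpha = 1` the state never decays and the update reduces to the plain delta rule. The point of a kernel library is to replace this sequential loop with chunked matrix multiplies that saturate the GPU's tensor cores.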