
Würstchen: A Fast, Efficient New Diffusion Model for Image Generation

April 26, 2026 · 4:42 PM

Researchers have introduced Würstchen, a novel diffusion model that achieves a 42x spatial compression of images, dramatically reducing the computational cost of both training and inference. Unlike previous latent diffusion models, which typically compress images by a factor of only 4 to 8, Würstchen uses a two-stage compression process, Stage A (a VQGAN) and Stage B (a Diffusion Autoencoder), to encode images into a highly compressed latent space. A third component, Stage C (the Prior), is trained in this compressed space, which makes image generation faster and cheaper.
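
To get a feel for why the 42x ratio matters, here is a back-of-the-envelope comparison of how many spatial positions the diffusion prior must denoise at each compression factor. This is illustrative arithmetic only, assuming square latent grids and ignoring channel counts and other architectural details:

```python
# Illustrative arithmetic: latent grid sizes at different spatial
# compression factors (channel dimensions ignored).
def latent_positions(resolution: int, compression: int) -> int:
    """Number of spatial positions in a square latent grid."""
    side = resolution // compression
    return side * side

res = 1024
conventional = latent_positions(res, 8)   # 8x compression  -> 128x128 grid
wuerstchen = latent_positions(res, 42)    # 42x compression -> 24x24 grid

print(f"8x latent:  {conventional} positions")   # 16384
print(f"42x latent: {wuerstchen} positions")     # 576
print(f"~{conventional / wuerstchen:.0f}x fewer positions to denoise")  # ~28x
```

Since the cost of each denoising step grows with the size of the latent grid, operating in this heavily compressed space is what makes Stage C comparatively cheap to train and sample from.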

Würstchen is particularly notable for its efficiency. It can generate high-resolution images (up to 1536x1536) much faster than models like Stable Diffusion XL while using less memory. Training is cheaper, too: Würstchen v1 required only 9,000 GPU hours at 512x512, compared to 150,000 GPU hours for Stable Diffusion 1.4, a roughly 16x reduction. Würstchen v2 used 24,602 GPU hours for training at resolutions up to 1536x1536, still about 6x cheaper than SD 1.4.

The model is available through Hugging Face's Diffusers library and can be used with a simple Python pipeline, as shown in the sketch below. All checkpoints are hosted on the Hugging Face Hub, and the pipeline supports standard Diffusers optimizations, including automatic scaled dot-product (Flash) attention via SDPA, xFormers, torch.compile, model offloading, and prompt weighting.
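
As a concrete example, the snippet below sketches text-to-image generation with the Würstchen pipeline in Diffusers. It assumes the warp-ai/wuerstchen checkpoint on the Hub and the prior_timesteps / prior_guidance_scale arguments exposed by the Würstchen pipeline at the time of writing; exact names may change across Diffusers versions.

```python
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.pipelines.wuerstchen import DEFAULT_STAGE_C_TIMESTEPS

# Load both stages (Prior + Decoder) as a single combined pipeline.
pipeline = AutoPipelineForText2Image.from_pretrained(
    "warp-ai/wuerstchen", torch_dtype=torch.float16
).to("cuda")

# Optional: trade speed for memory by offloading idle submodels to CPU.
# pipeline.enable_model_cpu_offload()

caption = "Anthropomorphic cat dressed as a firefighter"
images = pipeline(
    caption,
    width=1024,
    height=1536,
    prior_timesteps=DEFAULT_STAGE_C_TIMESTEPS,  # default Stage C schedule
    prior_guidance_scale=4.0,                   # guidance for the Prior
    num_images_per_prompt=2,
).images

images[0].save("firefighter_cat.png")
```

Note that most of the sampling work happens in Stage C's tiny latent space; the decoder stages only expand the result back to pixel resolution.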

Würstchen was trained on image resolutions between 1024x1024 and 1536x1536, with good results observed up to 1024x2048. The model was also observed to adapt quickly to new resolutions, which suggests that fine-tuning it at higher resolutions would be computationally cheap.