Researchers have introduced T2I-Adapter-SDXL, a compact plug-and-play model that brings powerful control to Stable Diffusion XL (SDXL) while dramatically reducing computational costs. Unlike ControlNet, which must run a large auxiliary network at every denoising step, T2I-Adapter computes its conditioning features only once per generation; it is also roughly 94% smaller in storage and significantly faster.
The adapter, with just 79 million parameters (compared to the 1.25 billion of ControlNet-SDXL), supports multiple conditioning modes, including sketch, Canny edge, lineart, depth, and OpenPose. It was trained on 3 million high-resolution image-text pairs and strikes a competitive balance among speed, memory usage, and output quality.
Developed in collaboration between the Diffusers team and T2I-Adapter authors, the model is now available in the diffusers library. Users can easily integrate it into their workflows by loading the adapter and pipeline, then providing a control image and prompt.
Key parameters such as adapter_conditioning_scale and adapter_conditioning_factor let users tune the strength of the conditioning signal and how long it stays active during denoising. A demo is available on Hugging Face Spaces, including a sketch-to-image tool called Doodly.
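The two parameters play different roles: adapter_conditioning_scale weights how strongly the adapter features are mixed in, while adapter_conditioning_factor is the fraction of timesteps during which the conditioning is applied at all. The helper below is a simplified sketch of that scheduling idea, not the diffusers implementation:

```python
def adapter_active_steps(num_inference_steps: int, conditioning_factor: float) -> int:
    """Number of early denoising steps that receive adapter features.

    adapter_conditioning_factor is the fraction of timesteps for which
    the adapter conditioning is applied: 1.0 means every step, 0.5 only
    the first half. (Simplified sketch, not the library's internal code.)
    """
    return int(num_inference_steps * conditioning_factor)


# With 30 steps and a factor of 0.5, the adapter shapes only the first
# 15 steps; the remaining steps refine the image without the control signal.
first_half = adapter_active_steps(30, 0.5)   # 15
all_steps = adapter_active_steps(30, 1.0)    # 30
```

Lowering the factor is a common way to keep the overall composition from the control image while letting the model deviate in fine detail.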