Apple's latest Core ML optimizations, unveiled at WWDC 2023, dramatically accelerate Stable Diffusion on Apple devices, reducing both inference time and memory usage. The update introduces 6-bit palettization, a compression technique that clusters model weights and stores each one as an index into a small lookup table (a "palette"), shrinking storage from 16 bits to 6 bits per parameter without significant precision loss. Combined with improved attention-layer optimizations for the Neural Engine, this yields up to 30% faster performance on devices like the iPhone 13.
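To make the idea concrete, here is a minimal sketch of palettization in NumPy. It is illustrative only and not Apple's implementation: Core ML's palettization derives its palette with k-means clustering, whereas this toy version simply places the 2^6 = 64 palette entries at evenly spaced quantiles of the weight distribution. The function names are hypothetical.

```python
import numpy as np

def palettize(weights: np.ndarray, nbits: int = 6):
    """Compress weights into a 2**nbits-entry palette plus per-weight indices.

    Toy sketch: real palettization (e.g. in Core ML) learns the palette with
    k-means; here the centroids are evenly spaced quantiles for simplicity.
    """
    k = 2 ** nbits
    flat = weights.astype(np.float32).ravel()
    # Palette: k representative values drawn from the weight distribution
    palette = np.quantile(flat, np.linspace(0.0, 1.0, k)).astype(np.float32)
    # Replace each weight by the index of its nearest palette entry
    indices = np.abs(flat[:, None] - palette[None, :]).argmin(axis=1)
    return palette, indices.astype(np.uint8)

def depalettize(palette: np.ndarray, indices: np.ndarray, shape):
    """On-the-fly decompression: a simple table lookup per weight."""
    return palette[indices].reshape(shape)

# Demo on random float16 weights of a plausible layer size
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(128, 64)).astype(np.float16)
palette, idx = palettize(w, nbits=6)
w_hat = depalettize(palette, idx, w.shape)
```

Only the 64-entry palette and the 6-bit indices need to be stored, which is where the roughly 16/6 size reduction comes from; the lookup at inference time is cheap, which is why decompressing on the fly reduces memory-transfer bottlenecks.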
Developers can now convert and quantize Stable Diffusion models using Apple's open-source ml-stable-diffusion tool, which supports palettization at bit depths from 8 down to 2. Apple recommends 6 bits as the best balance of size and quality. The compressed weights are decompressed on the fly during inference, reducing memory-transfer bottlenecks.
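A conversion run with the ml-stable-diffusion repo's torch2coreml script looks roughly like the following. Flag names are taken from the apple/ml-stable-diffusion README but may change between repo versions, and the model identifier and output path are placeholders, so check the current README before running.

```shell
# Convert Stable Diffusion to Core ML with 6-bit palettized weights.
# --quantize-nbits selects the palettization bit depth (8 down to 2);
# SPLIT_EINSUM is the attention implementation tuned for the Neural Engine.
python -m python_coreml_stable_diffusion.torch2coreml \
    --model-version runwayml/stable-diffusion-v1-5 \
    --convert-unet --convert-text-encoder --convert-vae-decoder \
    --quantize-nbits 6 \
    --attention-implementation SPLIT_EINSUM \
    -o ./models-6bit
```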
Apple has also released pre-converted Core ML models for Stable Diffusion 1.4, 1.5, 2-base, and 2.1-base on Hugging Face, with both uncompressed and 6-bit palettized variants available. The palettized variants require iOS 17, iPadOS 17, or macOS 14 Sonoma, which were in beta at the time of the update.
These advancements make on-device AI more practical for developers building apps that prioritize privacy, offline use, and cost savings.