Stable Diffusion is a cutting-edge text-to-image model that turns simple text prompts into impressive visuals. Developed through a collaboration between CompVis, Stability AI, and LAION, it was trained on 512x512 images drawn from a subset of the LAION-5B database. This article walks through how to harness its power using the Hugging Face Diffusers library.
Getting Started
First, you'll need to install the necessary packages:
pip install diffusers==0.10.2 transformers scipy ftfy accelerate
With the library installed, you can load the Stable Diffusion model with just a few lines of code:
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
If you're using a GPU, move the pipeline to it for faster generation:
pipe.to("cuda")
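If you want the same script to run on machines with or without a GPU, you can choose the device at runtime. A minimal sketch using PyTorch's CUDA check (the `pipe.to(device)` call is commented out here so the snippet stands alone):

```python
import torch

# Fall back to CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
# pipe.to(device)  # move the pipeline to the chosen device
print(device)
```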
For systems with limited GPU memory (under 10 GB), load the model weights in half precision (float16) to roughly halve memory use:
import torch
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16)
Generating Images
To create an image, simply define your prompt and call the pipeline:
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
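Pipelines also accept a list of prompts and return one image per prompt. A small helper like the one below (a sketch using Pillow, not part of Diffusers) tiles the results into a single contact sheet; the solid-color placeholder images stand in for `pipe(prompts).images`:

```python
from PIL import Image

def image_grid(images, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid."""
    w, h = images[0].size
    grid = Image.new("RGB", (cols * w, rows * h))
    for i, img in enumerate(images):
        grid.paste(img, ((i % cols) * w, (i // cols) * h))
    return grid

# Placeholders standing in for pipe(prompts).images
images = [Image.new("RGB", (512, 512), color) for color in ("red", "green", "blue", "white")]
grid = image_grid(images, rows=2, cols=2)
grid.save("grid.png")
```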
Each run with a fresh random seed produces a unique image. If you get a solid black output, the built-in safety checker flagged the result as potentially NSFW; adjust your prompt or try a different seed.
For reproducible results, set a random seed and pass a generator:
import torch
generator = torch.Generator("cuda").manual_seed(1024)
image = pipe(prompt, guidance_scale=7.5, generator=generator).images[0]
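The generator makes the initial latent noise deterministic, which is what makes runs reproducible. A quick CPU-only sketch (no model needed) shows two identically seeded generators producing identical starting noise; the 1x4x64x64 shape matches the latents Stable Diffusion uses for a 512x512 image:

```python
import torch

def make_latents(seed, shape=(1, 4, 64, 64)):
    # Seeded generator -> deterministic initial noise.
    gen = torch.Generator("cpu").manual_seed(seed)
    return torch.randn(shape, generator=gen)

a = make_latents(1024)
b = make_latents(1024)
same = torch.equal(a, b)  # identical seed -> identical starting noise
```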
Controlling Output Quality
The guidance_scale parameter (default 7.5) controls how closely the image follows your prompt. Higher values lead to stronger adherence but can reduce diversity. The num_inference_steps parameter (default 50) affects quality: more steps generally yield better results, but take longer. For instance, using 15 steps often results in lower-quality images:
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=15, generator=generator).images[0]
Understanding the Model
Stable Diffusion is a latent diffusion model that uses a UNet to denoise a random latent array, guided by a text prompt encoded via CLIP. The Diffusers library allows you to customize the pipeline by swapping components or adjusting parameters to suit your needs.
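The shape of that denoising loop can be sketched with a toy example. Here `fake_unet` is a hypothetical stand-in for the real UNet's noise prediction, and the subtraction is a crude stand-in for a scheduler step; this illustrates the loop's structure, not the actual Diffusers internals:

```python
import torch

def fake_unet(latents, t):
    # Hypothetical stand-in for the UNet's noise prediction at timestep t.
    return 0.1 * latents

init = torch.randn(1, 4, 64, 64)  # random starting latents for a 512x512 image
latents = init.clone()
for t in range(10):
    noise_pred = fake_unet(latents, t)
    latents = latents - noise_pred  # crude stand-in for scheduler.step(...)
# In the real pipeline, the VAE decoder then maps the final latents to pixels.
```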
Whether you're an artist, a developer, or just curious, Stable Diffusion opens the door to AI-powered creativity. Try it out and see what you can generate!