How to Code Directly With Stable Diffusion Within Python On RunPod

While there are many useful front ends for prompting Stable Diffusion, in some cases it's easier to use it directly from a Jupyter Notebook, which comes pre-installed in many RunPod templates. Once you spin up a pod you get instant access to Jupyter as well, letting you generate images without installing a front end at all. Let's go through some easy-to-use code examples that you can copy and paste straight into your notebook and modify to your needs.

Getting started

Go ahead and deploy a pod with the official RunPod Stable Diffusion template to get started. If you need a refresher on how to do that, feel free to refer to our docs here. Once the pod is created, go to your Pods page and connect to it on port 8888. You'll be able to copy and paste any of the Python examples below directly into your notebook.
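
Before pasting in the examples, it's worth running a quick sanity check to confirm that the pod's GPU is visible and that the libraries used below are installed. Exact package versions vary by template, so treat the commented install line as an optional fallback rather than a required step:

import torch

# Confirm the GPU is visible to PyTorch
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))

# If any of the imports below fail, install the libraries from a notebook cell:
# !pip install diffusers transformers accelerate safetensors pillow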

Stable Diffusion 1.x

from diffusers import StableDiffusionPipeline, DDPMScheduler
import torch

# Downloads the required model from Hugging Face on first run
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4",
                                               variant="fp16", torch_dtype=torch.float16)
pipe.to("cuda")  # Send the pipeline to the GPU

# Configure the scheduler and attach it to the pipeline
pipe.scheduler = DDPMScheduler(beta_start=0.00085, beta_end=0.012,
                               beta_schedule="scaled_linear")

prompt = "A woman smiling while standing in a field"  # Text prompt
image = pipe(
    prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("woman.png")

This will automatically download the model from Hugging Face on first run (the weights go into the Hugging Face cache) and generate an image, which is saved in the same directory as your notebook. There are four major steps in this process:

  • Downloads the model from Hugging Face if necessary
  • Configures the scheduler and attaches it to the pipeline (see the sketch after this list for swapping in a different scheduler)
  • Passes the prompt and generation settings, such as the CFG scale and number of denoising steps, to the pipeline
  • Saves the image
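
The scheduler is also the easiest thing to experiment with. Rather than setting beta values by hand, you can build any diffusers scheduler from the pipeline's existing configuration. Here's a minimal sketch, assuming the pipe and prompt from the block above, that swaps in DPM-Solver++:

from diffusers import DPMSolverMultistepScheduler

# Build a replacement scheduler from the pipeline's current configuration
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("woman_dpmpp.png")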

Stable Diffusion XL

Here's another example for Stable Diffusion XL that will do the same while utilizing the refiner model, which adds an extra step:

from diffusers import DiffusionPipeline
import torch

# Load the base model
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", 
    torch_dtype=torch.float16, 
    variant="fp16", 
    use_safetensors=True
)
base.to("cuda")

# Load the refiner model
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

# Set the prompt
prompt = "A woman smiling while standing in a field"

# Generate the initial image using the base model
image = base(
    prompt=prompt,
    num_inference_steps=50,
    denoising_end=0.8,
    output_type="latent",
).images

# Refine the image using the refiner model
image = refiner(
    prompt=prompt,
    num_inference_steps=50,
    denoising_start=0.8,
    image=image,
).images[0]

# Save the final image
image.save("woman_sdxl.png")

So why is this relevant? Isn't it just easier to use a UI?

Not always. User interfaces are great at simplifying the creation process and surfacing all of the available options, but they're generally much less useful for anything that requires an iterative process.

Let's say you have an assignment where you want to generate the same image with the same seed, saving a version at every fifth denoising step. You could still do this manually, but it would be a huge pain: move the slider, click generate, move the slider, click generate, and so on. That might be feasible for a handful of iterations, but the approach only works as long as the number of iterations stays small, and sooner or later it won't.

Let's rework the SDXL-with-refiner code to accomplish this goal, with an assist from the Pillow library to stitch all of the resulting images together into a single image for comparison.

import torch
from diffusers import DiffusionPipeline
from pathlib import Path
import os
import gc
from PIL import Image

def setup_models():
    # Load the base model
    base = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", 
        torch_dtype=torch.float16, 
        variant="fp16", 
        use_safetensors=True
    )
    base.to("cuda")

    # Load the refiner model
    refiner = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        text_encoder_2=base.text_encoder_2,
        vae=base.vae,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    refiner.to("cuda")

    return base, refiner

def generate_image(base, refiner, prompt, seed, num_inference_steps):
    # Set the seed for reproducibility
    generator = torch.Generator(device="cuda").manual_seed(seed)
    
    # Generate the initial image using the base model
    latents = base(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        denoising_end=0.8,
        output_type="latent",
        generator=generator,
    ).images

    # Refine the image using the refiner model
    image = refiner(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        denoising_start=0.8,
        image=latents,
        generator=generator,
    ).images[0]

    return image

def cleanup_gpu_memory():
    gc.collect()
    torch.cuda.empty_cache()

def stitch_images(image_folder, output_filename, images_per_row=4):
    # Get all PNG images in the folder
    image_files = sorted([f for f in os.listdir(image_folder) if f.endswith('.png')])
    
    if not image_files:
        print("No images found in the specified folder.")
        return

    # Open the first image to get dimensions
    with Image.open(os.path.join(image_folder, image_files[0])) as img:
        img_width, img_height = img.size

    # Calculate the size of the stitched image
    num_images = len(image_files)
    num_rows = (num_images + images_per_row - 1) // images_per_row
    stitched_width = img_width * images_per_row
    stitched_height = img_height * num_rows

    # Create a new blank image
    stitched_image = Image.new('RGB', (stitched_width, stitched_height))

    # Paste each image into the stitched image
    for index, image_file in enumerate(image_files):
        with Image.open(os.path.join(image_folder, image_file)) as img:
            row = index // images_per_row
            col = index % images_per_row
            stitched_image.paste(img, (col * img_width, row * img_height))

    # Save the stitched image
    stitched_image.save(output_filename)
    print(f"Stitched image saved as {output_filename}")

def main():
    base, refiner = setup_models()
    prompt = "A serene landscape with a mountain reflected in a calm lake at sunset"
    seed = 42  # You can change this seed if you want
    output_dir = Path("sdxl_progressive_denoising")
    output_dir.mkdir(exist_ok=True)

    for steps in range(5, 101, 5):
        print(f"Generating image with {steps} denoising steps...")
        image = generate_image(base, refiner, prompt, seed, steps)
        image.save(output_dir / f"image_{steps:03d}_steps.png")

    print("Image generation complete. Check the 'sdxl_progressive_denoising' folder for results.")

    # Stitch images together
    stitch_images(output_dir, output_dir / "stitched_progression.png")

    # Unload models and free up VRAM
    del base
    del refiner
    cleanup_gpu_memory()
    print("Models unloaded and VRAM freed.")

if __name__ == "__main__":
    main()

More than half of this code block is just Pillow work to create the final demonstration image; the actual task only took a few extra lines. Rather than sitting there for minutes waiting on a generation to finish so you can adjust a slider and click the button again, a pretty elementary loop gets the whole job done.

For reference, this is the final result, and you can see where an assignment like this has value. Do you want the more ethereal-looking lower-step image, or the more defined higher-step image? Here you can explicitly see them all together for a side-by-side comparison. How many options might you have missed by just playing with a slider, generating a few images, and calling it a day?
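
The same loop pattern works for any parameter you might want to sweep, not just the step count. As a hypothetical follow-up, here's a sketch that reuses setup_models() from the script above, fixes the seed and step count, and varies the CFG scale instead:

import torch

# Assumes setup_models() from the script above has already been defined
base, refiner = setup_models()
prompt = "A serene landscape with a mountain reflected in a calm lake at sunset"

for cfg in [3.0, 5.0, 7.5, 10.0, 12.5]:
    # Re-seed each iteration so only the CFG scale changes between images
    generator = torch.Generator(device="cuda").manual_seed(42)
    latents = base(prompt=prompt, num_inference_steps=40, denoising_end=0.8,
                   guidance_scale=cfg, output_type="latent",
                   generator=generator).images
    image = refiner(prompt=prompt, num_inference_steps=40, denoising_start=0.8,
                    guidance_scale=cfg, image=latents,
                    generator=generator).images[0]
    image.save(f"cfg_{cfg:.1f}.png")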

What about CivitAI?

You can absolutely download a model from CivitAI instead if you prefer. The only thing to note is that you'll need to set up a free API key to download models from there. Here is the code modified to download a model from CivitAI; just feed in the API key that you created on your Account page.

import torch
from diffusers import StableDiffusionXLPipeline
from pathlib import Path
import os
import gc
from PIL import Image
import requests

# CivitAI API key (replace with your actual key)
CIVITAI_API_KEY = "YOUR_API_KEY_HERE"

def download_model(model_url, save_path):
    headers = {"Authorization": f"Bearer {CIVITAI_API_KEY}"}
    response = requests.get(model_url, headers=headers, stream=True)
    response.raise_for_status()
    
    with open(save_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

def setup_model(model_url, local_file_path):
    if not os.path.exists(local_file_path):
        print(f"Downloading model from CivitAI...")
        download_model(model_url, local_file_path)
    
    pipe = StableDiffusionXLPipeline.from_single_file(
        local_file_path,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16"
    )
    pipe.to("cuda")
    return pipe

def generate_image(pipe, prompt, seed, num_inference_steps):
    generator = torch.Generator(device="cuda").manual_seed(seed)
    
    image = pipe(
        prompt=prompt,
        num_inference_steps=num_inference_steps,
        generator=generator,
    ).images[0]

    return image

def cleanup_gpu_memory():
    gc.collect()
    torch.cuda.empty_cache()

def stitch_images(image_folder, output_filename, images_per_row=4):
    image_files = sorted([f for f in os.listdir(image_folder) if f.endswith('.png')])
    
    if not image_files:
        print("No images found in the specified folder.")
        return

    with Image.open(os.path.join(image_folder, image_files[0])) as img:
        img_width, img_height = img.size

    num_images = len(image_files)
    num_rows = (num_images + images_per_row - 1) // images_per_row
    stitched_width = img_width * images_per_row
    stitched_height = img_height * num_rows

    stitched_image = Image.new('RGB', (stitched_width, stitched_height))

    for index, image_file in enumerate(image_files):
        with Image.open(os.path.join(image_folder, image_file)) as img:
            row = index // images_per_row
            col = index % images_per_row
            stitched_image.paste(img, (col * img_width, row * img_height))

    stitched_image.save(output_filename)
    print(f"Stitched image saved as {output_filename}")

def main():
    # Replace with your CivitAI model URL
    model_url = "https://civitai.com/api/download/models/712398"
    local_file_path = "civitai_model.safetensors"

    pipe = setup_model(model_url, local_file_path)
    prompt = "A serene landscape with a mountain reflected in a calm lake at sunset"
    seed = 42  # You can change this seed if you want
    output_dir = Path("sdxl_progressive_denoising")
    output_dir.mkdir(exist_ok=True)

    for steps in range(5, 101, 5):
        print(f"Generating image with {steps} denoising steps...")
        image = generate_image(pipe, prompt, seed, steps)
        image.save(output_dir / f"image_{steps:03d}_steps.png")

    print("Image generation complete. Check the 'sdxl_progressive_denoising' folder for results.")

    stitch_images(output_dir, output_dir / "stitched_progression.png")

    del pipe
    cleanup_gpu_memory()
    print("Model unloaded and VRAM freed.")

if __name__ == "__main__":
    main()
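
One small housekeeping suggestion: rather than pasting your API key directly into the notebook, you can store it in an environment variable and read it at runtime. This is just a sketch; the variable name is arbitrary, so use whatever you set when launching the pod or exporting it in a terminal:

import os

# Read the key from an environment variable instead of hard-coding it in the notebook
CIVITAI_API_KEY = os.environ.get("CIVITAI_API_KEY")
if not CIVITAI_API_KEY:
    raise RuntimeError("Set the CIVITAI_API_KEY environment variable before running this script.")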

Conclusion

Although the code above looks intimidating, in truth it was generated in large part by coding assistants. Using a code copilot in conjunction with Jupyter while running your pod may actually save you time, and that's something we'll get into shortly in a new article. Now it's your turn - create something new and show us what you made on our Discord, where we'll also be happy to answer any questions you might have on the process!