One-step Diffusion with
Distribution Matching Distillation

Tianwei Yin¹, Michaël Gharbi², Richard Zhang², Eli Shechtman²,
Frédo Durand¹, William T. Freeman¹, Taesung Park²

¹Massachusetts Institute of Technology, ²Adobe Research

CVPR 2024

Also check out DMD2 at https://tianweiy.github.io/dmd2/

Paper Video Slides Poster

Our one-step generator achieves comparable image quality with StableDiffusion v1.5 while being 30x faster.

Diffusion models are known to approximate the score function of the distribution they are trained on. In other words, an unrealistic synthetic image can be directed toward higher probability density region through the denoising process (see SDS). Our core idea is training two diffusion models to estimate not only the score function of the target real distribution, but also that of the fake distribution. We construct a gradient update to our generator as the difference between the two scores, essentially nudging the generated images toward higher realism as well as lower fakeness (see VSD). Our method is similar to GANs in that a critic is jointly trained with the generator to minimize a divergence between the real and fake distributions, but differs in that our training does not play an adversarial game that may cause training instability, and our critic can fully leverage the weights of a pretrained diffusion model. Combined with a simple regression loss to match the output of the multi-step diffusion model, our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k, comparable to Stable Diffusion but orders of magnitude faster. Utilizing FP16 inference, our model generates images at 20 FPS on modern hardware.

DMD Method Overview

Method Overview

We train one-step generator G_θ to map random noise z into a realistic image. To match the multi-step sampling outputs of the diffusion model, we pre-compute a collection of noise--image pairs, and occasionally load the noise from the collection and enforce LPIPS regression loss between our one-step generator and the diffusion output. Furthermore, we provide distribution matching gradient ∇_θ D_KL to the fake image to enhance realism. We inject a random amount of noise to the fake image and pass it to two diffusion models, one pretrained on the real data and the other continually trained on the fake images with a diffusion loss, to obtain its denoised versions. The denoising scores (visualized as mean prediction in the plot) indicate directions to make the images more realistic or fake. The difference between the two represents the direction toward more realism and less fakeness and is backpropagated to the one-step generator.

Comparison to Stable Diffusion

Medium shot side profile portrait photo of a warrior chief, sharp facial features, with tribal panther makeup in blue on red, looking away, serious but clear eyes, 50mm portrait, photography, hard rim lighting photography

Instaflow Image

SD (50 steps)
2590ms

DMD Image

Ours (1 step)
90ms

a hyperrealistic photo of a fox astronaut; perfect face, artstation

Instaflow Image

SD (50 steps)
2590ms

DMD Image

Ours (1 step)
90ms

a DSLR photo of a golden retriever in heavy snow

Instaflow Image

SD (50 steps)
2590ms

DMD Image

Ours (1 step)
90ms

a Lightshow at the Dolomities

Instaflow Image

SD (50 steps)
2590ms

DMD Image

Ours (1 step)
90ms

the giant magical deer god of the forest, sniffing flowers on the forest floor. Fireflies evereywhere. A spring of water. Long moss hanging from the tree branches. Moonlight. Photorealism, cinematic shot, cinematic lighting, National Geographic, analagous colors, Award-winning photography

Instaflow Image

SD (50 steps)
2590ms

DMD Image

Ours (1 step)
90ms

3D render baby parrot, Chibi, adorable big eyes. In a garden with butterflies, greenery, lush, whimsical and soft, magical, octane render, fairy dust

Instaflow Image

SD (50 steps)
2590ms

DMD Image

Ours (1 step)
90ms

Comparison to Other Diffusion Distillation Methods

close-up photo of a unicorn in a forest, in a style of movie still

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

amazing photograph of a labrador retriever chasing a tennis ball under water, fisheye lens, close up portrait, crazy image

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

wise old man with a white beard in the enchanted and magical forest

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

macro photo of a miniature toy sloth drinking a soda, shot on a light pastel cyclorama

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

Astronaut on a camel on mars

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

a high-resolution photo of an orange Porsche under sunshine

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

an underwater photo portrait of a beautiful fluffy white cat, hair floating. In a dynamic swimming pose. The sun rays filters through the water. High-angle shot. Shot on Fujifilm X

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

3D animation cinematic style young caveman kid, in its natural environment

SD Image

SD (50 steps)
2590ms

Instaflow Image

Instaflow (1 step)
90ms

LCM-LoRA Image

LCMv1.5 (2 steps)
120ms

DMD Image

Ours (1 step)
90ms

BibTeX

@inproceedings{yin2024onestep,
      title={One-step Diffusion with Distribution Matching Distillation},
      author={Yin, Tianwei and Gharbi, Micha{\"e}l and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T and Park, Taesung},
      booktitle={CVPR},
      year={2024}
    }