MLCM: Multistep Consistency Distillation of Latent Diffusion Model

Xie, Qingsong; Liao, Zhenyi; Deng, Zhijie; Chen, Chen; Tang, Shixiang; Lu, Haonan

arXiv:2406.05768v1 (cs)

[Submitted on 9 Jun 2024 (this version), latest version 12 Jun 2024 (v3)]

Abstract: Distilling large latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest. However, the majority of existing methods face a dilemma where they either (i) depend on multiple individual distilled models for different sampling budgets, or (ii) sacrifice generation quality with limited (e.g., 2-4) and/or moderate (e.g., 5-8) sampling steps. To address these, we extend the recent multistep consistency distillation (MCD) strategy to representative LDMs, establishing the Multistep Latent Consistency Models (MLCMs) approach for low-cost high-quality image synthesis. MLCM serves as a unified model for various sampling steps due to the promise of MCD. We further augment MCD with a progressive training strategy to strengthen inter-segment consistency to boost the quality of few-step generations. We take the states from the sampling trajectories of the teacher model as training data for MLCMs to lift the requirements for high-quality training datasets and to bridge the gap between the training and inference of the distilled model. MLCM is compatible with preference learning strategies for further improvement of visual quality and aesthetic appeal. Empirically, MLCM can generate high-quality, delightful images with only 2-8 sampling steps. On the MSCOCO-2017 5K benchmark, MLCM distilled from SDXL gets a CLIP Score of 33.30, Aesthetic Score of 6.19, and Image Reward of 1.20 with only 4 steps, substantially surpassing 4-step LCM [23], 8-step SDXL-Lightning [17], and 8-step HyperSD [33]. We also demonstrate the versatility of MLCMs in applications including controllable generation, image style transfer, and Chinese-to-image generation.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.05768 [cs.CV]
	(or arXiv:2406.05768v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.05768

Submission history

From: Qingsong Xie
[v1] Sun, 9 Jun 2024 12:55:50 UTC (15,861 KB)
[v2] Tue, 11 Jun 2024 06:22:53 UTC (15,861 KB)
[v3] Wed, 12 Jun 2024 02:57:00 UTC (15,861 KB)