Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling

Shao, Shitong; Dai, Xu; Li, Lujun; Chen, Huanran; Hu, Yang; Yin, Shouyi

Computer Science > Machine Learning

arXiv:2305.10769 (cs)

[Submitted on 18 May 2023 (v1), last revised 13 Dec 2024 (this version, v5)]

Abstract: Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically involves performing a large number of sampling steps, which impedes real-time sample synthesis. Traditional accelerated sampling algorithms via knowledge distillation rely on pre-trained model weights and discrete time-step scenarios, necessitating additional training sessions to achieve their goals. To address these issues, we propose Catch-Up Distillation (CUD), which encourages the current moment output of the velocity estimation model to "catch up" with its previous moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current moment output with both the ground truth label and the previous moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space for CUD under continuous time-step scenarios and analyze how to determine suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain an FID of 2.80 by sampling in 15 steps under one-session training and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. The latter result required only 620k iterations with a batch size of 128, in contrast to Consistency Distillation, which required 2100k iterations with a larger batch size of 256. Our code is released at this https URL.
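
The two-part objective described in the abstract (match the ground truth label, and match the previous moment output) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper's code: the linear interpolation path, the direction of time, the toy linear model `toy_velocity`, the single Euler step used to reach the previous moment (the abstract describes a Runge-Kutta-based multi-step variant), the squared-error distance, and the `weight` parameter.

```python
import random

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def catch_up_loss(v_cur, v_prev, target, weight=1.0):
    """Hypothetical two-term loss: ground-truth matching plus alignment
    with the previous moment output (treated as a fixed target)."""
    return mse(v_cur, target) + weight * mse(v_cur, v_prev)

def toy_velocity(x, t, w=(0.5, -0.2)):
    """Hypothetical stand-in for the velocity estimation model."""
    return [w[0] * xi + w[1] * t for xi in x]

random.seed(0)
x0 = [random.gauss(0, 1) for _ in range(8)]   # assumed noise sample
x1 = [random.gauss(0, 1) for _ in range(8)]   # assumed data sample
t, h = 0.6, 0.1                                # current time and step size

# Assumed linear path between x0 and x1; its velocity is x1 - x0.
x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
target = [b - a for a, b in zip(x0, x1)]

v_cur = toy_velocity(x_t, t)                         # current moment output
x_prev = [x - h * v for x, v in zip(x_t, v_cur)]     # one Euler step back in time
v_prev = toy_velocity(x_prev, t - h)                 # previous moment output

loss = catch_up_loss(v_cur, v_prev, target)
print(loss)
```

In an automatic-differentiation framework, the previous moment output would normally be excluded from the gradient so that only the current moment output is pulled toward it; that choice is likewise an assumption of this sketch.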

Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.10769 [cs.LG]
	(or arXiv:2305.10769v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2305.10769

Submission history

From: Shitong Shao
[v1] Thu, 18 May 2023 07:23:12 UTC (5,404 KB)
[v2] Sun, 21 May 2023 09:45:45 UTC (5,404 KB)
[v3] Tue, 30 May 2023 16:40:27 UTC (5,404 KB)
[v4] Tue, 13 Jun 2023 08:00:49 UTC (16,990 KB)
[v5] Fri, 13 Dec 2024 06:36:28 UTC (19,035 KB)
