DOI: 10.1145/3583780.3614999

Optimal Linear Subspace Search: Learning to Construct Fast and High-Quality Schedulers for Diffusion Models

Published: 21 October 2023

Abstract

In recent years, diffusion models have become the most popular and powerful methods in the field of image synthesis, even rivaling human artists in artistic creativity. However, the key issue currently limiting their application is an extremely slow generation process. Although several methods have been proposed to speed up generation, a trade-off between efficiency and quality remains. In this paper, we first provide a detailed theoretical and empirical analysis of the scheduler-based generation process of diffusion models. We reduce the problem of scheduler design to the determination of several parameters, and further recast the accelerated generation process as an expansion of a linear subspace. Based on these analyses, we propose a novel method called Optimal Linear Subspace Search (OLSS), which accelerates generation by searching for the optimal approximation of the complete generation process within the linear subspaces spanned by latent variables. OLSS is able to generate high-quality images with a very small number of steps. To demonstrate the effectiveness of our method, we conduct extensive comparative experiments on open-source diffusion models. Experimental results show that, for a given number of steps, OLSS significantly improves the quality of generated images. Using an NVIDIA A100 GPU, OLSS makes it possible to generate a high-quality image with Stable Diffusion in under one second, without other optimization techniques.
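The abstract describes approximating the complete generation trajectory by linear combinations of latents along a shortened path. Below is a minimal sketch of that core idea; it is an illustration inferred from the abstract alone, not the authors' released implementation, and the function name and toy data are hypothetical. The weights for one skipped step are fit by least squares against a reference latent from the full-length process.

```python
import numpy as np

# Sketch (assumption based on the abstract, not the authors' code): when the
# fast scheduler skips ahead, approximate the latent of the full-length
# process by the best linear combination of latents already computed on the
# short path, i.e. project the reference latent onto the subspace the known
# latents span.

def fit_step_weights(path_latents: np.ndarray, reference_latent: np.ndarray) -> np.ndarray:
    """Solve min_w || path_latents.T @ w - reference_latent ||_2.

    path_latents:     (k, d) array, the k latents available so far, flattened.
    reference_latent: (d,) array, the latent of the complete (slow) process
                      at the step being approximated.
    Returns w of shape (k,).
    """
    # np.linalg.lstsq solves the least-squares problem via SVD; a QR-based
    # solver would serve equally well here.
    w, *_ = np.linalg.lstsq(path_latents.T, reference_latent, rcond=None)
    return w

# Toy usage with random vectors standing in for real diffusion latents.
rng = np.random.default_rng(0)
k, d = 4, 4096                      # 4 known latents, flattened dimension
path = rng.standard_normal((k, d))
true_w = rng.standard_normal(k)
reference = path.T @ true_w + 0.01 * rng.standard_normal(d)  # near the subspace

w = fit_step_weights(path, reference)
approx = path.T @ w
print(np.linalg.norm(approx - reference) / np.linalg.norm(reference))  # small
```

In the paper's framing, weights of this kind are the "several parameters" whose determination constitutes scheduler design.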


Cited By

  • (2024) Learning Prompt-Level Quality Variance for Cost-Effective Text-to-Image Generation. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 3847-3851. https://doi.org/10.1145/3627673.3679954. Online publication date: 21-Oct-2024.


    Published In

    CIKM '23: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management
    October 2023, 5508 pages
    ISBN: 9798400701245
    DOI: 10.1145/3583780

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. computational efficiency
    2. diffusion
    3. path optimization

    Qualifiers

    • Research-article

    Conference

    CIKM '23
    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

