research-article

Mix-DDPM: Enhancing Diffusion Models through Fitting Mixture Noise with Global Stochastic Offset

Published: 23 September 2024

Abstract

Denoising diffusion probabilistic models (DDPM) have shown impressive performance in various domains as a class of deep generative models. In this article, we introduce the mixture noise-based DDPM (Mix-DDPM), which models the Markov diffusion posterior as a Gaussian mixture model. Specifically, Mix-DDPM randomly selects a Gaussian component and then adds noise drawn from it, which can be shown to be a more efficient way to perturb signals into a simple known distribution. We further define the reverse probabilistic model as a parameterized Gaussian mixture kernel. Because the KL divergence between Gaussian mixture models is intractable, we derive a variational bound to maximize the likelihood, which offers a concise formulation for optimizing the denoising model and valuable insights for designing sampling strategies. Our theoretical derivation shows that Mix-DDPM need only shift the image, which amounts to including a global stochastic offset in both the diffusion and reverse processes and can be implemented in just a few lines of code. The global stochastic offset effectively fits a Gaussian mixture distribution, increasing the degrees of freedom of the entire diffusion model. Furthermore, we present three streamlined sampling strategies that interface with diverse fast dedicated solvers for diffusion ordinary differential equations, boosting the efficacy of image representation in the sampling phase and alleviating slow generation, thereby improving both efficiency and accuracy. Extensive experiments on benchmark datasets demonstrate the effectiveness of Mix-DDPM and its superiority over the original DDPM.
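
The idea of picking one Gaussian component per image and applying it as a global shift can be illustrated with a small sketch. This is not the paper's implementation: the symmetric two-component mixture, the `offset` parameter, the variance-preserving scale, and the function names `sample_mixture_noise` and `diffuse` are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def sample_mixture_noise(n_pixels, offset, rng):
    """Draw one noise vector from a symmetric two-component Gaussian mixture.

    Assumption (not from the paper): the components have means +offset and
    -offset. One sign is drawn per image, so the shift is global, and every
    pixel then gets independent Gaussian noise on top of it. The scale is
    chosen so each pixel's marginal has zero mean and unit variance.
    """
    if not 0.0 <= offset < 1.0:
        raise ValueError("offset must lie in [0, 1)")
    sign = rng.choice((-1.0, 1.0))        # randomly select a component
    scale = math.sqrt(1.0 - offset ** 2)  # offset^2 + scale^2 == 1
    noise = [sign * offset + scale * rng.gauss(0.0, 1.0)
             for _ in range(n_pixels)]
    return noise, sign


def diffuse(x0, alpha_bar, offset, rng):
    """Perturb a flattened image x0 to the noise level given by alpha_bar."""
    noise, sign = sample_mixture_noise(len(x0), offset, rng)
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    xt = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise, sign


# Hypothetical usage: an 8-pixel "image" with constant intensity 0.5.
rng = random.Random(0)
xt, noise, sign = diffuse([0.5] * 8, alpha_bar=0.9, offset=0.3, rng=rng)
```

Setting `offset` to 0 reduces the sketch to the usual single-Gaussian perturbation, which makes the extra degree of freedom contributed by the global shift easy to isolate.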




      Information & Contributors

      Information

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 20, Issue 9
      September 2024
      780 pages
      EISSN:1551-6865
      DOI:10.1145/3613681
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 September 2024
      Online AM: 07 June 2024
      Accepted: 04 June 2024
      Revised: 22 April 2024
      Received: 02 February 2024
      Published in TOMM Volume 20, Issue 9


      Author Tags

      1. Diffusion models
      2. global stochastic offset
      3. Gaussian mixture noise
      4. image generation

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China


      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 231
      • Downloads (Last 6 weeks): 87
      Reflects downloads up to 10 Oct 2024


      Cited By

      • (2024) Decentralized Identity Management for Metaverse-Enhanced Education: A Literature Review. Electronics 13:19 (3887). DOI: 10.3390/electronics13193887. Online publication date: 30-Sep-2024
      • (2024) ASIFusion: An Adaptive Saliency Injection-Based Infrared and Visible Image Fusion Network. ACM Transactions on Multimedia Computing, Communications, and Applications. DOI: 10.1145/3665893. Online publication date: 23-May-2024
      • (2024) Heterogeneous Fusion and Integrity Learning Network for RGB-D Salient Object Detection. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-24). DOI: 10.1145/3656476. Online publication date: 15-May-2024
      • (2024) MultiRider: Enabling Multi-Tag Concurrent OFDM Backscatter by Taming In-band Interference. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (292-303). DOI: 10.1145/3643832.3661862. Online publication date: 3-Jun-2024
      • (2024) Depth-Assisted Semi-Supervised RGB-D Rail Surface Defect Inspection. IEEE Transactions on Intelligent Transportation Systems 25:7 (8042-8052). DOI: 10.1109/TITS.2024.3387949. Online publication date: 24-Apr-2024
      • (2024) Fast detection and obstacle avoidance on UAVs using lightweight convolutional neural network based on the fusion of radar and camera. Applied Intelligence 54:22 (11510-11524). DOI: 10.1007/s10489-024-05768-5. Online publication date: 1-Nov-2024
      • (2024) Driver intention prediction based on multi-dimensional cross-modality information interaction. Multimedia Systems 30:2. DOI: 10.1007/s00530-024-01282-3. Online publication date: 15-Mar-2024
