DOI: 10.1145/3652583.3658020
Research article · Open access

SBCR: Stochasticity Beats Content Restriction Problem in Training and Tuning Free Image Editing

Published: 07 June 2024

Abstract

Text-conditional image editing is a practical AIGC task that has recently emerged with great commercial and academic value. For real image editing, most diffusion model-based methods use DDIM Inversion as a first stage before editing. However, DDIM Inversion often fails to reconstruct the original image, which degrades downstream editing. Many inversion-based works modify the inversion formula to address this reconstruction failure, but doing so introduces a second issue: the content restriction problem. To solve the content restriction problem, we first analyze why reconstruction via DDIM Inversion fails, and then propose Reconstruction-and-Generation Balancing Noises (R&G-B noises) that achieve superior reconstruction and editing performance with the following advantages: 1) they can perfectly reconstruct real images without fine-tuning; 2) they can overcome the content restriction problem and generate diverse content.
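
For context, the sketch below illustrates the standard DDIM Inversion loop that the abstract identifies as the usual first stage of real-image editing. It is a minimal PyTorch-style sketch under stated assumptions: the noise predictor eps_model, the cumulative noise schedule alpha_bar, and prompt_emb are hypothetical placeholders, and the paper's R&G-B noises are not reproduced here.

import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alpha_bar, prompt_emb):
    # Map a real image latent x0 to an approximate noise latent x_T by
    # running the deterministic DDIM update in reverse (t -> t+1).
    # alpha_bar is a 1-D tensor of cumulative alphas, decreasing with t.
    x = x0
    for t in range(len(alpha_bar) - 1):
        a_t, a_next = alpha_bar[t], alpha_bar[t + 1]
        eps = eps_model(x, t, prompt_emb)  # predicted noise at step t
        # Clean-latent estimate implied by the current x and eps.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # One forward (noising) step: the algebraic inverse of the
        # deterministic DDIM sampling step.
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x

Replaying deterministic DDIM sampling from the returned latent should in theory reconstruct x0 exactly, but per-step approximation errors accumulate in practice; this accumulation is the reconstruction failure the abstract describes, and the modified-formula fixes it motivates are what introduce the content restriction problem.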


Cited By

  • PFB-Diff: Progressive Feature Blending diffusion for text-driven image editing. Neural Networks, vol. 181, article 106777 (Jan 2025). DOI: 10.1016/j.neunet.2024.106777



    Published In

    ICMR '24: Proceedings of the 2024 International Conference on Multimedia Retrieval
    May 2024
    1379 pages
    ISBN: 9798400706196
    DOI: 10.1145/3652583
    This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. aigc
    2. diffusion model
    3. generative model
    4. real image editing
    5. text-to-image generation

    Qualifiers

    • Research-article

    Funding Sources

    • Shenzhen Science and Technology Innovation Commission

    Conference

    ICMR '24

    Acceptance Rates

    Overall Acceptance Rate 254 of 830 submissions, 31%

    Article Metrics

    • Downloads (Last 12 months): 141
    • Downloads (Last 6 weeks): 40
    Reflects downloads up to 15 Oct 2024

