DOI: 10.1145/3588432.3591506
Research article | Open access

Key-Locked Rank One Editing for Text-to-Image Personalization

Published: 23 July 2023

Abstract

Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that “locks” new concepts’ cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime efficient balancing of visual-fidelity and textual-alignment with a single 100KB trained model. Importantly, it can span different operating points across the Pareto front without additional training. We compare our approach to strong baselines and demonstrate its qualitative and quantitative strengths.
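
The abstract describes the mechanism only at a high level. As a rough, hypothetical sketch (not the authors' implementation), the snippet below shows one way a gated rank-1 correction to a frozen cross-attention projection could behave: tokens whose encoding matches a learned concept encoding have their projected output steered toward a target vector, which under key-locking would be the superordinate category's Key (for the K projection) or a learned vector (for the V projection). The function name, gate parameterization, and similarity measure are all illustrative assumptions.

```python
# Illustrative sketch only - not the paper's code. Assumed names:
# gated_rank1_forward, i_target (concept input encoding), o_target
# (locked/learned output target), beta/tau (gate sharpness and threshold).
import torch

def gated_rank1_forward(W, x, i_target, o_target, beta=5.0, tau=0.7):
    """Project token encodings x with a frozen matrix W, adding a gated
    rank-1 correction that steers concept-like tokens toward o_target.

    W:        (d_out, d_in) frozen projection (e.g., a cross-attention K or V map).
    x:        (n_tokens, d_in) text-encoder outputs for the prompt.
    i_target: (d_in,) encoding of the personalized concept.
    o_target: (d_out,) target output: the superordinate category's Key
              ("key locking") or a learned Value.
    """
    base = x @ W.T                                            # (n_tokens, d_out)
    # Soft gate: how similar is each token encoding to the concept encoding?
    sim = torch.nn.functional.cosine_similarity(x, i_target.unsqueeze(0), dim=-1)
    gate = torch.sigmoid(beta * (sim - tau)).unsqueeze(-1)    # (n_tokens, 1)
    # Rank-1 correction along a single output direction, applied only where gated on.
    correction = (o_target - i_target @ W.T).unsqueeze(0)     # (1, d_out)
    return base + gate * correction

# Toy usage: lock a concept's Key to a stand-in category Key.
torch.manual_seed(0)
d_in, d_out, n_tokens = 8, 4, 3
W = torch.randn(d_out, d_in)
x = torch.randn(n_tokens, d_in)
i_target = x[1].clone()          # pretend token 1 carries the new concept
k_super = torch.randn(d_out)     # stand-in for the superordinate category's Key
print(gated_rank1_forward(W, x, i_target, k_super).shape)  # torch.Size([3, 4])
```

Scaling such a gate up or down at inference is what allows a single small add-on (on the order of the 100KB model mentioned above) to trade off visual fidelity against textual alignment without retraining; how the targets and gate are actually learned is detailed in the paper itself.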

Supplemental Material

MP4 File: Presentation
PDF File: Appendix

      Information

      Published In

      SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings
      July 2023
      911 pages
      ISBN:9798400701597
      DOI:10.1145/3588432
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 July 2023

      Author Tags

      1. Diffusion
      2. Personalization
      3. Rank-One
      4. Text-to-Image

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIGGRAPH '23

      Acceptance Rates

      Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 930
      • Downloads (last 6 weeks): 63
      Reflects downloads up to 20 Feb 2025

      Citations

      Cited By

      • (2025) Advances in diffusion models for image data augmentation: a review of methods, models, evaluation metrics and future research directions. Artificial Intelligence Review 58:4. DOI: 10.1007/s10462-025-11116-x. Online publication date: 30-Jan-2025
      • (2024) Generative active learning for long-tailed instance segmentation. Proceedings of the 41st International Conference on Machine Learning, 62349–62368. DOI: 10.5555/3692070.3694650. Online publication date: 21-Jul-2024
      • (2024) Non-confusing generation of customized concepts in diffusion models. Proceedings of the 41st International Conference on Machine Learning, 29935–29948. DOI: 10.5555/3692070.3693276. Online publication date: 21-Jul-2024
      • (2024) An image is worth multiple words. Proceedings of the 41st International Conference on Machine Learning, 22210–22243. DOI: 10.5555/3692070.3692963. Online publication date: 21-Jul-2024
      • (2024) Style Transfer of Chinese Wuhu Iron Paintings Using Hierarchical Visual Transformer. Sensors 24:24 (8103). DOI: 10.3390/s24248103. Online publication date: 19-Dec-2024
      • (2024) MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. SIGGRAPH Asia 2024 Conference Papers, 1–12. DOI: 10.1145/3680528.3687662. Online publication date: 3-Dec-2024
      • (2024) PALP: Prompt Aligned Personalization of Text-to-Image Models. SIGGRAPH Asia 2024 Conference Papers, 1–11. DOI: 10.1145/3680528.3687604. Online publication date: 3-Dec-2024
      • (2024) Customizing Text-to-Image Diffusion with Object Viewpoint Control. SIGGRAPH Asia 2024 Conference Papers, 1–13. DOI: 10.1145/3680528.3687564. Online publication date: 3-Dec-2024
      • (2024) Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting. Proceedings of the 32nd ACM International Conference on Multimedia, 3568–3577. DOI: 10.1145/3664647.3680894. Online publication date: 28-Oct-2024
      • (2024) Generative Active Learning for Image Synthesis Personalization. Proceedings of the 32nd ACM International Conference on Multimedia, 10669–10677. DOI: 10.1145/3664647.3680773. Online publication date: 28-Oct-2024
