DOI: 10.1145/3588432.3591506
Research article | Open access

Key-Locked Rank One Editing for Text-to-Image Personalization

Published: 23 July 2023

Abstract

Text-to-image models (T2I) offer a new level of flexibility by allowing users to guide the creative process through natural language. However, personalizing these models to align with user-provided visual concepts remains a challenging problem. The task of T2I personalization poses multiple hard challenges, such as maintaining high visual fidelity while allowing creative control, combining multiple personalized concepts in a single image, and keeping a small model size. We present Perfusion, a T2I personalization method that addresses these challenges using dynamic rank-1 updates to the underlying T2I model. Perfusion avoids overfitting by introducing a new mechanism that “locks” new concepts’ cross-attention Keys to their superordinate category. Additionally, we develop a gated rank-1 approach that enables us to control the influence of a learned concept during inference time and to combine multiple concepts. This allows runtime efficient balancing of visual-fidelity and textual-alignment with a single 100KB trained model. Importantly, it can span different operating points across the Pareto front without additional training. We compare our approach to strong baselines and demonstrate its qualitative and quantitative strengths.
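
The abstract describes the mechanism only at a high level. As a rough, hypothetical sketch (not the authors' implementation), the snippet below shows one way a gated rank-1 correction to a frozen cross-attention projection could behave: tokens whose encoding matches a learned concept encoding have their projected output steered toward a target vector, which under key-locking would be the superordinate category's Key (for the K projection) or a learned vector (for the V projection). The function name, gate parameterization, and similarity measure are all illustrative assumptions.

```python
# Illustrative sketch only - not the paper's code. Assumed names:
# gated_rank1_forward, i_target (concept input encoding), o_target
# (locked/learned output target), beta/tau (gate sharpness and threshold).
import torch

def gated_rank1_forward(W, x, i_target, o_target, beta=5.0, tau=0.7):
    """Project token encodings x with a frozen matrix W, adding a gated
    rank-1 correction that steers concept-like tokens toward o_target.

    W:        (d_out, d_in) frozen projection (e.g., a cross-attention K or V map).
    x:        (n_tokens, d_in) text-encoder outputs for the prompt.
    i_target: (d_in,) encoding of the personalized concept.
    o_target: (d_out,) target output: the superordinate category's Key
              ("key locking") or a learned Value.
    """
    base = x @ W.T                                            # (n_tokens, d_out)
    # Soft gate: how similar is each token encoding to the concept encoding?
    sim = torch.nn.functional.cosine_similarity(x, i_target.unsqueeze(0), dim=-1)
    gate = torch.sigmoid(beta * (sim - tau)).unsqueeze(-1)    # (n_tokens, 1)
    # Rank-1 correction along a single output direction, applied only where gated on.
    correction = (o_target - i_target @ W.T).unsqueeze(0)     # (1, d_out)
    return base + gate * correction

# Toy usage: lock a concept's Key to a stand-in category Key.
torch.manual_seed(0)
d_in, d_out, n_tokens = 8, 4, 3
W = torch.randn(d_out, d_in)
x = torch.randn(n_tokens, d_in)
i_target = x[1].clone()          # pretend token 1 carries the new concept
k_super = torch.randn(d_out)     # stand-in for the superordinate category's Key
print(gated_rank1_forward(W, x, i_target, k_super).shape)  # torch.Size([3, 4])
```

Scaling such a gate up or down at inference is what allows a single small add-on (on the order of the 100KB model mentioned above) to trade off visual fidelity against textual alignment without retraining; how the targets and gate are actually learned is detailed in the paper itself.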

Supplemental Material

MP4 File: Presentation
PDF File: Appendix

      Information

      Published In

      SIGGRAPH '23: ACM SIGGRAPH 2023 Conference Proceedings
      July 2023
      911 pages
      ISBN:9798400701597
      DOI:10.1145/3588432
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 23 July 2023

      Author Tags

      1. Diffusion
      2. Personalization
      3. Rank-One
      4. Text-to-Image

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIGGRAPH '23

      Acceptance Rates

      Overall Acceptance Rate 1,822 of 8,601 submissions, 21%

      Bibliometrics & Citations

      Article Metrics

      • Downloads (last 12 months): 930
      • Downloads (last 6 weeks): 63
      Reflects downloads up to 20 Feb 2025

      Citations

      Cited By

      • (2025) Advances in diffusion models for image data augmentation: a review of methods, models, evaluation metrics and future research directions. Artificial Intelligence Review 58:4. DOI: 10.1007/s10462-025-11116-x. Online publication date: 30-Jan-2025
      • (2024) Generative active learning for long-tailed instance segmentation. Proceedings of the 41st International Conference on Machine Learning, 62349–62368. DOI: 10.5555/3692070.3694650. Online publication date: 21-Jul-2024
      • (2024) Non-confusing generation of customized concepts in diffusion models. Proceedings of the 41st International Conference on Machine Learning, 29935–29948. DOI: 10.5555/3692070.3693276. Online publication date: 21-Jul-2024
      • (2024) An image is worth multiple words. Proceedings of the 41st International Conference on Machine Learning, 22210–22243. DOI: 10.5555/3692070.3692963. Online publication date: 21-Jul-2024
      • (2024) Style Transfer of Chinese Wuhu Iron Paintings Using Hierarchical Visual Transformer. Sensors 24:24 (8103). DOI: 10.3390/s24248103. Online publication date: 19-Dec-2024
      • (2024) MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation. SIGGRAPH Asia 2024 Conference Papers, 1–12. DOI: 10.1145/3680528.3687662. Online publication date: 3-Dec-2024
      • (2024) PALP: Prompt Aligned Personalization of Text-to-Image Models. SIGGRAPH Asia 2024 Conference Papers, 1–11. DOI: 10.1145/3680528.3687604. Online publication date: 3-Dec-2024
      • (2024) Customizing Text-to-Image Diffusion with Object Viewpoint Control. SIGGRAPH Asia 2024 Conference Papers, 1–13. DOI: 10.1145/3680528.3687564. Online publication date: 3-Dec-2024
      • (2024) Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting. Proceedings of the 32nd ACM International Conference on Multimedia, 3568–3577. DOI: 10.1145/3664647.3680894. Online publication date: 28-Oct-2024
      • (2024) Generative Active Learning for Image Synthesis Personalization. Proceedings of the 32nd ACM International Conference on Multimedia, 10669–10677. DOI: 10.1145/3664647.3680773. Online publication date: 28-Oct-2024
