Abstract
Stylized text-to-image generation focuses on creating images from textual descriptions while adhering to a style specified by reference images. However, subtle style variations among different reference images can prevent a model from accurately learning the target style. In this paper, we propose InstaStyle, a novel approach that excels at generating high-fidelity stylized images from only a single reference image. Our approach is based on the finding that the inversion noise of a stylized reference image inherently carries the style signal, as evidenced by its non-zero signal-to-noise ratio. We employ DDIM inversion to extract this noise from the reference image and leverage a diffusion model to generate new stylized images from the “style” noise. Additionally, the inherent ambiguity and bias of textual prompts impede the precise conveyance of style during image inversion. To address this, we devise prompt refinement, which learns a style token with the assistance of human feedback. Qualitative and quantitative experiments demonstrate that InstaStyle achieves superior performance compared with current benchmarks. Furthermore, our approach showcases its capability in the creative task of style combination with mixed inversion noise.
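For intuition, the pipeline described above can be sketched in a few lines of PyTorch. What follows is a minimal, illustrative sketch, not the authors' implementation: eps_model(x, t) is a placeholder for a diffusion model's noise predictor (in practice a text-conditioned U-Net over VAE latents), alphas_cumprod is assumed to hold the scheduler's cumulative noise schedule, and mix_style_noise is a hypothetical slerp-based combiner, since the abstract states only that inversion noises are mixed.

import torch

@torch.no_grad()
def ddim_invert(x0, eps_model, alphas_cumprod, num_steps=50):
    # Deterministic DDIM inversion: run the DDIM update in reverse,
    # mapping a (latent) reference image x0 to the noise x_T that
    # regenerates it. This x_T is the "style" noise used for sampling.
    T = len(alphas_cumprod)
    timesteps = torch.linspace(0, T - 1, num_steps).long()
    x = x0
    for i in range(num_steps - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t_next]
        eps = eps_model(x, t)
        # Clean-sample estimate implied by the current x and predicted noise.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        # Step forward in noise level (the reverse of a DDIM sampling step).
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x

def mix_style_noise(noise_a, noise_b, w=0.5):
    # Hypothetical style combination via spherical interpolation (slerp)
    # of two inverted noises; the paper's exact mixing scheme may differ.
    a, b = noise_a.flatten(), noise_b.flatten()
    cos = torch.clamp((a @ b) / (a.norm() * b.norm()), -1.0, 1.0)
    omega = torch.acos(cos)
    if omega.abs() < 1e-4:  # nearly parallel: fall back to linear mixing
        return (1 - w) * noise_a + w * noise_b
    s = torch.sin(omega)
    return (torch.sin((1 - w) * omega) / s) * noise_a \
         + (torch.sin(w * omega) / s) * noise_b

Generation then runs standard DDIM sampling from the returned noise (optionally mixed with a second style's inverted noise) instead of from random Gaussian noise, conditioned on a new prompt that includes the learned style token.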
Acknowledgement
This research is sponsored by the National Natural Science Foundation of China (Grant Nos. 62306041, U21B2045, 62176025), the Beijing Nova Program (Grant Nos. Z211100002121106, 20230484488, 20230484276), the Youth Innovation Promotion Association CAS (Grant No. 2022132), and the Beijing Municipal Science & Technology Commission (Grant No. Z231100007423015).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cui, X., Li, Z., Li, P., Huang, H., Liu, X., He, Z. (2025). InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15109. Springer, Cham. https://doi.org/10.1007/978-3-031-72983-6_26
DOI: https://doi.org/10.1007/978-3-031-72983-6_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72982-9
Online ISBN: 978-3-031-72983-6
eBook Packages: Computer Science, Computer Science (R0)