
Designing an encoder for StyleGAN image manipulation

Published: 19 July 2021

Abstract

Recently, there has been a surge of diverse methods for performing image editing by employing pre-trained unconditional generators. Applying these methods on real images, however, remains a challenge, as it necessarily requires the inversion of the images into their latent space. To successfully invert a real image, one needs to find a latent code that reconstructs the input image accurately, and more importantly, allows for its meaningful manipulation. In this paper, we carefully study the latent space of StyleGAN, the state-of-the-art unconditional generator. We identify and analyze the existence of a distortion-editability tradeoff and a distortion-perception tradeoff within the StyleGAN latent space. We then suggest two principles for designing encoders in a manner that allows one to control the proximity of the inversions to regions that StyleGAN was originally trained on. We present an encoder based on our two principles that is specifically designed for facilitating editing on real images by balancing these tradeoffs. By evaluating its performance qualitatively and quantitatively on numerous challenging domains, including cars and horses, we show that our inversion method, followed by common editing techniques, achieves superior real-image editing quality, with only a small reconstruction accuracy drop.
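The distortion-editability balance described in the abstract can be illustrated with a toy objective: reconstruct the target image while penalizing how far the per-layer style codes stray from a single shared code, which keeps the inversion close to the regions the generator was trained on. This is only a minimal sketch of the idea, not the paper's actual training losses; `synth`, `inversion_objective`, and the feature-space reconstruction are all hypothetical stand-ins.

```python
import numpy as np

def inversion_objective(codes, target_feat, synth, reg_weight=0.1):
    """Toy inversion loss: distortion term plus a proximity-to-W regularizer.

    codes:       (num_layers, dim) per-layer latent codes (a W+-style inversion).
    target_feat: feature vector of the image being inverted.
    synth:       callable mapping codes -> feature vector (stand-in generator).
    reg_weight:  trades reconstruction accuracy against editability.
    """
    # Distortion: how far the synthesized result is from the target.
    distortion = np.mean((synth(codes) - target_feat) ** 2)
    # Editability proxy: penalize per-layer offsets from the mean code,
    # pulling the per-layer codes toward a single shared latent.
    offsets = codes - codes.mean(axis=0, keepdims=True)
    editability_reg = np.mean(offsets ** 2)
    return distortion + reg_weight * editability_reg
```

Raising `reg_weight` drives all layers toward one shared code (better-behaved editing, higher distortion); lowering it lets each layer drift independently (lower distortion, worse editing), which is the tradeoff the paper's encoder is designed to balance.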

Supplementary Material

  • VTT File (3450626.3459838.vtt)
  • ZIP File (a133-tov.zip)
  • MP4 File (a133-tov.mp4)
  • MP4 File (3450626.3459838.mp4): Presentation.




Published In

ACM Transactions on Graphics, Volume 40, Issue 4
August 2021, 2170 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/3450626
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 19 July 2021
Published in TOG Volume 40, Issue 4


Author Tags

  1. generative adversarial networks
  2. image editing
  3. latent space

Qualifiers

  • Research-article

Funding Sources

  • Israel Science Foundation


Cited By

  • (2025) FICE: Text-conditioned fashion-image editing with guided GAN inversion. Pattern Recognition 158, 111022. DOI: 10.1016/j.patcog.2024.111022
  • (2025) SwapInpaint2: Towards high structural consistency in identity-guided inpainting via background-preserving GAN inversion. Pattern Recognition 158, 110969. DOI: 10.1016/j.patcog.2024.110969
  • (2025) LoopNet for fine-grained fashion attributes editing. Expert Systems with Applications 259, 125182. DOI: 10.1016/j.eswa.2024.125182
  • (2024) TB-SMGAN: A GAN Based Hybrid Data Augmentation Framework on Chest X-ray Images and Reports. Gazi University Journal of Science Part A: Engineering and Innovation 11(3), 497-506. DOI: 10.54287/gujsa.1501098
  • (2024) Similarity-Based Three-Way Clustering by Using Dimensionality Reduction. Mathematics 12(13), 1951. DOI: 10.3390/math12131951
  • (2024) ControlFace: Feature Disentangling for Controllable Face Swapping. Journal of Imaging 10(1), 21. DOI: 10.3390/jimaging10010021
  • (2024) Video and Audio Deepfake Datasets and Open Issues in Deepfake Technology: Being Ahead of the Curve. Forensic Sciences 4(3), 289-377. DOI: 10.3390/forensicsci4030021
  • (2024) Frequency-Auxiliary One-Shot Domain Adaptation of Generative Adversarial Networks. Electronics 13(13), 2643. DOI: 10.3390/electronics13132643
  • (2024) RSCAN: Residual Spatial Cross-Attention Network for High-Fidelity Architectural Image Editing by Fusing Multi-Latent Spaces. Electronics 13(12), 2327. DOI: 10.3390/electronics13122327
  • (2024) Insights and Considerations in Development and Performance Evaluation of Generative Adversarial Networks (GANs): What Radiologists Need to Know. Diagnostics 14(16), 1756. DOI: 10.3390/diagnostics14161756
