research-article

Open access

Semantic photo manipulation with a generative image prior

Authors:

Hendrik Strobelt,

William Peebles,

Antonio Torralba

ACM Transactions on Graphics (TOG), Volume 38, Issue 4

Article No.: 59, Pages 1 - 11

https://doi.org/10.1145/3306346.3323023

Published: 12 July 2019

Abstract

Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image. Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image. We demonstrate our interactive system on several semantic image editing tasks, including synthesizing new objects consistent with background, removing unwanted objects, and changing the appearance of an object. Quantitative and qualitative comparisons against several existing methods demonstrate the effectiveness of our method.

Supplementary Material

MP4 File (gensub_176.mp4)



Index Terms

Semantic photo manipulation with a generative image prior
1. Computing methodologies

Published In

ACM Transactions on Graphics Volume 38, Issue 4

August 2019

1480 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/3306346

Editor:
Olga Sorkine-Hornung
ETH Zurich

Copyright © 2019 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

DARPA
NSF
