research-article
Open access

Semantic photo manipulation with a generative image prior

Published: 12 July 2019

Abstract

Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to image statistics of an individual image. Our method can accurately reconstruct the input image and synthesize new content, consistent with the appearance of the input image. We demonstrate our interactive system on several semantic image editing tasks, including synthesizing new objects consistent with background, removing unwanted objects, and changing the appearance of an object. Quantitative and qualitative comparisons against several existing methods demonstrate the effectiveness of our method.
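The two difficulties named in the abstract, and the idea of adapting a learned prior to a single image, can be illustrated with a deliberately tiny sketch. This is not the paper's method or code: the "generator" here is a hypothetical linear map, and the names `W`, `generate`, `x`, and `W_adapted` are assumptions made only for illustration. The sketch shows that optimizing the latent code alone leaves a reconstruction error, while additionally adapting the generator's weights to the one target image removes it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "generator": a fixed linear map from a 4-D latent
# code to a 16-D "image". A real GAN generator is a deep network.
W = rng.normal(size=(16, 4))

def generate(weights, z):
    """Map a latent code z to an image with the given weights."""
    return weights @ z

# A target "photo" that, in general, lies outside the generator's range.
x = rng.normal(size=16)

# Step 1: latent-only reconstruction. With the weights frozen, find the
# best latent code by least squares. A residual error remains.
z, *_ = np.linalg.lstsq(W, x, rcond=None)
err_latent = np.linalg.norm(generate(W, z) - x)

# Step 2: image-specific adaptation. Keep z fixed and take gradient
# steps on the weights for this single image. The gradient of
# 0.5 * ||W z - x||^2 with respect to W is outer(residual, z).
W_adapted = W.copy()
lr = 0.5 / (z @ z)  # with this step size the residual halves every step
for _ in range(200):
    residual = generate(W_adapted, z) - x
    W_adapted -= lr * np.outer(residual, z)
err_adapted = np.linalg.norm(generate(W_adapted, z) - x)

print("latent-only reconstruction error:", err_latent)
print("error after image-specific adaptation:", err_adapted)
```

In this toy setting the adapted weights reproduce the target almost exactly. How the actual system constrains the adaptation so that edits remain consistent with the input photograph is described in the paper itself, not here.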

Supplementary Material

MP4 File (gensub_176.mp4)


Published In

ACM Transactions on Graphics, Volume 38, Issue 4
August 2019
1480 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/3306346
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. deep learning
  2. generative adversarial networks
  3. image editing
  4. vision for graphics

Article Metrics

  • Downloads (Last 12 months): 224
  • Downloads (Last 6 weeks): 52
Reflects downloads up to 15 Oct 2024

Cited By

  • (2024) Iterative and mixed-spaces image gradient inversion attack in federated learning. Cybersecurity 7:1. DOI: 10.1186/s42400-024-00227-7. Online publication date: 5-Apr-2024
  • (2024) SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration. Proceedings of the 29th International Conference on Intelligent User Interfaces (675-685). DOI: 10.1145/3640543.3645141. Online publication date: 18-Mar-2024
  • (2024) Neural Wavelet-domain Diffusion for 3D Shape Generation, Inversion, and Manipulation. ACM Transactions on Graphics 43:2 (1-18). DOI: 10.1145/3635304. Online publication date: 3-Jan-2024
  • (2024) In-Domain GAN Inversion for Faithful Reconstruction and Editability. IEEE Transactions on Pattern Analysis and Machine Intelligence 46:5 (2607-2621). DOI: 10.1109/TPAMI.2023.3310872. Online publication date: 1-Feb-2024
  • (2024) Multi-Person 3D Pose Estimation With Occlusion Reasoning. IEEE Transactions on Multimedia 26 (878-889). DOI: 10.1109/TMM.2023.3272736. Online publication date: 1-Jan-2024
  • (2024) Snapshot Compressive Imaging Using Domain-Factorized Deep Video Prior. IEEE Transactions on Computational Imaging 10 (93-102). DOI: 10.1109/TCI.2023.3346301. Online publication date: 2024
  • (2024) Data Redaction from Conditional Generative Models. 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML) (569-591). DOI: 10.1109/SaTML59370.2024.00035. Online publication date: 9-Apr-2024
  • (2024) Semantically-Disentangled Progressive Image Compression for Deep Space Communications: Exploring the Ultra-Low Rate Regime. IEEE Journal on Selected Areas in Communications 42:5 (1130-1144). DOI: 10.1109/JSAC.2024.3369654. Online publication date: 26-Feb-2024
  • (2024) CDM: Text-Driven Image Editing with Composable Diffusion Models. 2024 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN60899.2024.10649901. Online publication date: 30-Jun-2024
  • (2024) MixSyn: Compositional Image Synthesis with Fuzzy Masks and Style Fusion. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (7460-7469). DOI: 10.1109/CVPRW63382.2024.00741. Online publication date: 17-Jun-2024
