Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3503161.3548415acmconferencesArticle/Chapter ViewAbstractPublication PagesmmConference Proceedingsconference-collections
research-article

Customizing GAN Using Few-shot Sketches

Published: 10 October 2022 Publication History

Abstract

Generative adversarial networks (GANs) have demonstrated remarkable success in image synthesis applications, but their performance deteriorates under limited data regimes. The fundamental challenge is that it is extremely difficult to synthesize photo-realistic and highly diversified images while capturing meaningful attributes of the targets under minimum supervision. Previous methods either fine-tune or rewrite the model weights to adapt to few-shot datasets. However, this either overfits or requires access to large-scale data on which they are trained. To tackle the problem, we propose a framework that repurposes the existing pre-trained generative models using only a few samples (e.g., <30) of sketches. Unlike previous works, we transfer the sample diversity and quality without accessing the source data using inter-domain distance consistency. By employing cross-domain adversarial learning, we encourage the model output to closely resemble the input sketches in both shape and pose. Extensive experiments show that our method significantly outperforms the existing approaches in terms of sample quality and diversity. The qualitative and quantitative results on various standard datasets also demonstrate its efficacy. On the most popularly used dataset, Gabled church, we achieve a Fréchet inception distance (FID) score of 15.63.

Supplementary Material

MP4 File (MM22-fp3116.mp4)
Our approach adapts an off-the-shelf GAN to the input sketch by feeding it one or more hand-drawn sketches. The pre-trained and customized models both make use of the same noise z. While the geometry and position of an object are modified by our new model, other visual signals like color, texture, and background are accurately maintained. The realism from the pre-trained source model is adapted by optimizing the image-based adversarial objective. The synthesized samples from the target model are transformed into sketches via an image to sketch model and then input to the sketch discriminator for adversarial loss. As our training data is extremely low, to avoid overfitting and mode collapse we use inter-domain distance consistency loss on feature level between source and target model. After extensive evaluation, the proposed approach surpasses the existing state-of-the-art methods.

References

[1]
Rameen Abdal, Yipeng Qin, and Peter Wonka. 2019. Image2stylegan: How to embed images into the stylegan latent space?. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4432--4441.
[2]
Andrew Brock, Jeff Donahue, and Karen Simonyan. 2018. Large Scale GAN Training for High Fidelity Natural Image Synthesis. In International Conference on Learning Representations.
[3]
Tu Bui, Leo Ribeiro, Moacir Ponti, and John Collomosse. 2017. Compact descrip- tors for sketch-based image retrieval using a triplet loss convolutional neural network. Computer Vision and Image Understanding 164 (2017), 27--37.
[4]
Salman Cheema, Sumit Gulwani, and Joseph LaViola. 2012. QuickDraw: improving drawing experience for geometric diagrams. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 1037--1064.
[5]
Tao Chen, Ming-Ming Cheng, Ping Tan, Ariel Shamir, and Shi-Min Hu. 2009. Sketch2photo: Internet image montage. ACM Transactions on Graphics (TOG) 28, 5 (2009), 1--10.
[6]
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. 2020. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning. PMLR, 1597--1607.
[7]
Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. 2018. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 8789--8797.
[8]
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. Ieee, 248--255.
[9]
August DuMont Schütte, Jürgen Hetzel, Sergios Gatidis, Tobias Hepp, Benedikt Dietz, Stefan Bauer, and Patrick Schwab. 2021. Overcoming barriers to data sharing with medical image generation: a comprehensive evaluation. NPJ Digital Medicine 4, 1 (2021), 1--14.
[10]
Mathias Eitz, Ronald Richter, Kristian Hildebrand, Tamy Boubekeur, and Marc Alexa. 2011. Photosketcher: interactive sketch-based image synthesis. IEEE Computer Graphics and Applications 31, 6 (2011), 56--66.
[11]
Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta- learning for fast adaptation of deep networks. In International Conference on Machine Learning. PMLR, 1126--1135.
[12]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. Advances in Neural Information Processing Systems 27 (2014).
[13]
Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. 2017. Improved training of wasserstein gans. Advances in Neural Information Processing Systems 30 (2017).
[14]
Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. 2020. Ganspace: Discovering interpretable gan controls. Advances in Neural Information Processing Systems 33 (2020), 9841--9850.
[15]
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. 2020. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9729--9738.
[16]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33 (2020), 6840--6851.
[17]
Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A Efros. 2017. Image-to- image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1125--1134.
[18]
Ali Jahanian, Xavier Puig, Yonglong Tian, and Phillip Isola. 2021. Generative Models as a Data Source for Multiview Representation Learning. arXiv preprint arXiv:2106.05258 (2021).
[19]
Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual losses for real- time style transfer and super-resolution. In European Conference on Computer Vision. Springer, 694--711.
[20]
Tero Karras, Miika Aittala, Janne Hellsten, Samuli Laine, Jaakko Lehtinen, and Timo Aila. 2020. Training generative adversarial networks with limited data. Advances in Neural Information Processing Systems 33 (2020), 12104--12114.
[21]
Tero Karras, Samuli Laine, and Timo Aila. 2019. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4401--4410.
[22]
Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. 2020. Analyzing and improving the image quality of stylegan. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8110--8119.
[23]
Durk P Kingma and Prafulla Dhariwal. 2018. Glow: Generative flow with in-vertible 1x1 convolutions. Advances in Neural Information Processing Systems 31 (2018).
[24]
Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunning- ham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. 2017. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4681--4690.
[25]
Mengtian Li, Zhe Lin, Radomir Mech, Ersin Yumer, and Deva Ramanan. 2019. Photo-sketching: Inferring contour drawings from images. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 1403--1412.
[26]
Yijun Li, Richard Zhang, Jingwan Cynthia Lu, and Eli Shechtman. 2020. Few-shot Image Generation with Elastic Weight Consolidation. Advances in Neural Information Processing Systems 33 (2020), 15885--15896.
[27]
Yen-Liang Lin, Cheng-Yu Huang, Hao-Jeng Wang, and Winston Hsu. 2013. 3D sub-query expansion for improving sketch-based multi-view image retrieval. In Proceedings of the IEEE International Conference on Computer Vision. 3495--3502.
[28]
Ming-Yu Liu, Xun Huang, Arun Mallya, Tero Karras, Timo Aila, Jaakko Lehtinen, and Jan Kautz. 2019. Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 10551--10560.
[29]
Xudong Mao, Qing Li, Haoran Xie, Raymond YK Lau, Zhen Wang, and Stephen Paul Smolley. 2017. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2794--2802.
[30]
Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. 2018. Spectral Normalization for Generative Adversarial Networks. In International Conference on Learning Representations.
[31]
Sangwoo Mo, Minsu Cho, and Jinwoo Shin. 2020. Freeze the discriminator: a simple baseline for fine-tuning gans. arXiv preprint arXiv:2002.10964 (2020).
[32]
Atsuhiro Noguchi and Tatsuya Harada. 2019. Image generation from small datasets via batch statistics adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2750--2758.
[33]
Utkarsh Ojha, Yijun Li, Jingwan Lu, Alexei A Efros, Yong Jae Lee, Eli Shechtman, and Richard Zhang. 2021. Few-shot image generation via cross-domain correspondence. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10743--10752.
[34]
Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. 2019. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2337--2346.
[35]
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32. 8024--8035.
[36]
Or Patashnik, Zongze Wu, Eli Shechtman, Daniel Cohen-Or, and Dani Lischinski. 2021. Styleclip: Text-driven manipulation of stylegan imagery. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2085--2094.
[37]
Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised representa- tion learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434 (2015).
[38]
Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, and Daniel Cohen-Or. 2021. Encoding in style: a stylegan encoder for image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2287--2296.
[39]
Esther Robb, Wen-Sheng Chu, Abhishek Kumar, and Jia-Bin Huang. 2020. Few-shot adaptation of generative adversarial networks. arXiv preprint arXiv:2010.11943 (2020).
[40]
Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. 2016. Improved techniques for training gans. Advances in Neural Information Processing Systems 29 (2016).
[41]
Patsorn Sangkloy, Nathan Burnell, Cusuh Ham, and James Hays. 2016. The sketchy database: learning to retrieve badly drawn bunnies. ACM Transactions on Graphics (TOG) 35, 4 (2016), 1--12.
[42]
Tamar Rott Shaham, Tali Dekel, and Tomer Michaeli. 2019. Singan: Learning a generative model from a single natural image. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4570--4580.
[43]
Yujun Shen and Bolei Zhou. 2021. Closed-form factorization of latent semantics in gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1532--1540.
[44]
Assaf Shocher, Shai Bagon, Phillip Isola, and Michal Irani. 2018. Ingan: Capturing and remapping the" dna" of a natural image. arXiv preprint arXiv:1812.00231 (2018).
[45]
Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, and Alexei A Efros. 2011. Data-driven visual similarity for cross-domain image matching. In Proceed- ings of the 2011 SIGGRAPH Asia Conference. 1--10.
[46]
Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2020. Score-Based Generative Modeling through Stochastic Differential Equations. In International Conference on Learning Repre- sentations.
[47]
Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. 2018. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1199--1208.
[48]
Shuhan Tan, Yujun Shen, and Bolei Zhou. 2020. Improving the fairness of deep generative models without retraining. arXiv preprint arXiv:2012.04842 (2020).
[49]
Aaron Van den Oord, Yazhe Li, and Oriol Vinyals. 2018. Representation learning with contrastive predictive coding. arXiv e-prints (2018), arXiv--1807.
[50]
Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al . 2016. Matching networks for one shot learning. Advances in Neural Information Processing Systems 29 (2016).
[51]
Sheng-Yu Wang, David Bau, and Jun-Yan Zhu. 2021. Sketch your own gan. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 14050--14060.
[52]
Yaxing Wang, Abel Gonzalez-Garcia, David Berga, Luis Herranz, Fahad Shahbaz Khan, and Joost van de Weijer. 2020. Minegan: effective knowledge transfer from gans to target domains with few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9332--9341.
[53]
Yaxing Wang, Chenshen Wu, Luis Herranz, Joost van de Weijer, Abel Gonzalez- Garcia, and Bogdan Raducanu. 2018. Transferring gans: generating images from limited data. In Proceedings of the European Conference on Computer Vision (ECCV). 218--234.
[54]
Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. 2015. Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365 (2015).
[55]
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, and Thomas S Huang. 2018. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 5505--5514.
[56]
Qian Yu, Feng Liu, Yi-Zhe Song, Tao Xiang, Timothy M Hospedales, and Chen-Change Loy. 2016. Sketch me that shoe. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 799--807.
[57]
Han Zhang, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. 2019. Self- attention generative adversarial networks. In International Conference on Machine Learning. PMLR, 7354--7363.
[58]
Richard Zhang, Phillip Isola, and Alexei A Efros. 2016. Colorful image colorization. In European Conference on Computer Vision. Springer, 649--666.
[59]
Miaoyun Zhao, Yulai Cong, and Lawrence Carin. 2020. On leveraging pretrained GANs for generation with limited data. In International Conference on Machine Learning. PMLR, 11340--11351.
[60]
Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, and Song Han. 2020. Differentiable augmentation for data-efficient gan training. Advances in Neural Information Processing Systems 33 (2020), 7559--7570.
[61]
Jiapeng Zhu, Yujun Shen, Deli Zhao, and Bolei Zhou. 2020. In-domain gan inversion for real image editing. In European Conference on Computer Vision. Springer, 592--608.
[62]
Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A Efros. 2016. Generative visual manipulation on the natural image manifold. In European Conference on Computer Vision. Springer, 597--613.
[63]
Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2223--2232

Cited By

View all
  • (2024)SMFS‐GAN: Style‐Guided Multi‐class Freehand Sketch‐to‐Image SynthesisComputer Graphics Forum10.1111/cgf.1519043:6Online publication date: 7-Aug-2024
  • (2024)CFTS-GAN: Continual Few-Shot Teacher Student for Generative Adversarial NetworksPattern Recognition10.1007/978-3-031-78389-0_17(249-262)Online publication date: 5-Dec-2024
  • (2023)When is a Tool a Tool? User Perceptions of System Agency in Human–AI Co-Creative DrawingProceedings of the 2023 ACM Designing Interactive Systems Conference10.1145/3563657.3595977(1978-1996)Online publication date: 10-Jul-2023

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
MM '22: Proceedings of the 30th ACM International Conference on Multimedia
October 2022
7537 pages
ISBN:9781450392037
DOI:10.1145/3503161
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 October 2022

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. domain adaptation
  2. generative adversarial network
  3. image synthesis

Qualifiers

  • Research-article

Conference

MM '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)61
  • Downloads (Last 6 weeks)3
Reflects downloads up to 22 Dec 2024

Other Metrics

Citations

Cited By

View all
  • (2024)SMFS‐GAN: Style‐Guided Multi‐class Freehand Sketch‐to‐Image SynthesisComputer Graphics Forum10.1111/cgf.1519043:6Online publication date: 7-Aug-2024
  • (2024)CFTS-GAN: Continual Few-Shot Teacher Student for Generative Adversarial NetworksPattern Recognition10.1007/978-3-031-78389-0_17(249-262)Online publication date: 5-Dec-2024
  • (2023)When is a Tool a Tool? User Perceptions of System Agency in Human–AI Co-Creative DrawingProceedings of the 2023 ACM Designing Interactive Systems Conference10.1145/3563657.3595977(1978-1996)Online publication date: 10-Jul-2023

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media