DOI: 10.1145/3581783.3611763
Research article

Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation

Published: 27 October 2023

Abstract

Few-shot image generation, which aims to produce plausible and diverse images of a category given only a few images from that category, has drawn extensive attention. Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients. However, such an intuitive combination of images/features exploits only the most relevant information for generation, leading to poor diversity and coarse-grained semantic fusion. To remedy this, this paper proposes a novel textural modulation (TexMod) mechanism that injects external semantic signals into internal local representations. Parameterized by feedback from the discriminator, TexMod enables finer-grained semantic injection while maintaining synthesis fidelity. Moreover, a global structural discriminator (StructD) is developed to explicitly guide the model to generate images with reasonable layout and outline. Furthermore, the model's frequency awareness is reinforced by encouraging it to distinguish frequency signals. Together, these techniques yield a novel and effective model for few-shot image generation. The effectiveness of our model is demonstrated by extensive experiments on three popular datasets under various settings. Besides achieving state-of-the-art synthesis performance on these datasets, the proposed techniques can be seamlessly integrated into existing models for a further performance boost. Our code and models are available at https://github.com/kobeshegu/SDTM-GAN-ACMMM-2023.
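The abstract describes TexMod only at a high level; the exact parameterization driven by discriminator feedback is given in the full paper. As a rough, hypothetical sketch of the general idea of injecting one image's texture statistics into another's local features, the AdaIN-style modulation below re-normalizes a content feature map with a reference image's channel-wise statistics. All names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def textural_modulation(content, reference, eps=1e-5):
    """AdaIN-style sketch: strip the content features' own channel-wise
    statistics, then re-scale and re-shift them with the reference
    features' statistics. Both inputs have shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    r_mean = reference.mean(axis=(1, 2), keepdims=True)
    r_std = reference.std(axis=(1, 2), keepdims=True) + eps
    normalized = (content - c_mean) / c_std  # zero mean, unit std per channel
    return r_std * normalized + r_mean       # carries reference statistics

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(8, 16, 16))    # internal local features
reference = rng.normal(2.0, 3.0, size=(8, 16, 16))  # external semantic signal
out = textural_modulation(content, reference)
# The output keeps the content's spatial layout but adopts the
# reference's channel-wise texture statistics.
print(out.shape)
```

In the paper's actual mechanism the modulation parameters are additionally conditioned on discriminator feedback rather than computed from raw reference statistics as in this simplified sketch.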


Cited By

  • (2025) FNContra: Frequency-domain Negative Sample Mining in Contrastive Learning for limited-data image generation. Expert Systems with Applications, Vol. 263, 125676. DOI: 10.1016/j.eswa.2024.125676. Online publication date: March 2025.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. few-shot learning
    2. image generation
    3. structural discrimination
    4. textural modulation


    Funding Sources

    • China Aerospace Science and Technology Corporation Industry-University-Research Cooperation Foundation of the Eighth Research Institute
    • Shanghai Science and Technology Program
    • Natural Science Foundation of China
    • Chinese Defense Program of Science and Technology

    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


