DOI: 10.1145/3581783.3611763
Research article

Improving Few-shot Image Generation by Structural Discrimination and Textural Modulation

Published: 27 October 2023

Abstract

Few-shot image generation, which aims to produce plausible and diverse images of a category given only a few images from that category, has drawn extensive attention. Existing approaches either globally interpolate different images or fuse local representations with pre-defined coefficients. However, such an intuitive combination of images/features exploits only the most relevant information for generation, leading to poor diversity and coarse-grained semantic fusion. To remedy this, this paper proposes a novel textural modulation (TexMod) mechanism that injects external semantic signals into internal local representations. Parameterized by feedback from the discriminator, TexMod enables finer-grained semantic injection while maintaining synthesis fidelity. Moreover, a global structural discriminator (StructD) is developed to explicitly guide the model to generate images with reasonable layout and outline. Furthermore, the model's frequency awareness is reinforced by encouraging it to distinguish frequency signals. Together, these techniques yield a novel and effective model for few-shot image generation. The effectiveness of our model is demonstrated by extensive experiments on three popular datasets under various settings. Besides achieving state-of-the-art synthesis performance on these datasets, the proposed techniques can be seamlessly integrated into existing models for a further performance boost. Our code and models are available at https://github.com/kobeshegu/SDTM-GAN-ACMMM-2023.
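The abstract describes TexMod only at a high level; the exact parameterization driven by discriminator feedback is given in the full paper. As a rough, hypothetical sketch of the general idea of injecting one image's texture statistics into another's local features, the AdaIN-style modulation below re-normalizes a content feature map with a reference image's channel-wise statistics. All names and shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def textural_modulation(content, reference, eps=1e-5):
    """AdaIN-style sketch: strip the content features' own channel-wise
    statistics, then re-scale and re-shift them with the reference
    features' statistics. Both inputs have shape (C, H, W)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps
    r_mean = reference.mean(axis=(1, 2), keepdims=True)
    r_std = reference.std(axis=(1, 2), keepdims=True) + eps
    normalized = (content - c_mean) / c_std  # zero mean, unit std per channel
    return r_std * normalized + r_mean       # carries reference statistics

rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(8, 16, 16))    # internal local features
reference = rng.normal(2.0, 3.0, size=(8, 16, 16))  # external semantic signal
out = textural_modulation(content, reference)
# The output keeps the content's spatial layout but adopts the
# reference's channel-wise texture statistics.
print(out.shape)
```

In the paper's actual mechanism the modulation parameters are additionally conditioned on discriminator feedback rather than computed from raw reference statistics as in this simplified sketch.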


Cited By

  • (2025) FNContra: Frequency-domain Negative Sample Mining in Contrastive Learning for limited-data image generation. Expert Systems with Applications, Vol. 263, 125676. DOI: 10.1016/j.eswa.2024.125676. Online publication date: March 2025.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. few-shot learning
    2. image generation
    3. structural discrimination
    4. textural modulation


    Funding Sources

    • China Aerospace Science and Technology Corporation Industry-University-Research Cooperation Foundation of the Eighth Research Institute
    • Shanghai Science and Technology Program
    • Natural Science Foundation of China
    • Chinese Defense Program of Science and Technology

    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%


