DOI: 10.1145/3604321.3604382

VSGAN: Visual Saliency guided Generative Adversarial Network for data augmentation

Published: 27 October 2023

Abstract

Deep learning approaches have enabled a great leap in the performance of visual saliency models, but the lack of annotated data remains the main challenge for visual saliency prediction. In this paper, we leverage image inpainting to synthesize augmented images by completing the weakly-salient areas of existing ones, and propose a Visual Saliency guided Generative Adversarial Network (VSGAN) that combines a dual encoder for extracting multi-scale features with a generator equipped with visual saliency guided modulation to synthesize results of high fidelity and diversity. Extensive experiments show that our method outperforms state-of-the-art image inpainting methods on visual saliency datasets and demonstrate, both quantitatively and qualitatively, the effectiveness of VSGAN for visual saliency data augmentation.
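
The abstract names two mechanisms: turning weakly-salient areas into inpainting targets, and a generator modulated by the saliency map (which presumably leaves the strongly-salient content, and hence the existing fixation annotations, intact). The PyTorch sketch below is a minimal, hypothetical illustration of how such pieces could look, not the paper's implementation: the 0.3 threshold, the helper and module names, and the SPADE-style spatially-adaptive normalization (Park et al., 2019) used for the modulation are all assumptions made here for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

def weak_saliency_mask(saliency, threshold=0.3):
    # Hypothetical helper: mark weakly-salient pixels as the regions an
    # inpainting model should complete. `saliency` is a (B, 1, H, W) map
    # in [0, 1]; the 0.3 threshold is an illustrative guess, not a value
    # reported in the paper.
    return (saliency < threshold).float()  # 1 = pixel to synthesize

class SaliencyGuidedModulation(nn.Module):
    # One plausible reading of "visual saliency guided modulation":
    # SPADE-style spatially-adaptive normalization whose per-pixel
    # scale and shift are predicted from the saliency map.
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.to_gamma = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)

    def forward(self, features, saliency):
        # Resize the saliency map to the feature resolution, then apply
        # a per-pixel scale and shift to the normalized features.
        s = F.interpolate(saliency, size=features.shape[-2:],
                          mode="bilinear", align_corners=False)
        h = self.shared(s)
        return self.norm(features) * (1 + self.to_gamma(h)) + self.to_beta(h)

# Illustrative usage: the generator would complete the masked regions
# while its feature maps are modulated by the saliency map.
sal = torch.rand(2, 1, 256, 256)       # saliency map in [0, 1]
mask = weak_saliency_mask(sal)         # inpainting targets
feats = torch.randn(2, 64, 64, 64)     # a generator feature map
out = SaliencyGuidedModulation(64)(feats, sal)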




    Published In

IMXw '23: Proceedings of the 2023 ACM International Conference on Interactive Media Experiences Workshops
June 2023, 143 pages
ISBN: 9798400708459
DOI: 10.1145/3604321

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. GAN
    2. Visual saliency
    3. data augmentation
    4. image inpainting

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IMXw '23

    Acceptance Rates

    Overall Acceptance Rate 69 of 245 submissions, 28%

