DOI: 10.1145/3604321.3604382

VSGAN: Visual Saliency guided Generative Adversarial Network for data augmentation

Published: 27 October 2023

Abstract

Deep learning approaches have enabled a great leap in the performance of visual saliency models, but the lack of annotated data remains the main challenge for visual saliency prediction. In this paper, we leverage image inpainting to synthesize augmented images by completing the weakly-salient areas of existing ones, and propose a Visual Saliency guided Generative Adversarial Network (VSGAN) that combines a dual encoder for extracting multi-scale features with a generator equipped with visual saliency guided modulation to synthesize results of high fidelity and diversity. Extensive experiments show that our method outperforms state-of-the-art image inpainting methods on visual saliency datasets and demonstrate, both quantitatively and qualitatively, the effectiveness of VSGAN for visual saliency data augmentation.
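
The abstract names two mechanisms: turning weakly-salient areas into inpainting targets, and a generator modulated by the saliency map (which presumably leaves the strongly-salient content, and hence the existing fixation annotations, intact). The PyTorch sketch below is a minimal, hypothetical illustration of how such pieces could look, not the paper's implementation: the 0.3 threshold, the helper and module names, and the SPADE-style spatially-adaptive normalization (Park et al., 2019) used for the modulation are all assumptions made here for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

def weak_saliency_mask(saliency, threshold=0.3):
    # Hypothetical helper: mark weakly-salient pixels as the regions an
    # inpainting model should complete. `saliency` is a (B, 1, H, W) map
    # in [0, 1]; the 0.3 threshold is an illustrative guess, not a value
    # reported in the paper.
    return (saliency < threshold).float()  # 1 = pixel to synthesize

class SaliencyGuidedModulation(nn.Module):
    # One plausible reading of "visual saliency guided modulation":
    # SPADE-style spatially-adaptive normalization whose per-pixel
    # scale and shift are predicted from the saliency map.
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size=3, padding=1), nn.ReLU())
        self.to_gamma = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv2d(hidden, channels, kernel_size=3, padding=1)

    def forward(self, features, saliency):
        # Resize the saliency map to the feature resolution, then apply
        # a per-pixel scale and shift to the normalized features.
        s = F.interpolate(saliency, size=features.shape[-2:],
                          mode="bilinear", align_corners=False)
        h = self.shared(s)
        return self.norm(features) * (1 + self.to_gamma(h)) + self.to_beta(h)

# Illustrative usage: the generator would complete the masked regions
# while its feature maps are modulated by the saliency map.
sal = torch.rand(2, 1, 256, 256)       # saliency map in [0, 1]
mask = weak_saliency_mask(sal)         # inpainting targets
feats = torch.randn(2, 64, 64, 64)     # a generator feature map
out = SaliencyGuidedModulation(64)(feats, sal)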




    Published In

IMXw '23: Proceedings of the 2023 ACM International Conference on Interactive Media Experiences Workshops
June 2023, 143 pages
ISBN: 9798400708459
DOI: 10.1145/3604321

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. GAN
    2. Visual saliency
    3. data augmentation
    4. image inpainting

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    IMXw '23

    Acceptance Rates

    Overall Acceptance Rate 69 of 245 submissions, 28%

