research-article

Double Attention Based on Graph Attention Network for Image Multi-Label Classification

Published: 05 January 2023

Abstract

Image multi-label classification aims to recognize all of the objects present in an input image. Most recent methods build their graph structure from a label co-occurrence matrix counted on the training data, which is inflexible and may degrade model generalizability. These methods also fail to capture the semantic correlation between channel feature maps, which could further improve model performance. To address these issues, we propose DA-GAT (a Double Attention framework based on the Graph Attention neTwork) to effectively learn the correlation between labels from training data. First, we devise a new channel attention mechanism that enhances the semantic correlation between channel feature maps and thereby implicitly captures the correlation between labels. Second, we propose a new label attention mechanism that avoids the adverse impact of a manually constructed label co-occurrence matrix: it takes only the label embeddings as network input and automatically constructs the label relation matrix, explicitly establishing the correlation between labels. Finally, we fuse the outputs of the two attention mechanisms to further improve model performance. Extensive experiments are conducted on three public multi-label classification benchmarks. Our DA-GAT model achieves a mean average precision of 87.1%, 96.6%, and 64.3% on MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE, respectively, clearly outperforming existing state-of-the-art methods. In addition, visual analysis experiments demonstrate that each attention mechanism captures the correlation between labels well and significantly improves model performance.
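The abstract gives no equations for the label attention mechanism, so the following is only a minimal sketch of how a label relation matrix could be learned from label embeddings. It assumes the standard graph attention formulation (a shared linear projection followed by LeakyReLU-scored, softmax-normalized pairwise attention), not the authors' actual implementation. The names `label_relation_matrix`, `propagate`, `W`, and `a`, as well as all dimensions and the random inputs, are hypothetical and chosen only for illustration.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    """Element-wise LeakyReLU."""
    return np.where(x > 0, x, slope * x)


def label_relation_matrix(label_emb, W, a, slope=0.2):
    """Build a row-normalized label relation matrix with GAT-style attention.

    label_emb: (C, d_in) array, one embedding per label.
    W:         (d_in, d_out) shared projection (hypothetical parameter).
    a:         (2 * d_out,) attention vector (hypothetical parameter).
    """
    h = label_emb @ W                       # projected labels, (C, d_out)
    d_out = h.shape[1]
    # a^T [h_i || h_j] decomposes into a term for label i plus a term for label j.
    src = h @ a[:d_out]                     # (C,)
    dst = h @ a[d_out:]                     # (C,)
    scores = leaky_relu(src[:, None] + dst[None, :], slope)  # (C, C)
    # Softmax over each row; subtracting the row max keeps exp() stable.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def propagate(label_emb, W, a):
    """One attention layer: mix projected label features by the learned relations."""
    relation = label_relation_matrix(label_emb, W, a)
    return relation @ (label_emb @ W)


# Toy inputs (assumed sizes; real label embeddings would come from a word model).
rng = np.random.default_rng(0)
C, d_in, d_out = 5, 8, 4
label_emb = rng.normal(size=(C, d_in))
W = rng.normal(size=(d_in, d_out))
a = rng.normal(size=2 * d_out)

A = label_relation_matrix(label_emb, W, a)
updated = propagate(label_emb, W, a)
```

In this sketch the relation matrix `A` is computed entirely from the embeddings and the learnable parameters, so no co-occurrence statistics from the training set are needed. Each row of `A` is a probability distribution over labels.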

      Information & Contributors

      Information

      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 1
      January 2023
      505 pages
      ISSN: 1551-6857
      EISSN: 1551-6865
      DOI: 10.1145/3572858
      Editor: Abdulmotaleb El Saddik

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 January 2023
      Online AM: 12 March 2022
      Accepted: 14 February 2022
      Revised: 29 January 2022
      Received: 04 September 2021
      Published in TOMM Volume 19, Issue 1

      Author Tags

      1. Multi-label classification
      2. label correlation
      3. channel attention mechanism
      4. graph attention network
      5. visual analysis

      Qualifiers

      • Research-article
      • Refereed

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months): 606
      • Downloads (Last 6 weeks): 42
      Reflects downloads up to 15 Oct 2024

      Citations

      Cited By

      • (2024) Cross-modality semantic guidance for multi-label image classification. Intelligent Data Analysis 28:3 (633-646). DOI: 10.3233/IDA-230239. Online publication date: 28-May-2024
      • (2024) Illumination-Aware Low-Light Image Enhancement with Transformer and Auto-Knee Curve. ACM Transactions on Multimedia Computing, Communications, and Applications 20:8 (1-23). DOI: 10.1145/3664653. Online publication date: 29-Jun-2024
      • (2024) Pyramidal Cross-Modal Transformer with Sustained Visual Guidance for Multi-Label Image Classification. Proceedings of the 2024 International Conference on Multimedia Retrieval (740-748). DOI: 10.1145/3652583.3658005. Online publication date: 30-May-2024
      • (2024) Suitable and Style-Consistent Multi-Texture Recommendation for Cartoon Illustrations. ACM Transactions on Multimedia Computing, Communications, and Applications 20:7 (1-26). DOI: 10.1145/3652518. Online publication date: 16-May-2024
      • (2024) NSDIE: Noise Suppressing Dark Image Enhancement Using Multiscale Retinex and Low-Rank Minimization. ACM Transactions on Multimedia Computing, Communications, and Applications 20:6 (1-22). DOI: 10.1145/3638772. Online publication date: 8-Mar-2024
      • (2024) A Multitask Dynamic Graph Attention Autoencoder for Imbalanced Multilabel Time Series Classification. IEEE Transactions on Neural Networks and Learning Systems 35:9 (11829-11842). DOI: 10.1109/TNNLS.2024.3369064. Online publication date: Sep-2024
      • (2024) Mining Semantic Information With Dual Relation Graph Network for Multi-Label Image Classification. IEEE Transactions on Multimedia 26 (1143-1157). DOI: 10.1109/TMM.2023.3277279. Online publication date: 1-Jan-2024
      • (2024) A novel deep learning graph attention network for Alzheimer’s disease image segmentation. Healthcare Analytics 5 (100310). DOI: 10.1016/j.health.2024.100310. Online publication date: Jun-2024
      • (2024) Detection of small foreign objects in Pu-erh sun-dried green tea: An enhanced YOLOv8 neural network model based on deep learning. Food Control (110890). DOI: 10.1016/j.foodcont.2024.110890. Online publication date: Sep-2024
      • (2024) AIoT-Enabled Precision Agriculture for Sustainable Crop Disease Management: Advancing SDGs Through Graph Attention Neural Networks. Artificial Intelligence of Things for Achieving Sustainable Development Goals (143-160). DOI: 10.1007/978-3-031-53433-1_8. Online publication date: 9-Mar-2024
