
CBAM: Convolutional Block Attention Module

Published: 08 September 2018

Abstract

We propose the Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial; the attention maps are then multiplied with the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated seamlessly into any CNN architecture with negligible overhead and is end-to-end trainable along with the base CNN. We validate CBAM through extensive experiments on the ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performance across various models, demonstrating the wide applicability of CBAM. The code and models will be made publicly available.
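The two-step refinement described above (channel attention first, then spatial attention, each applied multiplicatively to the feature map) can be sketched in NumPy. This is a minimal illustration of the gating structure, not the paper's implementation: the weights `w1`/`w2` stand in for a learned shared MLP (with an assumed reduction ratio of 2), and the spatial branch uses a parameter-free average of channel-pooled maps in place of a learned convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: feature map of shape (C, H, W).
    # Squeeze spatial dims with average and max pooling, pass each
    # descriptor through the same two-layer MLP, and gate with a sigmoid.
    avg = x.mean(axis=(1, 2))                       # (C,)
    mx = x.max(axis=(1, 2))                         # (C,)
    a = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)      # ReLU hidden layer
                + w2 @ np.maximum(w1 @ mx, 0.0))
    return a[:, None, None]                         # broadcast over H, W

def spatial_attention(x):
    # Pool along the channel axis; a real module would feed the pooled
    # maps through a learned convolution. A plain average is used here
    # purely to show the gating structure (an assumption of this sketch).
    avg = x.mean(axis=0)                            # (H, W)
    mx = x.max(axis=0)                              # (H, W)
    return sigmoid((avg + mx) / 2.0)[None, :, :]    # (1, H, W)

def cbam(x, w1, w2):
    x = x * channel_attention(x, w1, w2)  # refine "what" (channels)
    x = x * spatial_attention(x)          # then "where" (locations)
    return x

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C))   # reduction ratio r = 2 (assumed)
w2 = rng.standard_normal((C, C // 2))
y = cbam(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because both attention maps are sigmoid-gated into (0, 1), the refined output never grows in magnitude; the module rescales features rather than adding new ones, which is consistent with it being insertable into an existing architecture with negligible overhead.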


Cited By

  • Drug Recognition Detection Based on Deep Learning and Improved YOLOv8. Journal of Organizational and End User Computing 36(1), 1–21 (2024). DOI: 10.4018/JOEUC.359770
  • Differential Feature Fusion, Triplet Global Attention, and Web Semantic for Pedestrian Detection. International Journal on Semantic Web & Information Systems 20(1), 1–18 (2024). DOI: 10.4018/IJSWIS.345651
  • ResNet Combined with Attention Mechanism for Genomic Deletion Variant Prediction. Automatic Control and Computer Sciences 58(3), 252–264 (2024). DOI: 10.3103/S0146411624700147

        Published In

        Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII
        Sep 2018
        849 pages
        ISBN:978-3-030-01233-5
        DOI:10.1007/978-3-030-01234-2

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Author Tags

        1. Object recognition
        2. Attention mechanism
        3. Gated convolution
