
CBAM: Convolutional Block Attention Module

Published: 08 September 2018

Abstract

We propose the Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial; the attention maps are then multiplied with the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated seamlessly into any CNN architecture with negligible overhead and is end-to-end trainable along with the base CNN. We validate CBAM through extensive experiments on the ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performance across various models, demonstrating the wide applicability of CBAM. The code and models will be made publicly available.
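The two-step refinement described above (channel attention first, then spatial attention, each applied multiplicatively to the feature map) can be sketched in NumPy. This is a minimal illustration of the gating structure, not the paper's implementation: the weights `w1`/`w2` stand in for a learned shared MLP (with an assumed reduction ratio of 2), and the spatial branch uses a parameter-free average of channel-pooled maps in place of a learned convolution.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    # x: feature map of shape (C, H, W).
    # Squeeze spatial dims with average and max pooling, pass each
    # descriptor through the same two-layer MLP, and gate with a sigmoid.
    avg = x.mean(axis=(1, 2))                       # (C,)
    mx = x.max(axis=(1, 2))                         # (C,)
    a = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)      # ReLU hidden layer
                + w2 @ np.maximum(w1 @ mx, 0.0))
    return a[:, None, None]                         # broadcast over H, W

def spatial_attention(x):
    # Pool along the channel axis; a real module would feed the pooled
    # maps through a learned convolution. A plain average is used here
    # purely to show the gating structure (an assumption of this sketch).
    avg = x.mean(axis=0)                            # (H, W)
    mx = x.max(axis=0)                              # (H, W)
    return sigmoid((avg + mx) / 2.0)[None, :, :]    # (1, H, W)

def cbam(x, w1, w2):
    x = x * channel_attention(x, w1, w2)  # refine "what" (channels)
    x = x * spatial_attention(x)          # then "where" (locations)
    return x

C, H, W = 8, 4, 4
rng = np.random.default_rng(0)
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // 2, C))   # reduction ratio r = 2 (assumed)
w2 = rng.standard_normal((C, C // 2))
y = cbam(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because both attention maps are sigmoid-gated into (0, 1), the refined output never grows in magnitude; the module rescales features rather than adding new ones, which is consistent with it being insertable into an existing architecture with negligible overhead.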


Cited By

  • Drug Recognition Detection Based on Deep Learning and Improved YOLOv8. Journal of Organizational and End User Computing 36(1), 1–21 (2024). DOI: 10.4018/JOEUC.359770
  • Differential Feature Fusion, Triplet Global Attention, and Web Semantic for Pedestrian Detection. International Journal on Semantic Web & Information Systems 20(1), 1–18 (2024). DOI: 10.4018/IJSWIS.345651
  • ResNet Combined with Attention Mechanism for Genomic Deletion Variant Prediction. Automatic Control and Computer Sciences 58(3), 252–264 (2024). DOI: 10.3103/S0146411624700147

        Published In

        Computer Vision – ECCV 2018: 15th European Conference, Munich, Germany, September 8–14, 2018, Proceedings, Part VII
        Sep 2018
        849 pages
        ISBN:978-3-030-01233-5
        DOI:10.1007/978-3-030-01234-2

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Author Tags

        1. Object recognition
        2. Attention mechanism
        3. Gated convolution
