Global contextual attention for pure regression object detection

Fan, Bingbing; Shao, Mingwen; Li, Yunhao; Li, Cunhe

doi:10.1007/s13042-022-01514-w

Original Article
Published: 01 March 2022

Volume 13, pages 2189–2197, (2022)
International Journal of Machine Learning and Cybernetics

Bingbing Fan¹, Mingwen Shao¹ (ORCID: orcid.org/0000-0001-7323-5896), Yunhao Li¹ & Cunhe Li¹

Abstract

Most object detection frameworks rely on rectangular bounding boxes and recognize object instances individually. However, a bounding box provides only a coarse localization of an object, and the context information between objects is not fully utilized, which degrades classification performance. In this paper, we combine a lightweight contextual attention module with a pure regression point representation to present a novel context-based pure regression object detector. Moreover, a threshold filter mask module is designed to speed up the detector by removing a few insignificant points and keeping meaningful positions. Neither module requires handcrafted clustering or post-processing steps, and both are easy to embed in networks. The proposed contextual attention module and threshold filter mask not only improve detection performance but also speed up training. Experiments show that the proposed context-based pure regression detector improves on the regression points method by about 1.5–1.8 AP on the COCO test-dev detection benchmark.
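
The full text is not available in this preview, so the two modules can only be illustrated loosely. The sketch below assumes a common form of global context attention (softmax-weighted pooling of all positions into one vector that is added back to every position) and a plain score threshold for the filter mask. The function names, the toy inputs, and the threshold value 0.3 are hypothetical and are not taken from the paper.

```python
import math


def global_context(features, attn_logits):
    """Pool all positions into one context vector using softmax weights.

    features: list of N per-position feature vectors (lists of C floats).
    attn_logits: list of N raw attention scores, one per position.
    """
    peak = max(attn_logits)  # subtract the max for numerical stability
    exps = [math.exp(a - peak) for a in attn_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    channels = len(features[0])
    return [
        sum(w * f[c] for w, f in zip(weights, features))
        for c in range(channels)
    ]


def add_context(features, context):
    """Add the shared context vector to every position (assumed fusion step)."""
    return [[v + c for v, c in zip(f, context)] for f in features]


def threshold_mask(scores, threshold):
    """Return the indices of positions whose score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy example: three positions, two channels, uniform attention.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context = global_context(features, [0.0, 0.0, 0.0])
enhanced = add_context(features, context)

# Keep only positions with a score of at least 0.3 (hypothetical threshold).
kept = threshold_mask([0.9, 0.05, 0.4], 0.3)
```

With uniform attention the context vector is simply the mean of the position features, and the mask keeps the first and third positions. Downstream regression would then run only on the kept positions, which is one plausible reading of how such a mask could reduce computation.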

Acknowledgements

The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper. This work was supported by the National Key Research and Development Program of China (2021YFA1000102), and in part by grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245).

Author information

Authors and Affiliations

School of Computer Science and Technology, China University of Petroleum (East China), Qingdao, 266000, China
Bingbing Fan, Mingwen Shao, Yunhao Li & Cunhe Li

Corresponding author

Correspondence to Mingwen Shao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fan, B., Shao, M., Li, Y. et al. Global contextual attention for pure regression object detection. Int. J. Mach. Learn. & Cyber. 13, 2189–2197 (2022). https://doi.org/10.1007/s13042-022-01514-w

Received: 17 May 2021
Accepted: 18 January 2022
Published: 01 March 2022
Issue Date: August 2022
DOI: https://doi.org/10.1007/s13042-022-01514-w
