Abstract
Most object detection frameworks rely on rectangular bounding boxes and recognizing object instances individually. However, the bounding box provides only a coarse localization of objects and the context information between objects is not fully utilized, which result in a degradation of classification performance. In this paper, combining a lightweight contextual attention module with the representation of pure regression points, we present a novel context-based pure regression object detector. Moreover, a threshold filter mask module is designed to speed up the detector by removing a few insignificant points and keeping meaningful positions. Nonetheless, both of them do not require handcrafted clustering or post-processing steps and are easy to embed in networks. The proposed contextual attention module and threshold filter mask not only improve detection performance, but also promote training speed. We show through experiments that the proposed context-based pure regression detector can improve the representation of the regression points method about 1.5–1.8 AP on the COCO test-dev detection benchmark.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Cai ZW, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection In: IEEE conference on computer vision and pattern recognition (CVPR), pp 6154–6162
Cao Y, Xu JR, Lin S, Wei FY, Hu H (2019) Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In: IEEE international conference on computer vision (ICCV), pp 1971–1980
Chen K, Wang JQ, Pang JM, Cao YH, Xiong Y, Li XX (2019) Mmdetection: Open mmlab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155
Cho K, Merrienboer BV, Bahdanau D (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Empirical methods in natural language processing (EMNLP), pp 1724–1734
Dai JF, Li Y, He KM, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Neural information processing systems (NIPS), pp 379–387
Dai JF, Qi HZ, Xiong YW, Li Y, Zhang GD, Hu H, Wei YC (2017) Deformable convolutional networks. In: IEEE international conference on computer vision (ICCV), pp 764–773
Gehring J, Auli M, Grangier D, and Dauphin YN (2017) A convolutional encoder model for neural machine translation. In: Association for Computational Linguistics (ACL), pp 123–135
Girshick RB (2015) Fast R-CNN. In: IEEE international conference on computer vision (ICCV), pp 1440–1448
He KM, Gkioxari G, Girshick R (2017) Mask R-CNN. In: IEEE international conference on computer vision (ICCV), pp 2980–2988
He KM, Zhang XY, Ren SQ, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
Hu H, Gu JY, Zhang Z, Dai JF, Wei YC (2017) Relation networks for object detection. arXiv preprint arXiv:1711.11575
Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 7132–7141
Huang ZL, Wang XG, Huang LC, Huang C, Wei YC, Liu WY (2019) Ccnet: Criss-cross attention for semantic segmentation. In: IEEE international conference on computer vision (ICCV), pp 603–612
Kong T, Sun FC, Liu HP, Jiang YN, Shi JB (2019) Foveabox: Beyond anchor-based object detector. arXiv preprint arXiv:1904.03797
Law H, Deng J (2018) Cornernet: Detecting objects as paired keypoints. In: European conference on computer vision (ECCV), pp 765–781
Li JN, Wei YC, Liang XD, Dong J, Xu TF (2017) Attentive contexts for object detection. IEEE Trans Multimedia 19(5):944–954
Lin TY, Dollár P, Girshick R, He KM (2017) Feature pyramid networks for object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 936–944
Lin TY, Goyal P, Girshick R, He KM (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42(2):318–327
Lin TY, Maire M, Belongie S, Hays J (2014) Microsoft COCO: common objects in context. In: European conference on computer vision (ECCV), pp 740–755
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S (2016) SSD: single shot multibox detector. In: European conference on computer vision (ECCV), pp 21–37
Pato L, Negrinho RM, Aguiar PM (2020) Seeing without looking: Contextual rescoring of object detections for AP maximization. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 14598–14606
Pinheiro PH, Collobert R, Dollár P (2015) Learning to segment object candidates. In: Neural information processing systems (NIPS), pp 1990–1998
Redmon J, Divvala SK, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767
Ren SQ, He KM, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Neural information processing systems (NIPS), pp 91–99
Stewart R, Andriluka M (2016) End-to-end people detection in crowded scenes. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 2325–2333
Tian Z, Shen CH, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: IEEE international conference on computer vision (ICCV), pp 9626–9635
Toshev A, Szegedy C (2014) Deeppose: Human pose estimation via deep neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 1653–1660
Vaswani A, Shazeer N, Parmar N, Uszkoreit J (2017) Attention is all you need. In: Neural information processing systems (NIPS), pp 5998–6008
Wang XL, Girshick R, Gupta A, He KM (2018) Non-local neural networks. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 7794–7803
Xu H, Jiang CH, Liang XD, Lin L, Li ZG (2019) Reasoning-RCNN: Unifying adaptive global reasoning into large-scale object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 6419–6428
Yang Z, Liu SH, Hu H, Wang LW, Lin S (2019) Reppoints: Point set representation for object detection. In: IEEE international conference on computer vision (ICCV), pp 9656–9665
Zhou XY, Wang DQ, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850
Zhou XY, Zhuo JC, Krähenbühl P (2019) Bottom-up object detection by grouping extreme and center points. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 850–859
Zhu CC, He YH, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: IEEE conference on computer vision and pattern recognition (CVPR), pp 840–849
Ke W, Zhang TL, Huang ZY, Ye QX, Liu ZJ, Huang D (2020) Multiple anchor learning for visual object detection In: IEEE conference on computer vision and pattern recognition (CVPR), pp 10203–10212
Shao MW, Zhang GZ, Zuo WM, Meng DY (2021) Target attack on biomedical image segmentation model based on multi-scale gradients. Inf Sci 554:33–46
Li YH, Shao MW, Fan BB, Zhang W (2021) Multi-scale global context feature pyramid network for object detector. Signal Image Video Pro 1-9
Yang Y, Zhuang YT, Pan YH (2021) Multiple knowledge representation for big data artificial intelligence: framework, applications, and case studies. Front Inf Technol Electr Eng 22(12):1551–1684
Acknowledgements
The authors are very indebted to the anonymous referees for their critical comments and suggestions for the improvement of this paper. This work was supported by National Key Research and development Program of China (2021YFA1000102), and in part by the grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Fan, B., Shao, M., Li, Y. et al. Global contextual attention for pure regression object detection. Int. J. Mach. Learn. & Cyber. 13, 2189–2197 (2022). https://doi.org/10.1007/s13042-022-01514-w
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s13042-022-01514-w