research-article

Improving small object detection via context-aware and feature-enhanced plug-and-play modules

Authors: Xiaolong Zheng, Xiangming Zhou, Lihuan Shao

Journal of Real-Time Image Processing, Volume 21, Issue 2

https://doi.org/10.1007/s11554-024-01426-8

Published: 01 March 2024

Abstract

Detecting small objects is challenging in computer vision because such objects occupy only a few pixels and have blurred contours, which leaves few discriminative features with which to model them. In this paper, we propose three lightweight plug-and-play modules that can be integrated into object detection algorithms, particularly those in the YOLO series, to improve the accuracy of small object detection. The Spatially Enhanced Convolutional Block Attention Module (SE-CBAM) is integrated into the feature extraction layer to strengthen the network's feature extraction capability. A Contextual Information Pooling Enhancement Module (CIE-Pool) is added at the multi-scale feature fusion stage to extract and enhance the background information around objects, which raises the recognition rate of small objects. In addition, a new layer is added to the detection head; it takes the shallow feature map from the feature extraction network after Adaptive Feature Processing (AFP) and thereby provides richer information about small objects. The method is evaluated on the VisDrone2021 and AI-TOD datasets. The experimental results show that the proposed method greatly improves small object detection accuracy while maintaining real-time performance. It also maintains high accuracy and speed against complex backgrounds and when detecting heavily blurred small objects.
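As a rough illustration of why a detection layer fed by a shallow feature map can help, the sketch below computes how many feature-map cells a small object spans at different head strides. It is a minimal sketch and not the authors' implementation: strides 8, 16 and 32 are the usual YOLOv5/YOLOv8 head strides, the stride of 4 for the extra shallow head is an assumption, and the names `HEAD_STRIDES`, `extent_in_cells` and `footprint` are hypothetical.

```python
# Head strides: 8/16/32 are the usual YOLOv5/YOLOv8 defaults; the stride-4
# entry is an ASSUMED stride for an extra shallow head, not a value from the paper.
HEAD_STRIDES = {"extra shallow head (assumed)": 4, "P3": 8, "P4": 16, "P5": 32}


def extent_in_cells(object_px, stride):
    """Side length of a square object, measured in feature-map cells."""
    return object_px / stride


def footprint(object_px):
    """Map each head name to the object's side length in cells on that head."""
    return {name: extent_in_cells(object_px, s) for name, s in HEAD_STRIDES.items()}


if __name__ == "__main__":
    for size in (8, 16, 32):  # illustrative object sizes in input pixels
        print(size, footprint(size))
```

Under these assumptions, an 8-pixel object covers only a quarter of a cell side at stride 32 but two cells at stride 4, which is one way to see why shallow, high-resolution features retain more information about small objects.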

Published In

Journal of Real-Time Image Processing Volume 21, Issue 2

Apr 2024

529 pages

© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 March 2024

Accepted: 23 January 2024

Received: 17 October 2023
